Real-Time Analytics

General information

Course code: 222891-S
Erasmus code / ISCED: (no data) / (no data)
Course name: Real-Time Analytics
Unit: Szkoła Główna Handlowa w Warszawie
Groups: Compulsory courses in the NMMS-ADA programme
ECTS credits and other: 3.00 (may vary over time). Basic information on how ECTS credits are assigned:
  • the annual student workload required to achieve the intended learning outcomes for a given stage of studies is 1500-1800 hours, which corresponds to 60 ECTS credits;
  • the weekly student workload is 45 hours;
  • 1 ECTS credit corresponds to 25-30 hours of student work needed to achieve the intended learning outcomes;
  • the weekly student workload required to achieve the intended learning outcomes earns 1.5 ECTS credits;
  • the workload required to pass a course assigned 3 ECTS credits amounts to 10% of the student's workload for the semester.

Language of instruction: English
Learning outcomes:

Knowledge:

Know the history and philosophy of data processing models

Know the types of structured and unstructured data

Know the possibilities and areas of application of real-time data processing

Know the theoretical aspects of REST API and pub/sub

Be able to choose the IT architecture for a given business problem

Understand the business need to make decisions in a very short time

Skills:

Distinguish between structured and unstructured data types

Be able to prepare, process and save data generated in real time

Understand the limitations arising from processing time in devices and IT systems

Design and build a system for real-time processing

Be able to prepare reporting for a real-time processing system

Social competences:

Formulate an analytical problem along with its IT solution

Consolidate the ability to independently extend theoretical and practical knowledge of programming, modelling, and new information technologies using real-time analysis.

Classes in the cycle "Preferences - Summer semester 2024/25" (not yet started)

Period: 2025-02-15 - 2025-09-30
Class type: Instructor's classes
Coordinators: (no data)
Group instructors: Sebastian Zając
Assessment: Course - Grade
Instructor's classes - Grade

Classes in the cycle "Summer semester 2024/25" (not yet started)

Period: 2025-02-15 - 2025-09-30
Class types:
Laboratory, 10 hours
Lecture, 4 hours
Coordinators: (no data)
Group instructors: (no data)
Assessment: Course - Grade
Lecture - Grade
Short description:

1. From Flat Files to Data Mesh: Data Processing Models in Big Data.

2. ETL and Batch (Offline Learning) and Incremental (Online Learning) Modeling. Map-Reduce.

3. Data Streams, Events, and Time and Time Window Concepts in Real-time Data Processing.

4. Microservices and Communication via REST API.

5. Contemporary Architectures for Stream Data Processing Applications - Lambda, Kappa, Pub/Sub.

6. Processing Structured and Unstructured Data. Programming Environment for Python.

7. Utilizing Python Object-Oriented Elements in the Modeling Process with Scikit-Learn and Keras.

8. Python Object-Oriented Programming Basics. Building Classes for Random Walk, Perceptron, and Adaline Algorithms.

9. Preparing a Microservice with an ML Model for Production Use.

10. Streaming Data Using RDDs with Apache Spark. Introduction to the DataFrame Object.

11. Methods for Creating Data Streams Using the DataFrame Object in Apache Spark. Setting Output and Input.

12. S
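
As an illustration of the time window concept named in topic 3, the following minimal sketch groups events into fixed, non-overlapping (tumbling) windows by event time. It is not course material: the event format (timestamp in seconds, key), the function name tumbling_window_counts, and the sample events are assumptions made for this example, and a production system would typically delegate windowing to a stream processing engine rather than plain Python.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per key in fixed, non-overlapping event-time windows.

    events: iterable of (timestamp_seconds, key) pairs (hypothetical format).
    Returns {window_start: {key: count}} ordered by window start.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for timestamp, key in events:
        # Every event falls into exactly one window:
        # [window_start, window_start + window_seconds)
        window_start = (timestamp // window_seconds) * window_seconds
        counts[window_start][key] += 1
    return {start: dict(per_key) for start, per_key in sorted(counts.items())}


# Hypothetical sample stream: (event time in seconds, event type)
events = [(0, "click"), (3, "view"), (9, "click"),
          (10, "click"), (14, "view"), (21, "click")]

result = tumbling_window_counts(events, window_seconds=10)
print(result)
# {0: {'click': 2, 'view': 1}, 10: {'click': 1, 'view': 1}, 20: {'click': 1}}
```

The window is derived from the timestamp carried by the event (event time), not from the moment the event is processed, so events that arrive out of order are still assigned to the correct window.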

Full description:

Making informed decisions based on data and its analysis is fundamental in modern business. Techniques such as machine learning, artificial intelligence, and deep neural networks can significantly improve business understanding and the quality of decisions. The speed of decision-making is also crucial in a dynamic business environment, especially when dealing directly with customers. The goal of these classes is to give students practical experience and comprehensive theoretical knowledge of real-time data processing and analysis, and to introduce the latest information technology for processing structured data (e.g., from data warehouses) and unstructured data (e.g., images, sound, video streaming) online. The classes present the philosophy of real-time analysis of large data sets using Python. Software architectures for data processing are introduced, together with the issues and challenges encountered when modeling large amounts of data in real time. Theoretical knowledge is consolidated through hands-on exercises with tools such as Apache Spark and Apache Kafka. In the lab sessions, students use fully configured development environments prepared for data processing, modeling, and analysis, so that, in addition to analytical skills and techniques, they also become familiar with the latest information technology for real-time data processing.
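
The publish/subscribe pattern behind tools such as Apache Kafka can be sketched in a few lines of plain Python. The example below is only an in-memory illustration under stated assumptions: the class InMemoryBroker, the topic names, and the message fields are hypothetical, and nothing here reflects Kafka's actual API, persistence, or delivery guarantees.

```python
from collections import defaultdict


class InMemoryBroker:
    """Toy message broker: routes each published message to topic subscribers."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic name -> list of callbacks

    def subscribe(self, topic, callback):
        """Register a callable that is invoked for every message on the topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver the message to all current subscribers; return how many got it."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


broker = InMemoryBroker()
alerts, audit_log = [], []

# Two independent consumers of the same topic; the producer knows neither of them.
broker.subscribe("payments", lambda m: audit_log.append(m))
broker.subscribe("payments", lambda m: alerts.append(m) if m["amount"] > 1000 else None)

broker.publish("payments", {"id": 1, "amount": 250})
delivered = broker.publish("payments", {"id": 2, "amount": 5000})
unrouted = broker.publish("refunds", {"id": 3, "amount": 10})  # no subscribers

print(delivered, unrouted, [m["id"] for m in alerts])
```

The point of the design is decoupling: the producer only names a topic, so consumers can be added or removed without changing the code that publishes.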

Literature:

Basic literature:

1. Zając S., "Modelowanie dla biznesu. Analityka w czasie rzeczywistym - narzędzia informatyczne i biznesowe", Oficyna Wydawnicza SGH, Warszawa, 2022.

2. Przanowski K., Zając S. (eds.), "Modelowanie dla biznesu, metody ML, modele portfela CF, modele rekurencyjne, analizy przeżycia, modele scoringowe", SGH, Warszawa, 2020.

3. Frątczak E. (ed.), "Modelowanie dla biznesu, Regresja logistyczna, Regresja Poissona, Survival Data Mining, CRM, Credit Scoring", SGH, Warszawa, 2019.

4. Raschka S., "Python. Uczenie maszynowe", 2nd edition.

5. Maas G., Garillot F., "Stream Processing with Apache Spark", O'Reilly, 2021.

6. Hueske F., Kalavri V., "Stream Processing with Apache Flink", O'Reilly, 2021.

7. Nandi A., "Spark for Python Developers", 2015.

Supplementary literature:

1. Frątczak E., "Statistics for Management & Economics", SGH, Warszawa, 2015.

2. Simon P., "Too Big to IGNORE. The Business Case for Big Data", John Wiley & Sons Inc., 2013.

3. Ohlhorst F. J., "Big Data Analytics. Turning Big Data into Big Money", John Wiley & Sons Inc., 2013.

4. Russell J., "Zwinna analiza danych Apache Hadoop dla każdego", Helion, 2014.

5. Todman C., "Projektowanie hurtowni danych, Wspomaganie zarządzania relacjami z klientami", Helion, 2011.

6. Bruce P., Bruce A., Gedeck P., "Statystyka praktyczna w data science. 50 kluczowych zagadnień w językach R i Python", Helion, 2nd edition, 2021.

Notes:

Grading criteria:

traditional written exam: 0.00%

test exam: 40.00%

oral exam: 0.00%

mid-term test: 20.00%

papers/essays: 40.00%

class grade: 0.00%

other: 0.00%

Classes in the cycle "Winter semester 2024/25" (in progress)

Period: 2024-10-01 - 2025-02-14
Class types:
Laboratory, 10 hours
Lecture, 4 hours
Coordinators: (no data)
Group instructors: (no data)
Assessment: Course - Grade
Lecture - Grade

Classes in the cycle "Summer semester 2023/24" (completed)

Period: 2024-02-24 - 2024-09-30
Class types:
Laboratory, 10 hours
Lecture, 4 hours
Coordinators: (no data)
Group instructors: (no data)
Assessment: Course - Grade
Lecture - Grade
Short description:

1. Modelling, learning, and prediction in batch (offline learning) and incremental (online learning) modes. Problems of incremental machine learning.

2. Data processing models in Big Data. From flat files to the Data Lake. Real-time data: myths and facts.

3. NRT (near real-time) systems: data acquisition, streaming, and analytics.

4. Algorithms for estimating model parameters in incremental mode. Stochastic Gradient Descent.

5. Modern streaming application architectures.

6. Preparing a microservice with an ML model for prediction.

7. Processing structured and unstructured data in Python. Functional and object-oriented connections to relational (RDB) and NoSQL databases.

8. Aggregations and reporting in NoSQL databases (MongoDB).

9. Basics of object-oriented programming in Python for linear and logistic regression and neural network analysis using scikit-learn, TensorFlow, and Keras.

10. IT streaming architecture. Apache Spark and Jupyter Notebook environment using Docker. Data analysis.
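
Topic 4, estimating model parameters in incremental mode with Stochastic Gradient Descent, can be illustrated with a minimal sketch that updates a linear model one observation at a time instead of refitting on the full data set. The class name OnlineLinearRegression, the method name partial_fit, the learning rate, and the synthetic noiseless stream y = 2x + 1 are all assumptions chosen for this example; they are not taken from the course materials.

```python
import random


class OnlineLinearRegression:
    """Linear model trained by stochastic gradient descent, one sample per update."""

    def __init__(self, n_features, learning_rate=0.05):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.bias + sum(w * xi for w, xi in zip(self.weights, x))

    def partial_fit(self, x, y):
        """Take one gradient step on the squared error 0.5 * (prediction - y)**2."""
        error = self.predict(x) - y
        self.weights = [w - self.learning_rate * error * xi
                        for w, xi in zip(self.weights, x)]
        self.bias -= self.learning_rate * error
        return error


rng = random.Random(0)  # fixed seed so the run is reproducible
model = OnlineLinearRegression(n_features=1)

# Simulated stream: each observation is seen once and then discarded.
for _ in range(5000):
    x = rng.uniform(-1.0, 1.0)
    y = 2.0 * x + 1.0  # hypothetical noiseless target
    model.partial_fit([x], y)

print(round(model.weights[0], 3), round(model.bias, 3))
```

Because each update needs only the current observation, memory use stays constant regardless of how long the stream runs, which is the main practical difference from batch estimation.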

Full description:

Making the right decisions based on data and its analysis is a daily process in business. Modern modelling methods such as machine learning (ML), artificial intelligence (AI), and deep learning not only allow a better understanding of the business but also support its key decisions. The development of technology and new business concepts built on working directly with the client require decisions that are not only correct but also fast. The classes are designed to give students experience and comprehensive theoretical knowledge of real-time data processing and analysis, and to present the latest technologies (free and commercial) for processing structured data (e.g. from data warehouses) and unstructured data (e.g. images, sound, video streaming) in online mode. The course presents the so-called lambda and kappa architectures for processing data into a data lake, along with a discussion of the problems and difficulties encountered when implementing real-time modelling for large amounts of data. Apart from the lectures, theoretical knowledge is gained through the implementation of test cases in tools such as Apache Spark, Apache NiFi, Microsoft Azure, and SAS. During laboratory classes, students will gain a full understanding of the latest information technologies related to real-time data processing.
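
The lambda architecture mentioned above combines a batch layer, which periodically recomputes results from the full event history, with a speed layer, which incrementally covers events that arrived after the last batch run; queries merge both views. The sketch below is a simplified in-memory illustration, assuming a hypothetical event format with a "user" field and hypothetical names (batch_view, SpeedLayer, query). In a kappa architecture the batch layer would be dropped and the history reprocessed through the same streaming path.

```python
from collections import Counter


def batch_view(master_dataset):
    """Batch layer: recompute per-user totals from the full, immutable event log."""
    return Counter(event["user"] for event in master_dataset)


class SpeedLayer:
    """Speed layer: incrementally counts events not yet covered by a batch run."""

    def __init__(self):
        self.realtime_view = Counter()

    def process(self, event):
        self.realtime_view[event["user"]] += 1


def query(user, batch, speed):
    """Serving layer: merge the precomputed batch view with the real-time view."""
    return batch[user] + speed.realtime_view[user]


# Hypothetical data: events already in the master dataset and newly arrived ones.
historical = [{"user": "anna"}, {"user": "jan"}, {"user": "anna"}]
recent = [{"user": "anna"}, {"user": "ola"}]

batch = batch_view(historical)
speed = SpeedLayer()
for event in recent:
    speed.process(event)

print(query("anna", batch, speed), query("jan", batch, speed), query("ola", batch, speed))
# 3 1 1
```

After the next batch run absorbs the recent events, the real-time view would be reset, which is the step that keeps the two layers from double counting.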

Literature:

Basic literature:

1. Frątczak E. (ed.), "Modelowanie dla biznesu, Regresja logistyczna, Regresja Poissona, Survival Data Mining, CRM, Credit Scoring", SGH, Warszawa, 2019.

2. Frątczak E. (ed.), "Zaawansowane metody analiz statystycznych", Oficyna Wydawnicza SGH, Warszawa, 2012.

3. Rubach P., Zając S., Jastrzebski B., Sulkowska J.I., Sulkowski P., "Genus for biomolecules", Web Server, Nucleic Acids Research, 2019.

4. Zając S., Piotrowski E. W., Sładkowski J., Syska J., "The method of the likelihood and the Fisher information in the construction of physical models", Phys. Status Solidi B, vol. 246, no. 5, 2009.

5. Indest A., "Wild Knowledge. Outthink the Revolution", LID Publishing, 2017.

6. "Real-Time Analytics. The Key to Unlocking Customer Insights & Driving the Customer Experience", Harvard Business Review Analytics Series, Harvard Business School Publishing, 2018.

7. Svolba G., "Applying Data Science. Business Case Studies Using SAS", SAS Institute Inc., Cary NC, USA, 2017.

8. Ellis B., "Real-Time Analytics: Techniques to Analyze and Visualize Streaming Data", Wiley, 2014.

9. Familiar B., Barnes J., "Business in Real-Time Using Azure IoT and Cortana Intelligence Suite", Apress, 2017.

Supplementary literature:

1. Frątczak E., "Statistics for Management & Economics", SGH, Warszawa, 2015.

2. Simon P., "Too Big to IGNORE. The Business Case for Big Data", John Wiley & Sons Inc., 2013.

3. Nandi A., "Spark for Python Developers", 2015.

4. Ohlhorst F. J., "Big Data Analytics. Turning Big Data into Big Money", John Wiley & Sons Inc., 2013.

5. Russell J., "Zwinna analiza danych Apache Hadoop dla każdego", Helion, 2014.

6. Todman C., "Projektowanie hurtowni danych, Wspomaganie zarządzania relacjami z klientami", Helion, 2011.

Notes:

Grading criteria:

traditional written exam: 0.00%

test exam: 40.00%

oral exam: 0.00%

mid-term test: 20.00%

papers/essays: 40.00%

class grade: 0.00%

other: 0.00%

Classes in the cycle "Winter semester 2023/24" (completed)

Period: 2023-10-01 - 2024-02-23
Class types:
Laboratory, 10 hours
Lecture, 4 hours
Coordinators: (no data)
Group instructors: (no data)
Assessment: Course - Grade
Lecture - Grade

Course descriptions in USOS and USOSweb are protected by copyright.
The copyright holder is Szkoła Główna Handlowa w Warszawie.
al. Niepodległości 162
02-554 Warszawa
tel: +48 22 564 60 00 http://www.sgh.waw.pl/