Real-Time Analytics
General information
| Course code: | 222891-D |
| Erasmus / ISCED code: | (no data) / (no data) |
| Course name: | Real-Time Analytics |
| Unit: | Szkoła Główna Handlowa w Warszawie |
| Groups: |
Elective courses for QEM - masters; Major courses for AAB - masters; Elective major courses, SMMD-EKO; Compulsory courses in the SMMD-ADA programme |
| ECTS credits and other: |
3.00 (may vary over time)
|
| Language of instruction: | English |
| Learning outcomes: |
Knowledge: Know the history and philosophy of data processing models; know the types of structured and unstructured data; know the possibilities and application areas of real-time data processing; know the theoretical aspects of REST API and pub/sub; be able to choose the IT structure for a given business problem; understand the business need to make decisions in a very short time.
Skills: Distinguish between structured and unstructured data types; be able to prepare, process and save data generated in real time; understand the limitations arising from processing time in devices and IT systems; build and apply a system for real-time processing; be able to prepare reporting for a real-time processing system.
Social competences: Formulate an analytical problem along with its IT solution; strengthen the ability to independently supplement theoretical and practical knowledge of programming, modelling, and new information technologies for real-time analysis. |
Classes in cycle "Summer semester 2025/26" (not yet started)
| Period: | 2026-02-21 - 2026-09-30 |
| Timetable: | Monday: lecture; Tuesday: laboratory |
| Type of classes: |
Laboratory, 20 hours
Lecture, 10 hours
|
|
| Coordinators: | (no data) | |
| Group instructors: | Sebastian Zając | |
| Assessment: |
Course - Grade
Lecture - Grade |
|
| Short description: |
1. From Flat Files to Data Mesh: Data Processing Models in Big Data.
2. ETL and Batch (Offline Learning) and Incremental (Online Learning) Modeling. Map-Reduce.
3. Data Streams, Events, and the Concepts of Time and Time Windows in Real-Time Data Processing.
4. Microservices and Communication via REST API.
5. Contemporary Architectures for Stream Data Processing Applications - Lambda, Kappa, Pub/Sub.
6. Processing Structured and Unstructured Data. Programming Environment for Python.
7. Utilizing Python Object-Oriented Elements in the Modeling Process with Scikit-Learn and Keras.
8. Python Object-Oriented Programming Basics. Building Classes for Random Walk, Perceptron, and Adaline Algorithms.
9. Preparing a Microservice with an ML Model for Production Use.
10. Streaming Data Using RDDs with Apache Spark. Introduction to the DataFrame Object.
11. Methods for Creating Data Streams Using the DataFrame Object in Apache Spark. Setting Output and Input.
12. S |
|
| Full description: |
Making informed decisions based on data and its analysis is fundamental in modern business. Techniques such as machine learning, artificial intelligence, and deep neural networks can significantly improve business understanding and the quality of decisions. The speed of decision-making is also crucial in a dynamic business environment, especially when dealing directly with customers. The goal of these classes is to give students practical experience and comprehensive theoretical knowledge of real-time data processing and analysis, and to introduce the latest information technology for processing structured data (e.g., from data warehouses) and unstructured data (e.g., images, sound, video streaming) online. The classes present the philosophy of real-time analysis of large data sets using Python. Software structures for data processing are introduced, along with the issues and challenges encountered when modeling large amounts of data in real time. Theoretical knowledge is reinforced through hands-on exercises with tools such as Apache Spark and Apache Kafka. In the lab sessions, students work in fully configured development environments prepared for data processing, modeling, and analysis, so that in addition to analytical skills and techniques they also become familiar with the latest information technology for real-time data processing. |
|
| Literature: |
Basic literature:
1. Zając S., "Modelowanie dla biznesu. Analityka w czasie rzeczywistym - narzędzia informatyczne i biznesowe", Oficyna Wydawnicza SGH, Warszawa 2022.
2. Przanowski K., Zając S. (eds.), "Modelowanie dla biznesu, metody ML, modele portfela CF, modele rekurencyjne, analizy przeżycia, modele scoringowe", SGH, Warszawa 2020.
3. Frątczak E. (ed.), "Modelowanie dla biznesu, Regresja logistyczna, Regresja Poissona, Survival Data Mining, CRM, Credit Scoring", SGH, Warszawa 2019.
4. Raschka S., "Python. Uczenie maszynowe", 2nd edition.
5. Maas G., Garillot F., "Stream Processing with Apache Spark", O'Reilly, 2021.
6. Hueske F., Kalavri V., "Stream Processing with Apache Flink", O'Reilly, 2021.
7. Nandi A., "Spark for Python Developers", 2015.
Supplementary literature:
1. Frątczak E., "Statistics for Management & Economics", SGH, Warszawa 2015.
2. Simon P., "Too Big to IGNORE. The Business Case for Big Data", John Wiley & Sons Inc., 2013.
3. Ohlhorst F. J., "Big Data Analytics. Turning Big Data into Big Money", John Wiley & Sons Inc., 2013.
4. Russell J., "Zwinna analiza danych Apache Hadoop dla każdego", Helion, 2014.
5. Todman C., "Projektowanie hurtowni danych, Wspomaganie zarządzania relacjami z klientami", Helion, 2011.
6. Bruce P., Bruce A., Gedeck P., "Statystyka praktyczna w data science. 50 kluczowych zagadnień w językach R i Python", 2nd edition, Helion, 2021. |
|
| Notes: |
Evaluation criteria:
Traditional written exam: 0.00%
Multiple-choice test (MS Teams + Forms): 40.00%
Oral exam: 0.00%
Test (carried out during labs): 20.00%
Papers/essays (preparing a presentation): 40.00%
Other: 0.00%
Absence threshold (excluding lectures), i.e. the proportion of class hours missed beyond which the learning outcomes are deemed unattainable: 50% |
|
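The weights listed in the evaluation criteria above can be combined into a single course score. The following is a minimal sketch, assuming each component is scored on a 0-100 scale and the final score is a plain weighted sum; the function name `final_score`, the component keys, and the example scores are hypothetical, and the official grading formula may differ.

```python
from typing import Mapping

# Weights taken from the evaluation criteria (components weighted 0% are omitted).
WEIGHTS = {
    "multiple_choice_test": 0.40,
    "lab_test": 0.20,
    "presentation": 0.40,
}


def final_score(scores: Mapping[str, float]) -> float:
    """Return the weighted sum of component scores (each assumed to be 0-100)."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical example scores, for illustration only.
example = {"multiple_choice_test": 80.0, "lab_test": 90.0, "presentation": 70.0}
print(final_score(example))
```

Because the two 40% components dominate, a weak lab test result moves the total less than a weak multiple-choice test or presentation.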
Classes in cycle "Winter semester 2025/26" (in progress)
| Period: | 2025-10-01 - 2026-02-20 |
| Type of classes: |
Laboratory, 20 hours
Lecture, 10 hours
|
|
| Coordinators: | (no data) | |
| Group instructors: | (no data) | |
| Assessment: |
Course - Grade
Lecture - Grade |
|
|
Classes in cycle "Summer semester 2024/25" (completed)
| Period: | 2025-02-15 - 2025-09-30 |
| Timetable: | Tuesday: laboratory (three slots), lecture |
| Type of classes: |
Laboratory, 20 hours
Lecture, 10 hours
|
|
| Coordinators: | (no data) | |
| Group instructors: | Szymon Chudziak, Sebastian Zając | |
| Assessment: |
Course - Grade
Lecture - Grade |
|
|
Classes in cycle "Winter semester 2024/25" (completed)
| Period: | 2024-10-01 - 2025-02-14 |
| Type of classes: |
Laboratory, 20 hours
Lecture, 10 hours
|
|
| Coordinators: | (no data) | |
| Group instructors: | (no data) | |
| Assessment: |
Course - Grade
Lecture - Grade |
|
|
Copyright is held by Szkoła Główna Handlowa w Warszawie.
