Real-Time Analytics
General information
Course code: | 222891-D |
Erasmus / ISCED code: | (no data) / (no data) |
Course name: | Real-Time Analytics |
Unit: | Szkoła Główna Handlowa w Warszawie |
Groups: |
Elective courses for QEM - masters; Major courses for AAB - masters; Compulsory courses in the SMMD-ADA programme |
ECTS credits and other: | 3.00 (may vary over time) |
Language of instruction: | English |
Learning outcomes: |
Knowledge: Know the history and philosophy of data processing models; know the types of structured and unstructured data; know the possibilities and areas of application of real-time data processing; know the theoretical aspects of REST APIs and publish/subscribe (pub/sub) messaging; be able to choose the IT architecture for a given business problem; understand the business need to make decisions within a very short time.
Skills: Distinguish between structured and unstructured data types; be able to prepare, process, and store data generated in real time; understand the limitations arising from processing time in devices and IT systems; design and build a real-time processing system; be able to prepare reporting for a real-time processing system.
Social competences: Formulate an analytical problem together with its IT solution; consolidate the ability to independently supplement theoretical and practical knowledge of programming, modelling, and new information technologies for real-time analysis. |
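The publish/subscribe pattern named among the learning outcomes can be illustrated with a minimal in-process sketch: publishers send messages to a named topic without knowing who consumes them. The `Broker` class and its `subscribe`/`publish` methods are hypothetical names chosen for illustration, not the API of any real library; a production system would typically use a dedicated broker such as Apache Kafka, which adds persistence, partitioning, and network transport.

```python
from collections import defaultdict
from typing import Any, Callable


class Broker:
    """Hypothetical in-memory pub/sub broker (illustration only, not Kafka's API)."""

    def __init__(self) -> None:
        # Maps each topic name to the callbacks subscribed to it.
        self._subscribers: dict[str, list[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[Any], None]) -> None:
        # Register a consumer; publishers never need to know who listens.
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: Any) -> int:
        # Deliver the message to every subscriber of the topic and
        # return how many subscribers received it (0 if none).
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


if __name__ == "__main__":
    broker = Broker()
    received: list[dict] = []
    broker.subscribe("transactions", received.append)
    broker.subscribe("transactions", lambda m: print("fraud check:", m))
    broker.publish("transactions", {"id": 1, "amount": 250.0})
    broker.publish("clicks", {"page": "/home"})  # no subscribers: not delivered
    print(received)
```

Decoupling producers from consumers in this way is what lets new consumers (for example, a second model scoring the same transactions) be added without changing the producer.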
Classes in the cycle "Preferences - Summer semester 2024/25" (not yet started)
Period: | 2025-02-15 - 2025-09-30 |
Class type: | Instructor's classes |
Coordinators: | (no data) |
Group instructors: | Sebastian Zając |
Assessment: | Course - Grade; Instructor's classes - Grade |
Classes in the cycle "Summer semester 2024/25" (not yet started)
Period: | 2025-02-15 - 2025-09-30 |
Timetable: Tuesday - laboratory, laboratory, laboratory, lecture
Class type: | Laboratory, 20 hours; Lecture, 10 hours |
Coordinators: | (no data) |
Group instructors: | Szymon Chudziak, Sebastian Zając |
Assessment: | Course - Grade; Lecture - Grade |
Short description: |
1. From Flat Files to Data Mesh: Data Processing Models in Big Data.
2. ETL, Batch (Offline Learning) and Incremental (Online Learning) Modeling. MapReduce.
3. Data Streams, Events, Time, and Time Windows in Real-Time Data Processing.
4. Microservices and Communication via REST API.
5. Contemporary Architectures for Stream Data Processing Applications: Lambda, Kappa, Pub/Sub.
6. Processing Structured and Unstructured Data. Programming Environment for Python.
7. Using Python Object-Oriented Elements in the Modeling Process with Scikit-Learn and Keras.
8. Python Object-Oriented Programming Basics. Building Classes for Random Walk, Perceptron, and Adaline Algorithms.
9. Preparing a Microservice with an ML Model for Production Use.
10. Streaming Data Using RDDs with Apache Spark. Introduction to the DataFrame Object.
11. Methods for Creating Data Streams Using the DataFrame Object in Apache Spark. Setting Input and Output.
12. S |
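Topic 3 (time windows) can be illustrated with a tumbling-window count: each event is assigned, by its event-time timestamp, to exactly one fixed-length, non-overlapping window. This is a plain-Python sketch assuming each event carries a timestamp in seconds; the `tumbling_window_counts` function is hypothetical. Stream engines such as Apache Spark Structured Streaming provide windowing natively (e.g. `pyspark.sql.functions.window`) and additionally handle late data with watermarks, which this sketch ignores.

```python
from collections import Counter
from typing import Iterable


def tumbling_window_counts(
    timestamps: Iterable[float], window_size: float
) -> dict[float, int]:
    """Count events per non-overlapping (tumbling) window.

    Each event goes to the window [start, start + window_size),
    where start = floor(ts / window_size) * window_size.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    counts: Counter = Counter()
    for ts in timestamps:
        # Floor division aligns the timestamp to the start of its window.
        window_start = (ts // window_size) * window_size
        counts[window_start] += 1
    # Return windows in chronological order.
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Event times in seconds; note that events may arrive out of order.
    events = [0.5, 3.2, 9.9, 10.0, 14.7, 4.1, 21.3]
    print(tumbling_window_counts(events, window_size=10))
    # → {0.0: 4, 10.0: 2, 20.0: 1}
```

Because windows are keyed by event time rather than arrival order, the late-arriving event at 4.1 s is still counted in the first window; a real streaming engine must decide how long to keep such windows open.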
Full description: |
Making informed decisions based on data and its analysis is fundamental in today's business world. Modern techniques such as machine learning, artificial intelligence, and deep neural networks can significantly improve business understanding and the quality of decisions. Moreover, the speed of decision-making is crucial in a dynamic business environment, especially when dealing directly with customers. The goal of these classes is to give students practical experience and comprehensive theoretical knowledge of real-time data processing and analysis, and to introduce the latest information technologies for processing structured data (e.g., from data warehouses) and unstructured data (e.g., images, sound, video streams) online. The classes will present the approach to real-time analysis of large datasets using Python. Software architectures for data processing will be introduced, together with the issues and challenges of modeling large amounts of data in real time. Theoretical knowledge will be reinforced through hands-on exercises with tools such as Apache Spark and Apache Kafka. In the lab sessions, students will work in fully configured development environments prepared for data processing, modeling, and analysis, so that, alongside analytical skills and techniques, they become familiar with the latest information technologies for real-time data processing. |
Literature: |
Core literature:
1. Zając S., "Modelowanie dla biznesu. Analityka w czasie rzeczywistym - narzędzia informatyczne i biznesowe", Oficyna Wydawnicza SGH, Warszawa 2022.
2. Przanowski K., Zając S. (eds.), "Modelowanie dla biznesu, metody ML, modele portfela CF, modele rekurencyjne, analizy przeżycia, modele scoringowe", SGH, Warszawa 2020.
3. Frątczak E. (ed.), "Modelowanie dla biznesu, Regresja logistyczna, Regresja Poissona, Survival Data Mining, CRM, Credit Scoring", SGH, Warszawa 2019.
4. Raschka S., "Python. Uczenie maszynowe", 2nd edition.
5. Maas G., Garillot F., "Stream Processing with Apache Spark", O'Reilly, 2021.
6. Hueske F., Kalavri V., "Stream Processing with Apache Flink", O'Reilly, 2021.
7. Nandi A., "Spark for Python Developers", 2015.
Supplementary literature:
1. Frątczak E., "Statistics for Management & Economics", SGH, Warszawa 2015.
2. Simon P., "Too Big to Ignore. The Business Case for Big Data", John Wiley & Sons Inc., 2013.
3. Ohlhorst F. J., "Big Data Analytics. Turning Big Data into Big Money", John Wiley & Sons Inc., 2013.
4. Russell J., "Zwinna analiza danych Apache Hadoop dla każdego", Helion, 2014.
5. Todman C., "Projektowanie hurtowni danych, Wspomaganie zarządzania relacjami z klientami", Helion, 2011.
6. Bruce P., Bruce A., Gedeck P., "Statystyka praktyczna w data science. 50 kluczowych zagadnień w językach R i Python", Helion, 2nd edition, 2021. |
Notes: |
Assessment criteria: traditional written exam: 0.00%; test exam: 40.00%; oral exam: 0.00%; midterm test: 20.00%; papers/essays: 40.00%; exercise grade: 0.00%; other: 0.00% |
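The 40% / 20% / 40% weights listed above combine into a final score as a weighted sum. A minimal sketch, assuming each component is scored on a 0-100 scale; the component keys and the scale are assumptions, and the syllabus does not specify how the resulting score maps to a grade.

```python
# Weights from the assessment criteria (components weighted 0% are omitted).
WEIGHTS = {
    "test_exam": 0.40,
    "midterm_test": 0.20,
    "papers_essays": 0.40,
}


def final_score(scores: dict[str, float]) -> float:
    """Weighted sum of component scores, each assumed to be on a 0-100 scale."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise KeyError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    # Hypothetical student: 80 on the test exam, 70 on the midterm test, 90 on papers.
    # 0.4 * 80 + 0.2 * 70 + 0.4 * 90 = 32 + 14 + 36 = 82
    print(final_score({"test_exam": 80, "midterm_test": 70, "papers_essays": 90}))
```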
Classes in the cycle "Winter semester 2024/25" (in progress)
Period: | 2024-10-01 - 2025-02-14 |
Class type: | Laboratory, 20 hours; Lecture, 10 hours |
Coordinators: | (no data) |
Group instructors: | (no data) |
Assessment: | Course - Grade; Lecture - Grade |
Short description, full description, literature, and notes: identical to the Summer semester 2024/25 cycle above.
Classes in the cycle "Summer semester 2023/24" (completed)
Period: | 2024-02-24 - 2024-09-30 |
Timetable: Monday - lecture; Tuesday - laboratory
Class type: | Laboratory, 20 hours; Lecture, 10 hours |
Coordinators: | (no data) |
Group instructors: | Sebastian Zając |
Assessment: | Course - Grade; Lecture - Grade |
Short description: |
1. Modelling, learning, and prediction in batch (offline learning) and incremental (online learning) modes. Problems of incremental machine learning.
2. Data processing models in Big Data. From flat files to the data lake. Real-time data: myths and facts.
3. NRT (near-real-time) systems: data acquisition, streaming, and analytics.
4. Algorithms for estimating model parameters in incremental mode. Stochastic Gradient Descent.
5. Modern streaming application architectures.
6. Preparing a microservice with an ML model for prediction.
7. Processing structured and unstructured data in Python. Functional and object-oriented connections to relational (RDB) and NoSQL databases.
8. Aggregations and reporting in NoSQL databases (MongoDB).
9. Basics of object-oriented programming in Python for linear and logistic regression and neural network analysis with scikit-learn, TensorFlow, and Keras.
10. IT streaming architecture. An Apache Spark and Jupyter Notebook environment using Docker. Analysis of data |
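Topics 1 and 4 (incremental learning and Stochastic Gradient Descent) can be sketched as a linear regression model whose parameters are updated after every single observation, so the full dataset never has to be held in memory, unlike batch training. This is a minimal plain-Python illustration assuming squared-error loss, one input feature, and a fixed learning rate; the `OnlineLinearRegression` class is hypothetical, and its `partial_fit` method name is borrowed only by analogy with scikit-learn's `SGDRegressor.partial_fit`.

```python
import random


class OnlineLinearRegression:
    """Hypothetical single-feature model y = w * x + b trained by per-sample SGD."""

    def __init__(self, learning_rate: float = 0.01) -> None:
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def predict(self, x: float) -> float:
        return self.w * x + self.b

    def partial_fit(self, x: float, y: float) -> None:
        # Gradient of the loss 0.5 * (y_hat - y)^2 for a single observation:
        # d/dw = error * x, d/db = error.
        error = self.predict(x) - y
        self.w -= self.learning_rate * error * x
        self.b -= self.learning_rate * error


if __name__ == "__main__":
    random.seed(0)
    model = OnlineLinearRegression(learning_rate=0.05)
    # Simulated stream: y = 2x + 1 plus small noise, one observation at a time.
    for _ in range(5000):
        x = random.uniform(-1, 1)
        y = 2 * x + 1 + random.gauss(0, 0.05)
        model.partial_fit(x, y)
    print(f"w = {model.w:.2f}, b = {model.b:.2f}")
```

The learning rate trades adaptation speed against noise: a larger value tracks a drifting data stream faster but makes the estimates fluctuate more around the true parameters.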
Full description: |
In business, making the right decisions based on data and its analysis is a continuous, everyday process. Modern modelling methods such as machine learning (ML), artificial intelligence (AI), and deep learning not only provide a better understanding of the business but also support its key decisions. Technological progress and new business models built on working directly with the customer require decisions that are not only correct but also fast. The classes are designed to give students experience and comprehensive theoretical knowledge of real-time data processing and analysis, and to present the latest technologies (free and commercial) for processing structured data (originating e.g. from data warehouses) and unstructured data (e.g. images, sound, video streaming) online. The course will present the so-called Lambda and Kappa architectures for processing data into a data lake, along with a discussion of the problems and difficulties encountered in implementing real-time modelling for large amounts of data. Apart from the lecture part, theoretical knowledge will be gained by implementing test cases in tools such as Apache Spark, Apache NiFi, Microsoft Azure, and SAS. During laboratory classes, students will gain a thorough understanding of the latest information technologies related to real-time data processing. |
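The Kappa structure mentioned above treats an append-only event log as the single source of truth: every view is derived by stream processing, and reprocessing with new logic means replaying the log from the beginning (the Lambda architecture, by contrast, maintains a separate batch layer alongside a speed layer). A minimal plain-Python sketch; the `EventLog` class and `build_view` function are hypothetical stand-ins for a durable log such as a Kafka topic and a stream processor.

```python
from typing import Callable, Iterator


class EventLog:
    """Hypothetical append-only log, a stand-in for e.g. a Kafka topic."""

    def __init__(self) -> None:
        self._events: list[dict] = []

    def append(self, event: dict) -> int:
        # Events are never modified in place; return the new event's offset.
        self._events.append(event)
        return len(self._events) - 1

    def read_from(self, offset: int = 0) -> Iterator[dict]:
        # Replay events starting at the given offset.
        yield from self._events[offset:]


def build_view(
    log: EventLog,
    key_fn: Callable[[dict], str],
    value_fn: Callable[[dict], float],
) -> dict[str, float]:
    """Derive a materialized view by replaying the whole log from offset 0."""
    view: dict[str, float] = {}
    for event in log.read_from(0):
        key = key_fn(event)
        view[key] = view.get(key, 0.0) + value_fn(event)
    return view


if __name__ == "__main__":
    log = EventLog()
    for e in [{"customer": "A", "amount": 10.0},
              {"customer": "B", "amount": 5.0},
              {"customer": "A", "amount": 7.5}]:
        log.append(e)
    # Version 1 of the processing logic: total spend per customer.
    print(build_view(log, lambda e: e["customer"], lambda e: e["amount"]))
    # Version 2: transaction count per customer, obtained by replaying
    # the same log rather than running a separate batch job.
    print(build_view(log, lambda e: e["customer"], lambda e: 1.0))
```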
Literature: |
Core literature:
1. Frątczak E. (ed.), "Modelowanie dla biznesu, Regresja logistyczna, Regresja Poissona, Survival Data Mining, CRM, Credit Scoring", SGH, Warszawa 2019.
2. Frątczak E. (ed.), "Zaawansowane metody analiz statystycznych", Oficyna Wydawnicza SGH, Warszawa 2012.
3. Rubach P., Zając S., Jastrzebski B., Sulkowska J. I., Sulkowski P., "Genus for biomolecules", Web Server, Nucleic Acids Research, 2019.
4. Zając S., Piotrowski E. W., Sładkowski J., Syska J., "The method of the likelihood and the Fisher information in the construction of physical models", Phys. Status Solidi B, vol. 246, no. 5 (2009).
5. Indest A., "Wild Knowledge. Outthink the Revolution", LID Publishing, 2017.
6. "Real-Time Analytics. The Key to Unlocking Customer Insights & Driving the Customer Experience", Harvard Business Review Analytics Series, Harvard Business School Publishing, 2018.
7. Svolba G., "Applying Data Science. Business Case Studies Using SAS", SAS Institute Inc., Cary NC, USA, 2017.
8. Ellis B., "Real-Time Analytics: Techniques to Analyze and Visualize Streaming Data", Wiley, 2014.
9. Familiar B., Barnes J., "Business in Real-Time Using Azure IoT and Cortana Intelligence Suite", Apress, 2017.
Supplementary literature:
1. Frątczak E., "Statistics for Management & Economics", SGH, Warszawa 2015.
2. Simon P., "Too Big to Ignore. The Business Case for Big Data", John Wiley & Sons Inc., 2013.
3. Nandi A., "Spark for Python Developers", 2015.
4. Ohlhorst F. J., "Big Data Analytics. Turning Big Data into Big Money", John Wiley & Sons Inc., 2013.
5. Russell J., "Zwinna analiza danych Apache Hadoop dla każdego", Helion, 2014.
6. Todman C., "Projektowanie hurtowni danych, Wspomaganie zarządzania relacjami z klientami", Helion, 2011. |
Notes: |
Assessment criteria: traditional written exam: 0.00%; test exam: 40.00%; oral exam: 0.00%; midterm test: 20.00%; papers/essays: 40.00%; exercise grade: 0.00%; other: 0.00% |
Classes in the cycle "Winter semester 2023/24" (completed)
Period: | 2023-10-01 - 2024-02-23 |
Class type: | Laboratory, 20 hours; Lecture, 10 hours |
Coordinators: | (no data) |
Group instructors: | (no data) |
Assessment: | Course - Grade; Lecture - Grade |
Short description, full description, literature, and notes: identical to the Summer semester 2023/24 cycle above.
Copyright is held by Szkoła Główna Handlowa w Warszawie.