SGH Warsaw School of Economics (Szkoła Główna Handlowa w Warszawie)

Data Mining 223121-D
Laboratory (LAB), summer semester 2021/22

Course information (common to all groups)

Number of hours: 30
Enrolment limit: (no limit)
Assessment: graded
Topics covered:

Introduction to data mining; data mining philosophy and SEMMA methodology; analysing a data set in SAS Enterprise Miner; SAS Enterprise Miner interface, creating a project, creating a data source; defining a diagram; data partitioning in data mining; exploring the data set; sampling.

Dimensionality reduction and feature selection methods: principal component analysis; kernel principal component analysis; factor analysis.

Data imputation and instance selection methods: missing attribute values - the reasons and the role of randomness in data incompleteness.

The processing of incomplete data sets; data imputation methods; the applications of instance selection and selected instance selection methods.

Data pre-processing (dimensionality reduction and instance selection): dimensionality reduction in exploring data sets; principal component analysis; instance selection methods.

Predictive modelling - decision trees and random forests: process of building decision tree models; splitting rules; pruning methods; advantages and disadvantages of decision tree models; random forests.

Decision tree models in SAS Enterprise Miner: selected aspects of decision tree models in SAS Enterprise Miner; process of building decision tree models; model selection; key aspects of random forests.

Neural network models - multilayer perceptrons: neuron model and feed-forward neural networks; network training process; the pros and cons of neural networks.

Neural network models in SAS Enterprise Miner: model construction - architecture selection; network training; the limitations of gradient-based training.

The analysis of classification models: accuracy evaluation of classification models; methods used for one and multiple classes; assessment of statistical significance of performance indicators; graphical methods of classification model evaluation.

Naive Bayes estimation and Bayesian networks: maximum a posteriori classification; naive Bayes classification.

Pattern recognition: self-organizing networks; cluster analysis in data mining; transaction data modelling; association and sequence analysis.

High-performance analytics methods: effectiveness, speed and efficiency.

Worked examples and practical applications.

Selected data mining methods using R.

Course descriptions in USOS and USOSweb are protected by copyright.
The copyright holder is SGH Warsaw School of Economics.
al. Niepodległości 162
02-554 Warszawa
tel: +48 22 564 60 00 http://www.sgh.waw.pl/