Introduction to data mining; data mining philosophy and SEMMA methodology; analysing a data set in SAS Enterprise Miner; SAS Enterprise Miner interface, creating a project, creating a data source; defining a diagram; data partitioning in data mining; exploring the data set; sampling.
Dimensionality reduction and feature selection methods: principal component analysis; kernel principal component analysis; factor analysis.
Data imputation and instance selection methods: missing attribute values - the reasons and the role of randomness in data incompleteness.
The processing of incomplete data sets; data imputation methods; the applications of instance selection and selected instance selection methods.
Data pre-processing - dimensionality reduction and instance selection: dimensionality reduction in exploring data sets; principal component analysis; instance selection methods.
Predictive modelling - regression: classification based on logistic regression; prediction based on linear regression.
Predictive modelling - decision trees and random forests: process of building decision tree models; splitting rules; pruning methods; advantages and disadvantages of decision tree models; random forests. Decision tree models in SAS Enterprise Miner: selected aspects of decision tree models and random forests in SAS Enterprise Miner.
Neural network models - multilayer perceptrons: neuron model and feed-forward neural networks; network training process; the pros and cons of neural networks.
Neural network models in SAS Enterprise Miner: model construction - architecture selection; network training; the limitations of gradient-based training.
The analysis of classification models: accuracy evaluation of classification models; methods used for one and multiple classes; assessment of statistical significance of performance indicators; graphical methods of classification model evaluation.
Naive Bayes estimation and Bayesian networks: maximum a posteriori classification; naive Bayes classification.
Pattern recognition: self-organizing networks; cluster analysis in data mining; transaction data modelling; association and sequence analysis.
Methods of high-performance analytics: effectiveness, speed and efficiency.
Worked examples and practical applications.
Selected methods of data mining using R packages.