Lectures (topics):
1. Analysis of current data sources, data types and their processing, data archiving. Data import and export.
2. The process of mining data from large data structures, the CRISP-DM methodology.
3. Data understanding and data preparation: description of data sets, preparation of data matrices, selecting and cleaning data, constructing and merging data sources, type homogeneity, formatting and common data transformations.
4. Categorical versus numerical data and their use in DM algorithms, categorisation, the method of optimal categorisation, handling missing values, multiple imputation, dependencies within data, dimensionality reduction.
5. Machine learning and data mining methods.
6. Building decision and classification trees.
7. Algorithms for mining association rules in large data structures.
8. Nature-inspired methods: neural networks, genetic algorithms, particle swarm optimisation.
9. Unsupervised learning: the fundamental principles of cluster analysis, the significance of similarities and anomalies in data.
10. Advanced methods of model evaluation.

Seminars (topics):
In the seminars, students will become acquainted with selected software tools for finding hidden information, knowledge and behavioural patterns in data of various types in order to support decision-making. "Knowledge" here means generalised information, presented for example as discovered rules. Students will work with large sets of real numerical and non-numerical data: data generated by enterprise management and by the operation of production technologies, experimental data, customer data, client data and other data. Following the lectures, they will solve the assignments in the IBM SPSS Modeler environment and in open-source data mining tools. The application of DM procedures and algorithms will be discussed and studied through a wide range of assignments, e.g. marketing campaign targeting, customer churn, monitoring of test operation and prediction of machine failures.
|