Lecturer(s)
|
-
Tyl Pavel, Ing.
-
Lamr Marián, Ing. Ph.D.
|
Course content
|
Lectures:
1. Data, data types and their processing; data gathering, databases and data warehouses, data import and export.
2. The process of knowledge acquisition from large data structures; overview of course aims and methodology.
3. Types of data mining tasks; a survey of representative data mining tasks.
4. Data preprocessing and data understanding: description of data sets, preparation of the data matrix, data selection and data refinement; construction and merging of data sources, type homogeneity.
5. Classification algorithms as a tool for prediction (based on historical data): decision trees, the C&RT, C5.0, CHAID and QUEST algorithms. Tree-to-rules transformation, tree reduction.
6. Discriminant analysis, classification of use cases into classes, scoring.
7. Revealing important data structure with segmentation algorithms: K-Means, TwoStep and Anomaly algorithms for data clustering.
8. Association algorithms for finding association rules: Apriori and CARMA models, statistics, prediction model.
9.-10. Basic introduction to neural networks for processing ranked and numerical variables; their use where classical linear methods fail.
10.-14. Modelling and evaluation of solutions, introduction to DM solutions; embedding scoring processes into the company decision workflow. Analysis of support in up-to-date software tools and their possible future evolution.
Seminars:
1.-2. Processing and visualization of data in SPSS Modeler; comparison with other open-source software.
3.-9. Preparation of models for selected use cases, their analysis and interpretation of results, followed by slight modifications of the use cases. Discussion and application of DM to the following types of tasks: recommendation of medication, classification of biological and physical data, monitoring of instrument trial operation and prediction of its potential failure.
9.-12. Seminar projects.
13.-14. Defense of seminar projects.
|
Learning activities and teaching methods
|
Monological explanation (lecture, presentation, briefing), Self-study (text study, reading, problem-solving tasks, practical tasks, experiments, research, written assignments), Demonstration of student skills
- Home preparation for classes
- 56 hours per semester
- Semester paper
- 20 hours per semester
|
Learning outcomes
|
The main contribution of the course is to show students how to solve complex decision support tasks, especially in medicine but also in economics. Students will become familiar with the process of retrieving information and knowledge from complex real-world numerical and non-numerical data sets. Current data mining software tools will be used throughout the course.
Intermediate understanding of data mining tasks, ability to work with contemporary data mining software tools (e.g. IBM SPSS Modeler), team cooperation on complex projects.
|
Prerequisites
|
Basic statistics
|
Assessment methods and criteria
|
Combined examination
Active participation in the seminars and passing the tests are required to obtain credit.
|
Recommended literature
|
-
Berka Petr. Dobývání znalostí z databází. Praha, 2006.
-
Hendl J. Přehled statistických metod zpracování dat. Praha, 2006.
-
Kotler Philip. Marketing management. Praha, 2005.
-
Rud Olivia Parr. Datamining. Praha, 2006.
|