Course: Data analysis and knowledge mining

Course title Data analysis and knowledge mining
Course code MTI/DAKM
Organizational form of instruction Lecture + Lesson
Level of course Master
Year of study not specified
Semester Summer
Number of ECTS credits 5
Language of instruction English
Status of course Compulsory-optional
Form of instruction Face-to-face
Work placements Course does not contain work placement
Recommended optional programme components None
Lecturer(s)
  • Lamr Marián, Ing. Ph.D.
Course content
Lectures (topics):
  1. Analysis of current data sources, data types and their processing, data archiving. Data import and export.
  2. The process of mining data from large data structures; the CRISP-DM methodology.
  3. Data understanding and data preparation: description of data sets, preparation of data matrices, selecting and cleaning data, construction and merging of data sources, type homogeneity, formatting and common data transformations.
  4. Categorical versus numerical data and their use in DM algorithms: categorisation, the method of optimal categorisation, handling missing values, multiple imputation, dependencies within data, dimensionality reduction.
  5. Machine learning and data mining methods.
  6. Building decision and classification trees.
  7. Algorithms for finding association rules in large data structures.
  8. Procedures inspired by nature: neural networks, genetic algorithms, particle swarm optimisation.
  9. Unsupervised learning: the fundamental principles of cluster analysis, the significance of similarities and anomalies in data.
  10. Advanced methods of model evaluation.

Seminars (topics): In the seminars, students become acquainted with selected software tools for finding hidden information, knowledge and behavioural patterns in data of various types to support decision-making. "Knowledge" here means generalised information, presented for example as discovered rules. Students work with large sets of real numerical and non-numerical data: data generated by enterprise management and by the operation of production technologies, experimental data, customer data, client data and others. Following the lectures, they solve assignments in IBM SPSS Modeler and in open-source data mining tools. The application of DM procedures and algorithms is discussed and studied through a wide range of assignments, e.g. marketing campaign targeting, customer churn, monitoring of test operation, and prediction of machine failures.

Learning activities and teaching methods
Lecture, Practicum
  • Class attendance - 56 hours per semester
  • Preparation for exam - 44 hours per semester
  • Preparation for credit - 20 hours per semester
  • Home preparation for classes - 30 hours per semester
Learning outcomes
The goal of the course is to acquaint students with decision-making based on knowledge acquired from different types of data sources, especially through the analysis of large volumes of data. The individual steps of the knowledge acquisition process are demonstrated on practical assignments. Students become acquainted with the techniques, tools and algorithms used in this process.

Prerequisites
unspecified

Assessment methods and criteria
Combined examination

Recommended literature
  • WITTEN, Ian H., Eibe FRANK, Mark A. HALL and Christopher J. PAL. Data mining: practical machine learning tools and techniques. 4th ed. Cambridge: Morgan Kaufmann, 2017. ISBN 9780128042915.

