Title: Data Mining
Lesson Code: 321-9250
Semester: 8
ECTS: 5
Hours (Theory): 3
Hours (Lab): 2
Faculty: Kostoulas Theodoros

Syllabus

  1. Introduction to Data Mining Techniques: (a) data, (b) problems, (c) applications, (d) general analysis and processing techniques.
  2. Data pre-processing: (a) data cleansing, (b) data transformations, (c) dimensionality reduction techniques.
  3. Clustering, Part I: (a) introduction to clustering, (b) proximity measures, (c) k-means and its variations, (d) hierarchical clustering.
  4. Clustering, Part II: (a) DBSCAN, (b) cluster validity, (c) BIRCH.
  5. Association Rules I: (a) problem definition, (b) Apriori algorithm, (c) frequent itemsets.
  6. Association Rules II: (a) advanced methods for finding frequent itemsets, (b) FP-Growth, (c) association rule validation.
  7. Classification I: (a) introduction, (b) decision trees (entropy, Gini index, classification error).
  8. Classification II: (a) Bayesian classifiers, (b) Support Vector Machines, (c) k-nearest neighbors (k-NN), (d) rule-based classifiers, (e) overfitting.
  9. Data Mining and Multimodal Data.
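
As an illustration of the three node impurity measures named under Classification I (entropy, Gini index, classification error), the following is a minimal Python sketch using their standard textbook definitions. The function names and the sample labels are hypothetical and chosen only for this example; they are not taken from the course material.

```python
import math
from collections import Counter


def class_probabilities(labels):
    """Relative frequency of each class in a non-empty list of labels."""
    counts = Counter(labels)
    total = len(labels)
    return [count / total for count in counts.values()]


def entropy(labels):
    """Entropy in bits: the sum over classes of -p * log2(p)."""
    return sum(-p * math.log2(p) for p in class_probabilities(labels))


def gini_index(labels):
    """Gini index: 1 minus the sum of squared class probabilities."""
    return 1.0 - sum(p * p for p in class_probabilities(labels))


def classification_error(labels):
    """Classification error: 1 minus the probability of the majority class."""
    return 1.0 - max(class_probabilities(labels))


# Hypothetical tree nodes: one evenly split, one containing a single class.
balanced_node = ["yes"] * 5 + ["no"] * 5
pure_node = ["yes"] * 10
```

All three measures are zero for a node that contains a single class and reach their maximum when the classes are evenly split, which is why each can serve as a splitting criterion when growing a decision tree.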

Learning Outcomes

On completion of this module, students are expected to be able to:

  • Demonstrate critical awareness of current problems and research issues in data mining, together with a comprehensive understanding of current advanced scholarship and research in the field and of how it may contribute to the effective design and implementation of data mining applications.
  • Consistently apply knowledge of current data mining research issues in an original manner and produce work at the forefront of current developments in the sub-discipline of data mining.
  • Work proficiently with leading data mining software, including RapidMiner, Weka and the Business Intelligence tools of MS SQL Server, and apply a wide range of clustering, estimation, prediction and classification algorithms, including k-means clustering, BIRCH clustering, DBSCAN clustering, classification and regression trees, the C4.5 algorithm, logistic regression, k-nearest neighbor, multiple regression, neural networks and support vector machines.
  • Apply the most current data mining techniques and applications, such as text mining, mining genomics data, and other current issues, and explain the mathematical and statistical foundations of the algorithms outlined above.

Prerequisite Courses

Not required.

Basic Textbooks

1. Data Mining: Introductory and Advanced Topics, Margaret H. Dunham, Pearson Education, ISBN: 9780130888921, 2002.
2. Data Mining: A Knowledge Discovery Approach, Krzysztof J. Cios et al., Springer-Verlag, ISBN: 9780387333335, 2007.

Additional References

1. Data Mining: Foundations and Practice, Lin, Xie, Wasilewska and Liau, Springer-Verlag Berlin and Heidelberg GmbH & Co. KG, ISBN-10: 354078487X, 2008.

Teaching and Learning Methods

Activity            Semester workload
Lectures            39 hours
Laboratory hours    26 hours
Personal study      57 hours
Final exams         3 hours
Course total        125 hours (5 ECTS)

Student Performance Evaluation

Exam (50%), Assignment (50%)

Language of Instruction and Examinations

Greek, English (for Erasmus students)

Delivery Mode

Face-to-face.