MSc in Web and Smart Systems / Master's Degree in Web and Mobile Systems

Course Details

Course Information Package

Course Unit Title: DATA MINING
Course Unit Code: ACSC522
Course Unit Details
Number of ECTS credits allocated: 7
Learning Outcomes of the course unit: By the end of the course, the students should be able to:
  1. Define and explain the major principles, terminology, problem types and research issues of data mining.
  2. Describe and discuss the main data mining techniques and their theoretical basis and evaluate their strengths and weaknesses.
  3. Explain and propose ways of dealing with the issues involved in the application of data mining techniques to practical problems.
  4. Conduct a detailed data mining investigation of a practical problem and critically analyse and evaluate the results.
  5. Discuss and demonstrate the application of data mining techniques in information retrieval and web search.
  6. Define, explain and demonstrate the main concepts, issues and approaches for designing a recommender system.
Mode of Delivery: Face-to-face
Prerequisites: NONE
Co-requisites: NONE
Recommended optional program components: NONE
Course Contents:
1.  Introduction to Data Mining
-  What is data mining. Who uses data mining and why. Situations where data mining is useful.
-  Simple examples of problems and data that will be used throughout the course to demonstrate and explain the data mining techniques.
-  Real life application examples of data mining.
2.  Data Mining Problem Types and Data
-  Classification, regression, association learning and clustering.
-  Examples, attributes and attribute types.
-  Preparing the data for mining.
3.  Data Mining Techniques
-  Inferring rudimentary rules – 1R
-  Statistical modeling – Naive Bayes
-  Decision Trees. Choosing the best splitting attribute. Tree pruning. Decision tree pros and cons.
-  Association Rule Mining. Evaluation of association rules. Problems and limitations of association rules.
-  Linear models: Linear regression, Logistic regression
-  Artificial Neural Networks. Biological motivation. Perceptrons. Multilayer Neural Networks. Neural Network training.
-  Instance-based learning. Nearest Neighbour approaches.
-  Clustering. Why cluster the data. The K-means clustering method. Evaluating clusters.
4.  Information Retrieval and Web Search
-  Information retrieval models. The “bag of words” representation. Text pre-processing: stop word removal, stemming, frequency counts. Evaluation measures: precision and recall.
-  Web search. Inverted index. Ranking documents/pages.
5.  Recommender Systems
-  Utility matrix. The long tail. Recommender systems applications.
-  Content-based systems. Item profiles. User profiles. Classification of items.
-  Collaborative filtering systems. Measuring row and column similarity. Clustering users and items.
Recommended and/or required reading:
Textbooks
  • Ian Witten and Eibe Frank, “Data Mining: Practical Machine Learning Tools and Techniques”, Second Edition, Morgan Kaufmann, 2005.
References
  • Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, “Introduction to Information Retrieval”, Cambridge University Press, 2008.
  • Bing Liu, “Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data”, Second Edition, Springer, 2011.
  • Anand Rajaraman and Jeffrey D. Ullman, “Mining of Massive Datasets”, Cambridge University Press, 2011.
  • An extensive reading list of relevant academic papers.
Planned learning activities and teaching methods: The taught part of the course will be delivered through lectures supported by computer presentations. Lecture notes and presentations will be available on the web for students to use alongside the textbook. Theoretical principles will also be explained by means of examples.
Lectures will be supplemented with supervised and unsupervised computer laboratory hours. Laboratories will include demonstrations of taught concepts and experimentation with related technologies on the Weka data mining workbench. During laboratory sessions, students will also apply the knowledge they have gained and identify the principles taught in the lectures by experimenting with data mining techniques on benchmark data and evaluating the results.
Assessment methods and criteria
Assignments: 30%
Project work: 30%
Final Exam: 40%
Language of instruction: English
Work placement(s): NO
