Machine Learning


Lab 1 - Evaluation Issues & Getting started

Evaluation in ML

Evaluation is a very important part of machine learning research. In the first part of this lab session we will discuss some basic issues and terminology. Read Chapter 5 of the textbook for more information on evaluating hypotheses.

Introduction of Topics for the Final Project

The course is short, and you will not have much time to work on your assignments. It is important to start working on your final project (it is worth 50% of the final grade) as early as possible. Look at the topics proposed here and try to find one that interests you. You will work in groups of at most four students. Try to form your group today and discuss with each other what you would like to work on and how. Search the web for more information about the topics that interest you.

Getting started with WEKA

Finally, there are some smaller exercises to be solved and handed in today. You may work in groups of two. Prepare a short lab report and hand it in by e-mail to Cagri on or before the deadline.

For this course, we shall be using the WEKA (http://www.cs.waikato.ac.nz/~ml/weka/) software, which implements several machine learning techniques in Java. The various classifiers implemented in the software can easily be invoked either from the command line or from the GUI included in the WEKA distribution. For example, to use the WEKA implementation of C4.5 (a decision tree learner, called J48 in WEKA) with a data file called "train.arff" as the training set, you can type:

  java weka.classifiers.trees.J48 -t train.arff
on the command line (e.g., a Linux or Windows command-line shell, or the SimpleCLI utility in WEKA). Alternatively, you can open the file in the graphical user interface, choose the appropriate classifier in the 'Classify' tab, and start the classification task. This will construct a decision tree from train.arff, apply it back to train.arff, and then perform a 10-fold cross-validation on train.arff. For more details on using WEKA, see the README file and additional documentation distributed with WEKA, or the documentation section of the WEKA home page.

In this course we will use the command-line interface only, but feel free to experiment with the graphical interface as well if you like.

Assignment

Use the file weather.arff in the data directory of the WEKA distribution (C:\Program Files\Weka-3-5 on Windows machines). This file contains data for deciding whether to play a certain sport given the weather conditions (it is the same data set as described in Section 1.2 of the textbook). Run the J48 classifier using "weather.arff" as the training set. Here are the questions to be handed in:

  1. Report how many instances are correctly and incorrectly classified on the training set.
  2. The classifier weka.classifiers.rules.ZeroR simply assigns the most common class in the training set to any new instance, and can be used as a baseline for evaluating other machine learning schemes. Invoke the ZeroR classifier using weather.arff. Report the number of correctly classified and misclassified instances, both for the training set and for cross-validation.
  3. What are baselines used for? Is ZeroR a reasonable baseline? Can you think of other types of baselines?
  4. What is k-fold cross validation and why is it better than splitting the data in one training set and one evaluation set?
  5. What is the difference between a development set and an evaluation set?
  6. Why is it necessary to test the statistical significance of performance differences when comparing two models?
  7. What is wrong if you find the following sentence in a machine learning paper: "Model 1 yields an accuracy of 97% compared to the baseline of 94% accuracy, an improvement of 3% which is statistically significant."
  8. What is the difference between accuracy and precision? Explain with respect to a multi-class classification task.
  9. Consider a binary classification task (target values = 0/1) with 50 test data instances, of which 5 are labeled as "1" and the others as "0". Calculate accuracy, precision and recall for a baseline classifier tagging everything as "0". Calculate the same measures for a classifier that tags everything as "1". Give the 95% confidence intervals for both accuracy estimates.
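To check your hand calculations for the metric questions above, you can use a small Python sketch like the following. It is not part of the assignment, the function names are our own, and the example data at the bottom is an illustrative toy case, not the data set from question 9. The confidence interval uses the standard normal approximation for an error estimated on n instances, as discussed in Chapter 5 of the textbook.

```python
import math

def binary_metrics(gold, pred, positive=1):
    """Accuracy, precision and recall for a binary classification task."""
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    accuracy = correct / len(gold)
    # If the classifier never predicts the positive class, precision is
    # undefined; we report 0.0 by convention here.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

def accuracy_confidence_interval(acc, n, z=1.96):
    """Normal-approximation confidence interval for an accuracy
    estimated on n test instances (z=1.96 gives a 95% interval)."""
    half_width = z * math.sqrt(acc * (1.0 - acc) / n)
    return acc - half_width, acc + half_width

# Toy example: 4 instances, the classifier gets 3 of them right.
gold = [1, 1, 0, 0]
pred = [1, 0, 0, 0]
acc, prec, rec = binary_metrics(gold, pred)
print(acc, prec, rec)                          # 0.75 1.0 0.5
print(accuracy_confidence_interval(0.5, 100))  # roughly (0.402, 0.598)
```

Note that the interval shrinks with the square root of n, which is one reason a 50-instance test set gives only a rough accuracy estimate.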