Lab 1 - Evaluation Issues & Getting started
Evaluation in ML
Evaluation is a central part of machine learning research. In the first
part of this lab session we will discuss some basic issues and
terminology. Read Chapter 5 of the textbook for more information on evaluating hypotheses.
Introduction of Topics for the Final Project
The course is short, so you will not have much time to work on your
assignments. It is important to start working on your final project
(it is worth 50% of the final grade) as early as possible. Look at the topics proposed
here and try to find one that interests you. You
are expected to work in groups of at most 4 students. Try to form your
group today and discuss with each other what you would like to work
on and how. Try searching online for more information about the topics that interest you.
Getting started with WEKA
Finally, there are some smaller exercises to be solved and handed in
today. You may work in groups of 2. Prepare a short lab report and send it
by e-mail to Cagri on or before the deadline.
For this course, we shall be using WEKA (http://www.cs.waikato.ac.nz/~ml/weka/),
a software package that implements several machine learning techniques in Java.
The various classifiers implemented in the software can
easily be invoked either from the command line or from the GUI included in the
WEKA distribution. For example, to use J48, the WEKA implementation of
the C4.5 decision tree learner, with a data file called "train.arff" as
the training set, you can type:
java weka.classifiers.trees.J48 -t train.arff
on the command line (e.g. a Linux or Windows command shell, or the
SimpleCLI utility in WEKA).
Alternatively, you can open the file in the graphical user
interface, choose the appropriate classifier in the 'Classify' tab,
and start the classification task.
This will construct a decision tree from train.arff, apply it back
to train.arff, and then perform a 10-fold cross-validation
on train.arff. For more details on using WEKA, see the README file and the additional documentation
distributed with WEKA, or the documentation section of the WEKA home page.
In this course we will use the command-line interface only, but feel
free to experiment with the graphical interface as well if you like.
Assignment
Use the file weather.arff in the data
directory of the WEKA distribution (C:\Program Files\Weka-3-5 on
Windows machines). This file contains data for deciding whether to play a certain
sport given the weather conditions (it is the same data set as described
in Section 1.2 of the textbook). Run the J48 classifier using
"weather.arff" as the training set.
Here are the questions to be handed in:
- Report how many instances are correctly and incorrectly classified
on the training set.
- The classifier weka.classifiers.rules.ZeroR simply assigns the most
common class in the training set to any new instance and
can be used as a baseline for evaluating other machine learning
schemes. Invoke the ZeroR classifier using weather.arff. Report the
number of correctly classified and misclassified instances both for the
training set and for cross-validation.
- What are baselines used for? Is ZeroR a reasonable baseline? Can you think
of other types of baselines?
- What is k-fold cross-validation and why is it better than splitting the
data into one training set and one evaluation set?
- What is the difference between a development set and an evaluation set?
- Why is it necessary to test the statistical significance of performance
differences when comparing two models?
- What is wrong if you find the following sentence in a machine learning
paper: "Model 1 yields an accuracy of 97% compared to the baseline
of 94% accuracy, an improvement of 3% which is statistically significant."
- What is the difference between accuracy and precision? Explain with
respect to a multi-class classification task.
- Consider a binary classification task (target values = 0/1) with 50 test
data instances, of which 5 are labeled "1" and the rest "0".
Calculate accuracy, precision and recall for
a baseline classifier tagging everything as "0". Calculate the same measures
for a classifier that tags everything as "1". Give the 95% confidence
intervals for both accuracy estimates.
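To clarify what a majority-class baseline like ZeroR does, here is a minimal sketch in plain Python (the class name ZeroRBaseline and the toy label list are illustrative, not part of WEKA): it ignores all features and always predicts whichever class is most frequent in the training labels.

```python
from collections import Counter

class ZeroRBaseline:
    """A ZeroR-style baseline: always predicts the majority class."""

    def fit(self, labels):
        # Remember the most frequent label in the training data.
        self.prediction = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, n):
        # Ignore the input features entirely; emit the same label n times.
        return [self.prediction] * n

# Toy usage: 9 "yes" and 5 "no" training labels.
model = ZeroRBaseline().fit(["yes"] * 9 + ["no"] * 5)
print(model.predict(3))  # every prediction is "yes"
```

Any classifier that cannot beat this baseline has learned nothing useful from the features.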
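The k-fold cross-validation procedure asked about above can be sketched as follows, using only the Python standard library. The function names are made up for illustration, and the plugged-in "learner" is just a majority-class stand-in; any real classifier could replace it.

```python
from collections import Counter

def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(labels, k=10):
    """Average accuracy of a majority-class baseline over k folds."""
    folds = k_fold_indices(len(labels), k)
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        # Train on everything outside the current fold.
        train = [labels[i] for i in range(len(labels)) if i not in held_out]
        majority = Counter(train).most_common(1)[0][0]
        # Evaluate on the held-out fold.
        correct = sum(1 for i in test_idx if labels[i] == majority)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k
```

Note that every instance is used for testing exactly once and for training k-1 times, which is why the resulting estimate wastes less data than a single train/test split. (A production implementation would also shuffle or stratify the data before splitting.)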
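For the metric questions, the following sketch shows how accuracy, precision, recall, and a 95% confidence interval for an accuracy estimate (using the common normal approximation to the binomial) can be computed. The function names and the example labels are made up for illustration and are not the assignment's data.

```python
import math

def accuracy(gold, pred):
    """Fraction of predictions that match the gold labels."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def precision(gold, pred, positive=1):
    """Of the instances predicted positive, how many really are positive."""
    predicted_pos = sum(p == positive for p in pred)
    true_pos = sum(g == positive and p == positive for g, p in zip(gold, pred))
    return true_pos / predicted_pos if predicted_pos else 0.0

def recall(gold, pred, positive=1):
    """Of the truly positive instances, how many were predicted positive."""
    actual_pos = sum(g == positive for g in gold)
    true_pos = sum(g == positive and p == positive for g, p in zip(gold, pred))
    return true_pos / actual_pos if actual_pos else 0.0

def confidence_interval_95(acc, n):
    """Normal-approximation 95% CI: acc +/- 1.96 * sqrt(acc*(1-acc)/n)."""
    half = 1.96 * math.sqrt(acc * (1 - acc) / n)
    return (acc - half, acc + half)

# Toy example: 10 instances, 2 positives, classifier predicts all "0".
gold = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
pred = [0] * 10
print(accuracy(gold, pred))   # high accuracy despite never finding a positive
print(precision(gold, pred))  # 0.0: no positive predictions at all
print(recall(gold, pred))     # 0.0: no true positives found
```

The toy example illustrates the accuracy-vs-precision question: on a skewed class distribution, accuracy can look good while precision and recall on the minority class are zero.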