Assignment 1: The First Introduction to the WEKA Data Mining Software
Decision Trees
1.1
Study the animals in the Excel document (zoo.xls). Without
using a data mining tool, draw a decision tree three to five levels deep that
classifies each animal as a mammal, bird, reptile, fish, amphibian, insect or
invertebrate.
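A hand-drawn decision tree translates directly into nested tests. The sketch below shows only the first two levels, assuming zoo.xls has yes/no columns named milk and feathers (an assumption; check the column names in your copy). The "other" leaf is a placeholder for the levels that the exercise asks you to draw yourself.

```java
// Minimal sketch: the first two levels of a hand-made decision tree.
// The attribute names (milk, feathers) are assumed column names, and the
// "other" leaf stands for the sub-tree that still has to be drawn.
class HandMadeTree {
    static String classify(boolean milk, boolean feathers) {
        if (milk) {
            return "mammal";   // level 1: the animal gives milk
        }
        if (feathers) {
            return "bird";     // level 2: the animal has feathers
        }
        return "other";        // remaining classes need deeper tests
    }

    public static void main(String[] args) {
        System.out.println(classify(true, false));   // mammal
        System.out.println(classify(false, true));   // bird
    }
}
```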
1.2
Read about the ARFF-format here. Construct the
header for the animal file.
1.3
Download datasets.zip and unzip it. Start Weka, choose the
Explorer, and open zoo.arff.
1.4
Find out in WEKA how many animals this dataset contains.
1.5
Go to the Classify tab and select the decision tree classifier J48. Click on
the box after the 'Choose' button. This shows you the parameters you can set and
a button called 'More'. Which algorithm is implemented by J48?
1.6
What percentage of instances is correctly classified by J48? Which families are
mistaken for each other?
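Both questions can be read off a confusion matrix: the diagonal holds the correctly classified instances, and every non-zero cell off the diagonal is a pair of families that were mistaken for each other. The sketch below uses plain Java and invented toy labels (not the real zoo results) to show how such a matrix and the percentage correct are computed.

```java
import java.util.Arrays;
import java.util.List;

// Minimal sketch with invented toy data; rows are actual classes,
// columns are predicted classes.
class ConfusionMatrixSketch {
    static int[][] confusionMatrix(List<String> labels, String[] actual, String[] predicted) {
        int[][] m = new int[labels.size()][labels.size()];
        for (int i = 0; i < actual.length; i++) {
            m[labels.indexOf(actual[i])][labels.indexOf(predicted[i])]++;
        }
        return m;
    }

    // Percentage of instances on the diagonal (correctly classified).
    static double percentCorrect(int[][] m) {
        int correct = 0;
        int total = 0;
        for (int r = 0; r < m.length; r++) {
            for (int c = 0; c < m.length; c++) {
                total += m[r][c];
                if (r == c) {
                    correct += m[r][c];
                }
            }
        }
        return 100.0 * correct / total;
    }

    public static void main(String[] args) {
        List<String> labels = Arrays.asList("mammal", "bird", "reptile");
        String[] actual    = {"mammal", "mammal", "bird", "reptile", "reptile"};
        String[] predicted = {"mammal", "mammal", "bird", "reptile", "bird"};
        int[][] m = confusionMatrix(labels, actual, predicted);
        System.out.println(Arrays.deepToString(m));   // [[2, 0, 0], [0, 1, 0], [0, 1, 1]]
        System.out.println(percentCorrect(m));        // 80.0
    }
}
```

In this toy example the single off-diagonal count sits in the reptile row and the bird column, so one reptile was mistaken for a bird.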
1.7
Go to the parameter settings again by clicking on the box after the 'Choose'
button. Now change binarySplits to true and build a new decision tree. What is
the difference?
1.8
Experiment with some of the other classifiers until you get a better
classification performance. Write down the classifier and its performance.
1.9
Compile the following source code in Java:
import java.io.*;
import weka.classifiers.trees.J48;
import weka.core.*;

public class MyDecisionTree {
    MyDecisionTree() {
        try {
            FileReader reader = new FileReader("zoo.arff");
            Instances instances = new Instances(reader);
            // Make the last attribute be the class
            instances.setClassIndex(instances.numAttributes() - 1);
            // Build a J48 decision tree on all instances
            J48 tree = new J48();
            tree.buildClassifier(instances);
            // classifyInstance returns the index of the predicted class value
            System.out.println("The third animal is classified as: "
                    + tree.classifyInstance(instances.instance(2)));
            reader.close();
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }

    public static void main(String args[]) {
        new MyDecisionTree();
    }
}
To compile: javac -classpath weka.jar MyDecisionTree.java
(note: first copy weka.jar and zoo.arff to the same directory as
MyDecisionTree.java).
To execute: java -classpath .;weka.jar MyDecisionTree
(the ; separator is for Windows; on Linux and macOS it must be a :, i.e.
-classpath .:weka.jar. Please e-mail me if it doesn't work.)
Use the Weka API documentation. How can you make a decision tree with a
binary split?
Association rules
Next we will search for association rules.
2.1
The association algorithm requires nominal variables. In order to make all
variables nominal we need to discretize the data.
This pre-processing can be done with filtering. Find the filter button on the
Preprocess tab and select the right unsupervised method to convert the
attributes to nominal attributes. After selecting a filter you can set its
properties by clicking on it. Press the apply button and watch how the
attributes change.
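To get a feeling for what discretizing does to a numeric attribute, the sketch below implements equal-width binning, one common unsupervised method, in plain Java. It is an illustration under assumptions: the number of bins and the handling of boundary values are choices made here and need not match what the Weka filter does.

```java
import java.util.Arrays;

// Minimal sketch of equal-width binning; bin boundaries here are
// lower-inclusive, which is an assumption of this example.
class EqualWidthBinning {
    // Returns the bin index (0 .. bins-1) for each value.
    static int[] discretize(double[] values, int bins) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double width = (max - min) / bins;
        int[] out = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            int b = (width == 0) ? 0 : (int) ((values[i] - min) / width);
            out[i] = Math.min(b, bins - 1);   // the maximum falls into the last bin
        }
        return out;
    }

    public static void main(String[] args) {
        double[] legs = {0, 2, 4, 6, 8};      // toy values for a numeric attribute
        System.out.println(Arrays.toString(discretize(legs, 4)));   // [0, 1, 2, 3, 3]
    }
}
```

Each bin index can then be treated as one value of a nominal attribute.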
2.2
Now run the association rule algorithm. Which rules are always true? Write them
down.
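A rule "A implies B" is always true in a data set when its confidence is 1, that is, when every instance that matches A also matches B. The sketch below computes the confidence of a rule in plain Java; the four rows are invented toy data, not the zoo file, and the attribute=value strings are only an illustrative encoding.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Minimal sketch: confidence(A => B) = count(A and B) / count(A).
class RuleConfidence {
    // Number of rows that contain all the given items.
    static int count(List<Set<String>> rows, Set<String> items) {
        int n = 0;
        for (Set<String> row : rows) {
            if (row.containsAll(items)) {
                n++;
            }
        }
        return n;
    }

    static double confidence(List<Set<String>> rows, Set<String> lhs, Set<String> rhs) {
        Set<String> both = new HashSet<>(lhs);
        both.addAll(rhs);
        return (double) count(rows, both) / count(rows, lhs);
    }

    static Set<String> items(String... values) {
        return new HashSet<>(Arrays.asList(values));
    }

    public static void main(String[] args) {
        List<Set<String>> rows = Arrays.asList(
                items("milk=true", "hair=true"),
                items("milk=true", "hair=true"),
                items("milk=false", "hair=false"),
                items("milk=false", "hair=true"));
        // Holds for every row with milk=true, so the confidence is 1.0.
        System.out.println(confidence(rows, items("milk=true"), items("hair=true")));
        // The reverse rule fails for one of the three rows with hair=true.
        System.out.println(confidence(rows, items("hair=true"), items("milk=true")));
    }
}
```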
2.3
Write down a couple of interesting rules and a couple of trivial
rules.
Pima Indians, mushrooms and politicians
3.1
The datasets.zip file contains different data sets, ranging from predicting
diabetes in a Pima Indian population and distinguishing edible mushrooms from
poisonous ones to separating Republicans from Democrats. Most data sets contain a
short description in the 'header'. Choose at least one data set, and answer the
following questions:
Other datasets
On the internet you can find many more data sets. Not all of these data sets are
in the ARFF format. Choose one of the data sets from http://kdd.ics.uci.edu/.
Convert this data set to the ARFF format and try different data mining techniques.