|
DATE15/EDA132 - Spring 2009
(Applied) Artificial Intelligence
Assignment 4a: Decision Trees, ID3
Briefly:
Induce a decision tree for a couple of example problems
using ID3 algorithm. For a better grade add pruning of the
resulting trees or extend ID3 in a suggested way. Finding
an appropriate pruning algorithm is the challenging part
of the task.
WARNING
Programming assignments, with possible exceptions of the
second and third one, are expected to be solved in one,
maximum two days of *hard* work, by an average student. Do
more if you feel it is challenging, interesting or simply
fun, but do not overdo it and do not read more from a
description than is there. BUT: do not read less either:-)
Longer version, with all the legalese:
- (OBLIGATORY) Implement the decision tree learning
algorithm according to Figure 18.5 in Russell-Norvig,
Section 18.3. You are allowed to do your own
modifications to improve the algorithm. See more details
below.
- (OPTIONAL) Modify your solution by adding
improvements to the ID3 algorithm, according to the
description below, or by post-processing the generated
trees. I am mostly interested in pruning, either
performed on the tree itself, or on the set of rules
generated from the tree (like in C4.5 algorithm). Some
other possible improvements are listed below in a
separate section entitled just "Improvements".
Simple ID3 decision tree induction (the obligatory part)
Your task is to implement a program that computes a
decision tree according to the algorithm of Fig. 18.5,
including an appropriate implementation of the
CHOOSE-ATTRIBUTE function according to the theory in
Russell-Norvig, Section 18.3. Input to the program is a
number of examples defining the features and possible
values for each feature including the classifying
feature and a collection of examples. The output
should be a print-out of the resulting decision tree in
approximately the following general form:
(attribute)
value1:
(attribute)
value1: (classifier) is No
value2: (classifier) is Yes
value2: (classifier) is No
value3:
(attribute)
value1:
(attribute)
value1: (classifier) is No
value2: (classifier) is Yes
value2: (classifier) is No
Example sets are available in the files
'restaurant.train' and 'weather.train' in the directory
/usr/local/cs/tai/decision-trees/ . The example sets are
given in LISP-format. You are free to transform the
example sets to any other form. You may include the
example sets directly into your program. i.e. you are
not forced to implement any file management. If you find
these example sets too simple, use any of the example
sets in the directory
/usr/local/cs/tai/neural-nets/databases/ or an example
set of your own choice.
In the first version of the algorithm you are adviced to use a simple
CHOOSE-ATTRIBUTE function, e.g. select the first of the remaining
attributes or select it randomly. This is to check that the decision
tree is built correctly. When you are certain of this, implement the
CHOOSE-ATTRIBUTE function to choose the best attribute according to
the information gain principle sketched in Section 18.3.
Improvements (the optional part)
Improve your ID3 implementation according to one or more of the
following suggestions:
-
The information theory outlined in Section 18.4 is based on positive
and negative examples only. Implement a generalized version of
CHOOSE-ATTRIBUTE that is able to handle any number of values (c1,
c2,, ...cn) for the classification attribute.
(attribute)
value1:
(attribute)
value1: (classifier) is c1
value2: (classifier) is c2
value3: (classifier) is c3
value2: (classifier) is c3
value3:
(attribute)
value1:
(attribute)
value1: (classifier) is c1
value2: (classifier) is c2
value2: (classifier) is c2
value4: (classifier) is c4
Example sets are available in the files 'todo.train' and 'diagnosis.train'.
- Generate rules that cover all the branches of your decision tree.
- Extend your implementation to allow attributes to have integer
values. Input data to test this version is available in the file
'house.train', 'error.train' and in 'iris.train'. The valueN for the
integer attributes should in this case be a test condition such as <
or <= in the decision tree. It is acceptable to convert the real
numbers in an example set to the corresponding integer values,
i.e. 1.234 may be converted to 1234.
- Apply pruning to your tree;
If you find the example sets too simple, use any of the example sets
in the directory /usr/local/cs/tai/neural-nets/databases/ or example
sets of your own choice.
Remarks
Deadline: The report must be handed in for
evaluation before 23.59 on Thursday, April 30th, 2009. The
report should be put in the box labeled TAI (EDA132 or
DATE15, respectively) in the stairwells outside the
secretary offices (LTH or NF).
Problems: Send e-mail to aikurs AT cs.lth.se .
The report
The assignment must be documented in a report, which should contain
the following:
- A presentation of the assignment.
- A presentation of the selected application and the
improvement(s) you have choosen.
- A presentation of your implementation and how to
run the executable.
- Print outs of the example set(s) and the resulting
decision tree(s).
- In case you used code from the repository (or
anywhere else, for that matter), state which part is
yours and which is not. You do not need to provide all
or any code printout in the report - the code is
available in your solution directory anyway - but only
comments on its interesting parts, including the authorship.
- Comment on the results you have achieved.
- The report should be nicely typeset and formatted.
- The front page should state the name of the author,
that the report concerns Assignment 4a, and any relevant
information you think could be useful.
- The report must be filed in on paper. Neither e-mail
nor web pages will be accepted. You may receive
feedback, but do not count on the report to be returned,
so do not hand in originals.
Programming language and environment You may use
any suitable implementation language and you may use your own computer
to develop your program. However, the documented results of your
investigations must be possible to be validated by executing a version
available on the *.student.lth.se Linux computers. This version should
of course be the same as the one handed in as program source.
Your final program must be available and runnable at the
department LINUX computers at the *.student.lth.se
net (e.g. login-4.student.lth.se). It is
NOT possible to hand in a medium containing
the program. Remember to make your programs and all the
directories in their path read and execute accessible to
'others' (chmod 705 ). Remember also to quote
where does your solution reside and how should it be run
(kind of "User's Guide").
The resulting programs should remain in your directory
until you have
been notified of the result, e.g. on the notice board
and/or web or by e-mail. You may expect that your report
and implementation will be examined within two weeks. If
your report or implementation is unsatisfactory you will
be given one chance to make the corrections and then to
hand it in within a week after you have been notified (on
the notice board and/or web or by e-mail).
On Assignment 4
Please note that there are three assignments bearing
number 4: one about decision trees (this one), one about neural nets and
one about genetic algorithms. You are expected to file in only one of
them in order to get Assignment 4 passed. Filing two or
three will not count towards the total number of assignments
filed in, although you may be interested in getting
feedback on them anyway.
|