(Applied) Artificial Intelligence

Assignment 4a: Decision Trees, ID3

Briefly:

Induce a decision tree for a couple of example problems using ID3 algorithm. For a better grade add pruning of the resulting trees or extend ID3 in a suggested way. Finding an appropriate pruning algorithm is the challenging part of the task.

WARNING

Programming assignments, with possible exceptions of the second and third one, are expected to be solved in one, maximum two days of *hard* work, by an average student. Do more if you feel it is challenging, interesting or simply fun, but do not overdo it and do not read more from a description than is there. BUT: do not read less either:-)

Longer version, with all the legalese:

(OBLIGATORY) Implement the decision tree learning algorithm according to Figure 18.5 in Russell-Norvig, Section 18.3. You are allowed to do your own modifications to improve the algorithm. See more details below.
(OPTIONAL) Modify your solution by adding improvements to the ID3 algorithm, according to the description below, or by post-processing the generated trees. I am mostly interested in pruning, either performed on the tree itself, or on the set of rules generated from the tree (like in C4.5 algorithm). Some other possible improvements are listed below in a separate section entitled just "Improvements".

Simple ID3 decision tree induction (the obligatory part)

Your task is to implement a program that computes a decision tree according to the algorithm of Fig. 18.5, including an appropriate implementation of the CHOOSE-ATTRIBUTE function according to the theory in Russell-Norvig, Section 18.3. Input to the program is a number of examples defining the features and possible values for each feature including the classifying feature and a collection of examples. The output should be a print-out of the resulting decision tree in approximately the following general form:


    (attribute)

          value1:

                (attribute)

                      value1: (classifier) is No

                      value2: (classifier) is Yes

          value2: (classifier) is No

          value3:

                (attribute)

                      value1:

                            (attribute)

                                  value1: (classifier) is No

                                  value2: (classifier) is Yes

                      value2: (classifier) is No

Example sets are available in the files 'restaurant.train' and 'weather.train' in the directory /usr/local/cs/tai/decision-trees/ . The example sets are given in LISP-format. You are free to transform the example sets to any other form. You may include the example sets directly into your program. i.e. you are not forced to implement any file management. If you find these example sets too simple, use any of the example sets in the directory /usr/local/cs/tai/neural-nets/databases/ or an example set of your own choice.

In the first version of the algorithm you are adviced to use a simple CHOOSE-ATTRIBUTE function, e.g. select the first of the remaining attributes or select it randomly. This is to check that the decision tree is built correctly. When you are certain of this, implement the CHOOSE-ATTRIBUTE function to choose the best attribute according to the information gain principle sketched in Section 18.3.

Improvements (the optional part)

Improve your ID3 implementation according to one or more of the following suggestions:

The information theory outlined in Section 18.4 is based on positive and negative examples only. Implement a generalized version of CHOOSE-ATTRIBUTE that is able to handle any number of values (c1, c2,, ...cn) for the classification attribute.
(attribute) value1: (attribute) value1: (classifier) is c1 value2: (classifier) is c2 value3: (classifier) is c3 value2: (classifier) is c3 value3: (attribute) value1: (attribute) value1: (classifier) is c1 value2: (classifier) is c2 value2: (classifier) is c2 value4: (classifier) is c4
Example sets are available in the files 'todo.train' and 'diagnosis.train'.
Generate rules that cover all the branches of your decision tree.
Extend your implementation to allow attributes to have integer values. Input data to test this version is available in the file 'house.train', 'error.train' and in 'iris.train'. The valueN for the integer attributes should in this case be a test condition such as < or <= in the decision tree. It is acceptable to convert the real numbers in an example set to the corresponding integer values, i.e. 1.234 may be converted to 1234.
Apply pruning to your tree;

If you find the example sets too simple, use any of the example sets in the directory /usr/local/cs/tai/neural-nets/databases/ or example sets of your own choice.

Remarks

Deadline:

The report must be handed in for evaluation before 23.59 on Thursday, April 30th, 2009. The report should be put in the box labeled TAI (EDA132 or DATE15, respectively) in the stairwells outside the secretary offices (LTH or NF).

Problems:

Send e-mail to aikurs AT cs.lth.se .

The report

The assignment must be documented in a report, which should contain the following:

A presentation of the assignment.
A presentation of the selected application and the improvement(s) you have choosen.
A presentation of your implementation and how to run the executable.
Print outs of the example set(s) and the resulting decision tree(s).
In case you used code from the repository (or anywhere else, for that matter), state which part is yours and which is not. You do not need to provide all or any code printout in the report - the code is available in your solution directory anyway - but only comments on its interesting parts, including the authorship.
Comment on the results you have achieved.
The report should be nicely typeset and formatted.
The front page should state the name of the author, that the report concerns Assignment 4a, and any relevant information you think could be useful.
The report must be filed in on paper. Neither e-mail nor web pages will be accepted. You may receive feedback, but do not count on the report to be returned, so do not hand in originals.

Programming language and environment

You may use any suitable implementation language and you may use your own computer to develop your program. However, the documented results of your investigations must be possible to be validated by executing a version available on the *.student.lth.se Linux computers. This version should of course be the same as the one handed in as program source.

Your final program must be available and runnable at the department LINUX computers at the *.student.lth.se net (e.g. login-4.student.lth.se). It is NOT possible to hand in a medium containing the program. Remember to make your programs and all the directories in their path read and execute accessible to 'others' (chmod 705 ). Remember also to quote where does your solution reside and how should it be run (kind of "User's Guide").

The resulting programs should remain in your directory until you have been notified of the result, e.g. on the notice board and/or web or by e-mail. You may expect that your report and implementation will be examined within two weeks. If your report or implementation is unsatisfactory you will be given one chance to make the corrections and then to hand it in within a week after you have been notified (on the notice board and/or web or by e-mail).

On Assignment 4

Please note that there are three assignments bearing number 4: one about decision trees (this one), one about neural nets and one about genetic algorithms. You are expected to file in only one of them in order to get Assignment 4 passed. Filing two or three will not count towards the total number of assignments filed in, although you may be interested in getting feedback on them anyway.

Last update:
Wednesday, 12-Jan-2011 16:23:27 CET

[an error occurred while processing this directive]