Weka contains a simple rule-learning system called Prism (weka.classifiers.rules.Prism). The system implements a top-down (general-to-specific) sequential-covering algorithm that employs a simple accuracy-based metric to pick an appropriate rule antecedent during rule construction. This algorithm is described in section 4.4 of the Witten and Frank textbook.
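To make the accuracy-based metric concrete, here is a minimal standalone sketch (illustrative only, not the actual weka.classifiers.rules.Prism code, and the class and method names are invented for this example): for each candidate attribute-value test, Prism considers p/t, where t is the number of training instances the partial rule would cover after adding the test and p is the number of those in the target class, preferring higher p to break ties.

```java
// Sketch of Prism-style accuracy-based antecedent selection.
// Hypothetical illustration; not taken from the Weka source.
public class PrismHeuristicDemo {

    // Accuracy of a candidate antecedent: p positives out of t covered.
    public static double accuracy(int p, int t) {
        return t == 0 ? 0.0 : (double) p / t;
    }

    public static void main(String[] args) {
        // Hypothetical candidate antecedents, each as {p, t} counts.
        int[][] candidates = { {3, 4}, {5, 8}, {2, 2} };
        int best = 0;
        for (int i = 1; i < candidates.length; i++) {
            double acc = accuracy(candidates[i][0], candidates[i][1]);
            double bestAcc = accuracy(candidates[best][0], candidates[best][1]);
            // Prefer higher accuracy; break ties in favor of larger p (coverage).
            if (acc > bestAcc || (acc == bestAcc && candidates[i][0] > candidates[best][0])) {
                best = i;
            }
        }
        System.out.println("best candidate = " + best);
    }
}
```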
In this assignment, you should test rule learning on the soybean (full set), audiology, and splice (splice-new) data sets in /u/mooney/cs391L-code/weka/data/. Since Prism only supports nominal features and not continuous ones, it cannot run on many of the other data sets. The slightly edited version of Prism in the class directory collects several summary statistics on the size of the rule base by adding AdditionalMeasureProducer measures for the number of rules learned (measureNumRules), the total number of rule antecedents learned (measureNumAntecedents), and the average number of antecedents per rule (measureAveAntecedents).
The Foil heuristic for picking the best antecedent to add to a rule, as discussed in class, is somewhat more sophisticated than the simple accuracy heuristic in Prism. Implement a simple propositional Foil-based learner called Pfoil. The easiest way to do this is to copy the Prism code and change only the heuristic it uses to pick the best rule antecedent at each point. Computing the Foil heuristic may require storing additional information during rule construction.
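The Foil information-gain computation can be sketched as follows (a hypothetical standalone illustration, not the Weka code; all names here are invented): if the partial rule covers p positive and n negative examples before adding a candidate antecedent, and p' and n' after, the gain is p' * (log2(p'/(p'+n')) - log2(p/(p+n))).

```java
// Sketch of the Foil information-gain heuristic for Pfoil.
// Hypothetical illustration; names are not from the Weka source.
public class FoilGainDemo {

    private static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    // p, n: positives/negatives covered by the rule before adding the test;
    // pNew, nNew: positives/negatives covered after adding it.
    public static double foilGain(int p, int n, int pNew, int nNew) {
        if (pNew == 0) return 0.0; // covers no positives: no gain
        double before = log2((double) p / (p + n));
        double after = log2((double) pNew / (pNew + nNew));
        return pNew * (after - before);
    }

    public static void main(String[] args) {
        // E.g. a rule covering 10+/10- before; a candidate test narrows it to 6+/2-.
        System.out.println(foilGain(10, 10, 6, 2));
    }
}
```

Note that, unlike Prism's p/t accuracy, the p' factor rewards antecedents that keep covering many positives, which is the extra bookkeeping the assignment alludes to.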
Use the Experimenter GUI to run 10-fold cross-validation learning curves comparing Prism, Pfoil, and the more sophisticated rule learner weka.classifiers.rules.JRip on the soybean, audiology, and splice data. JRip is a version of W. Cohen's Ripper, a Pfoil-based algorithm with a form of reduced-error pruning (REP) that uses a tuning set to prune learned rules to prevent over-fitting. If you give Ripper too little training data, it causes the error "Not enough data for REP", so you cannot run it on a very small training set.
Remember to plot points more densely early in the learning curve, where changes are more pronounced. Note: these experiments are fairly computationally intensive and may take a while to run (particularly on the splice data). In the Experimenter's Analyzer you can analyze training time, training accuracy, testing time, number of rules, number of antecedents, and average number of antecedents per rule, as well as test accuracy.
In your report, discuss the following issues: