Weka already contains a version of C4.5 decision-tree induction (weka.classifiers.trees.j48.J48) and standard boosting (weka.classifiers.meta.AdaBoostM1). You can run learning-curve experiments comparing different classifiers using the weka.gui.experiment.Experimenter GUI (see the Experimenter manual) and the LearningCurveCrossValidationResultProducer. A sample of data files in Weka ARFF format is available in /u/mooney/cs391L-code/weka/data/, and a wider variety of classification data sets from the UCI repository is available in Weka ARFF format in /u/ml/data/UCI/nominal/.
The class weka.experiment.Grapher can be used to produce graphs from the ARFF output files generated by an InstancesResultListener in the Experimenter. The Grapher class generates as output a ".gplot" file that gnuplot can use to generate a learning-curve graph in PostScript (just execute "gnuplot filename.gplot > filename.ps"). You can view the ".ps" file with "gv filename.ps" (Ghostview). This graph shows learning curves comparing J48 with bagged and boosted J48 on the complete soybean data.
The implementation of AdaBoostM1 in Weka can be a bit confusing because it does not directly follow the authors' original pseudocode as presented in class. The Weka algorithm is mathematically equivalent; however, its conventions and notation are slightly different. Don't let this confuse you.
First, the sum of all weighted examples is not normalized to 1 but instead to the total number of training examples, since the default initial weight for each example in a set of Instances is 1. Since only the relative weights of examples matter rather than the absolute weights, normalizing the sum of the instance weights to any fixed constant is equivalent.
Second, in Weka, "beta" is actually log(1/β), and m_Betas[i] is the voting weight for the ith hypothesis (i.e., the classifier generated in the ith iteration). The Weka variable "reweight" is actually 1/β and is used to up-weight the incorrectly classified examples, rather than using β to down-weight the correctly classified examples (as in the original AdaBoost pseudocode). Since the weight sum is renormalized to the same constant after each round, the effect is the same.
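In code, the convention amounts to roughly the following. This is a paraphrase of the relevant logic in AdaBoostM1, not the exact Weka source; the method and variable names (other than "reweight" and m_Betas) are illustrative only:

    import weka.classifiers.Classifier;
    import weka.core.Instance;
    import weka.core.Instances;

    // Paraphrase of Weka's AdaBoostM1 reweighting convention.
    public class WekaReweightSketch {
      // epsilon: the weighted error of hypothesis h, assumed to be in (0, 0.5)
      static void reweight(Instances training, Classifier h, double epsilon)
          throws Exception {
        double reweight = (1.0 - epsilon) / epsilon;  // Weka's "reweight" = 1/beta
        // The voting weight stored in m_Betas[i] would be
        // Math.log(reweight), i.e. log(1/beta).
        double oldSum = training.sumOfWeights();
        for (int i = 0; i < training.numInstances(); i++) {
          Instance inst = training.instance(i);
          // up-weight the examples the current hypothesis gets WRONG
          if (h.classifyInstance(inst) != inst.classValue()) {
            inst.setWeight(inst.weight() * reweight);
          }
        }
        // renormalize so the weights again sum to the old total
        // (by default, the number of training instances)
        double newSum = training.sumOfWeights();
        for (int i = 0; i < training.numInstances(); i++) {
          Instance inst = training.instance(i);
          inst.setWeight(inst.weight() * oldSum / newSum);
        }
      }
    }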
The purpose of this assignment is to implement and test a recently proposed boosting algorithm called TrAdaBoost, which was published at the most recent International Conference on Machine Learning (ICML-07) and is available in this PDF file. Read this paper carefully, focusing on understanding the description of the algorithm.
The goal of TrAdaBoost is to adapt AdaBoost to perform transfer learning, in which learning in a source domain is used to improve learning in a different but related target domain. It assumes that some labeled training data is available in both the source and the target domains. Typically, one assumes that there is a fairly large amount of source data but relatively little target data, and the goal of transfer learning is to improve learning from the small amount of target data by exploiting data for a related source task. TrAdaBoost assumes that the source and target data use exactly the same set of features and categories, but that the distributions of the data in the two domains are different. What is called Ts in the pseudocode in the paper is what I will refer to as "target data," and Td is what I will call "source data," following the normal use of the terms "target" and "source" in transfer learning.
TrAdaBoost assumes that, due to the difference in distribution between source and target, some of the source training examples will be useful for learning in the target domain while others will not be useful and could even be harmful. Therefore, it adapts the basic idea of iterative example reweighting from AdaBoost to down-weight the "bad" source examples during training while allowing the "good" source examples to improve learning in the target domain. It combines the source and target data during learning, but treats them somewhat differently. It repeatedly trains the base classifier on the union of the weighted source and target examples. However, when measuring the accuracy of a learned base classifier, it uses only the target data to determine the error, ε. Also, when reweighting examples for the next iteration, it up-weights the incorrectly classified examples in the target data using the normal AdaBoost approach, but it down-weights the incorrectly classified examples in the source data, since they appear to be "bad" examples that are misleading learning of the target domain. A different factor, motivated by a theoretical analysis included in the paper, is used to down-weight the incorrect source examples. A detailed understanding of this theoretical analysis is not required to complete the assignment. The pseudocode for TrAdaBoost in the paper is for binary concepts; however, it is easily generalized to the multi-category case by simply up-weighting incorrect target examples and down-weighting incorrect source examples, as in the sketch below.
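To make the update concrete, here is a hedged sketch of the core reweighting step under the assumptions above. The class, method, and variable names are illustrative only (they are not those of the provided skeleton), and ε is assumed to have been computed on the target data alone:

    import weka.classifiers.Classifier;
    import weka.core.Instance;
    import weka.core.Instances;

    // Illustrative sketch of TrAdaBoost's weight update (multi-class version).
    public class TrAdaBoostUpdateSketch {
      static void update(Instances source, Instances target, Classifier h,
                         double epsilon, int numIterations) throws Exception {
        // Standard AdaBoost up-weight factor for misclassified TARGET examples
        // (epsilon is the weighted error on the target data, assumed < 0.5).
        double targetReweight = (1.0 - epsilon) / epsilon;
        // Down-weight factor for misclassified SOURCE examples, from the
        // paper's theoretical analysis: beta = 1 / (1 + sqrt(2 ln n / N)),
        // where n = number of source examples and N = number of iterations.
        double beta = 1.0 / (1.0 + Math.sqrt(2.0 * Math.log(source.numInstances())
                                             / numIterations));
        for (int i = 0; i < target.numInstances(); i++) {
          Instance inst = target.instance(i);
          if (h.classifyInstance(inst) != inst.classValue())
            inst.setWeight(inst.weight() * targetReweight); // up-weight wrong target
        }
        for (int i = 0; i < source.numInstances(); i++) {
          Instance inst = source.instance(i);
          if (h.classifyInstance(inst) != inst.classValue())
            inst.setWeight(inst.weight() * beta);           // down-weight wrong source
        }
        // Renormalization of the combined weight sum, as in AdaBoostM1,
        // would follow here.
      }
    }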
Your first task is to produce an implementation of TrAdaBoost by copying and editing the Weka version of AdaBoostM1. The skeleton of such an implementation is already provided for you in /u/mooney/cs391L-code/weka/weka/classifiers/meta/TrAdaBoost.java. You must complete this skeleton by implementing the core methods buildClassifierWithWeights and setWeights by properly modifying those from AdaBoostM1. As input, TrAdaBoost takes a "-S" option that takes the name of an ARFF data file that it should use as the source data during training. The existing skeleton code provides the methods needed for TrAdaBoost to work properly with the Weka OptionHandler as well as the option handling used in the Weka Experimenter GUI.
TrAdaBoost also takes an option flag "-N", called the NormalSource option. When this flag is specified (and m_NormalSource is true), source data should be treated exactly the same as the target data (i.e., both source and target data should be used for determining the error, ε, and reweighted in the normal AdaBoost way). When "-N" is used, the result should be the same as running regular AdaBoostM1 on the union of the source and target data. This approach serves as a baseline when comparing to TrAdaBoost, which treats target and source data differently. By running TrAdaBoost with the options "-N" and "-I 1" (just one iteration of boosting), one can also obtain the results of just running the base learner on the union of the source and target data. This is another useful baseline for comparison. Since the "bad" examples in the source data effectively act as noise when learning in the target domain, AdaBoost may have a tendency to over-fit this "noise", so it is useful to see whether the base learner actually performs better or worse than AdaBoostM1 when both are trained on the union of source and target data.
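For quick sanity checks outside the Experimenter, and assuming the skeleton keeps AdaBoostM1's standard command-line conventions ("-t" for the training/evaluation file, "-I" for the number of iterations, "-W" for the base classifier), an invocation might look like "java weka.classifiers.meta.TrAdaBoost -t soybean-b.arff -S soybean-a-changed.arff -I 30 -W weka.classifiers.trees.j48.J48"; adding "-N" (and "-I 1") should then reproduce the two baselines described above.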
Unfortunately, we do not have good, natural source and target data for evaluating transfer learning. Therefore, I have artificially manipulated the soybean data for testing TrAdaBoost; the resulting files are in /u/mooney/cs391L-code/weka/data/. First, I split the soybean data into two random halves, soybean-a.arff and soybean-b.arff. We will use soybean-b as the target test domain (i.e., as a data file providing both training and test data for evaluation using the Weka Experimenter). I have created two corrupted versions of soybean-a to use as source data. In soybean-a-changed.arff, I changed the distribution by removing examples of three categories and reassigning them to other related categories (all rhizoctonia-root-rot's relabeled as charcoal-rot, all powdery-mildew's relabeled as downy-mildew, all phytophthora-rot's relabeled as brown-stem-rot). In soybean-a-swapped.arff, I changed the distribution by swapping examples between two pairs of categories (all phyllosticta-leaf-spot labels changed to alternarialeaf-spot and vice versa, all phytophthora-rot labels changed to brown-stem-rot and vice versa).
Note that this models exactly the sort of situation that TrAdaBoost assumes: some of the source training examples are "bad" and some are "good". Therefore, experiments with this data only test the hypothesis that TrAdaBoost handles such a difference between target and source distributions well; they do not test the hypothesis that this situation is a common one that actually occurs in real applications. Our resulting evaluation is therefore somewhat artificial, but so is the one in the original paper (although it also uses more natural text data to simulate a change in distribution).
You should use the Weka Experimenter to generate learning curves comparing your TrAdaBoost to AdaBoostM1 on the soybean-b data, using both soybean-a-changed and soybean-a-swapped as two different source data sets. Always use J48 as the base learner and 30 boosting iterations. You should also compare to just running AdaBoostM1 on the combined source and target data (by running TrAdaBoost with the "-N" option) and to just running the base learner on the combined source and target data (by running TrAdaBoost with the "-N" and "-I 1" options).
Submit a hardcopy report that presents your results and explains and discusses them. What are the primary differences between the accuracy of the different methods? Are they statistically significant? How are these differences explained by the properties of the underlying algorithms? How do the differences in accuracy change as the number of training examples increases? Why do they change this way? What about relative training times and testing times?
Include color graphs of your final learning curves in the report, one for the "soybean-a-changed" source and one for the "soybean-a-swapped" source; each should have 4 curves (AdaBoostM1, TrAdaBoost, TrAdaBoost-N, TrAdaBoost-N-I1). Edit the .gplot source files to make the legends and graphs more readable, as suggested below.
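For example, standard gnuplot settings such as "set key bottom right" (to reposition the legend), "set title" with a more descriptive string, and "set terminal postscript color" (so the curves actually print in color) can be added or edited near the top of the ".gplot" file. The exact lines that Grapher emits may differ, so adapt these to whatever appears in the generated file.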
Electronically submit your commented code and detailed ARFF result files following the instructions on handing in homeworks.