Weka already contains a version of C4.5 decision-tree induction (weka.classifiers.trees.j48.J48) and standard boosting (weka.classifiers.meta.AdaBoostM1). You can run learning-curve experiments comparing different classifiers using the weka.gui.experiment.Experimenter GUI (see the Experimenter manual) and the LearningCurveCrossValidationResultProducer. A sample of data files in Weka ARFF format is available in /u/mooney/cs391L-code/weka/data/, and a wider variety of classification data sets from the UCI repository is available in Weka ARFF format in /u/ml/data/UCI/nominal/.
The class weka.experiment.Grapher can be used to produce graphs from the ARFF output files generated by an InstancesResultListener in the Experimenter. The Grapher class generates as output a ".gplot" file that gnuplot can use to generate a learning-curve graph in PostScript (just execute "gnuplot filename.gplot > filename.ps"). You can view the ".ps" file with "gv filename.ps" (Ghostview). This graph shows learning curves comparing J48 with bagged and boosted J48 on the complete soybean data.
The implementation of AdaBoostM1 in Weka can be a bit confusing because it does not directly follow the authors' original pseudocode as presented in class. The Weka algorithm is mathematically equivalent; however, its conventions and notation are slightly different. Don't let this confuse you.
First, the sum of all weighted examples is not normalized to 1 but instead to the total number of training examples, since the default initial weight for each example in a set of Instances is 1. Since only the relative weights of examples matter rather than the absolute weights, normalizing the sum of the instance weights to any fixed constant is equivalent.
Second, in Weka, "beta" is actually log(1/β), and m_Betas[i] is the voting weight for the ith hypothesis (i.e., the classifier generated in the ith iteration). The Weka variable "reweight" is actually 1/β and is used to up-weight the incorrectly classified examples, rather than using β to down-weight the correctly classified examples (as in the original AdaBoost pseudocode). Since the weight sum is renormalized to the same constant after each round, the effect is the same.
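In code, the convention amounts to roughly the following. This is a paraphrase of the relevant logic in AdaBoostM1, not the exact Weka source; the method and variable names (other than "reweight" and m_Betas) are illustrative only:

    import weka.classifiers.Classifier;
    import weka.core.Instance;
    import weka.core.Instances;

    // Paraphrase of Weka's AdaBoostM1 reweighting convention.
    public class WekaReweightSketch {
      // epsilon: the weighted error of hypothesis h, assumed to be in (0, 0.5)
      static void reweight(Instances training, Classifier h, double epsilon)
          throws Exception {
        double reweight = (1.0 - epsilon) / epsilon;  // Weka's "reweight" = 1/beta
        // The voting weight stored in m_Betas[i] would be
        // Math.log(reweight), i.e. log(1/beta).
        double oldSum = training.sumOfWeights();
        for (int i = 0; i < training.numInstances(); i++) {
          Instance inst = training.instance(i);
          // up-weight the examples the current hypothesis gets WRONG
          if (h.classifyInstance(inst) != inst.classValue()) {
            inst.setWeight(inst.weight() * reweight);
          }
        }
        // renormalize so the weights again sum to the old total
        // (by default, the number of training instances)
        double newSum = training.sumOfWeights();
        for (int i = 0; i < training.numInstances(); i++) {
          Instance inst = training.instance(i);
          inst.setWeight(inst.weight() * oldSum / newSum);
        }
      }
    }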
The purpose of this assignment is to implement and test a recently proposed boosting algorithm called TrAdaBoost, which was published at the most recent International Conference on Machine Learning (ICML-07) and is available in this PDF file. Read this paper carefully, focusing on understanding the description of the algorithm.
The goal of TrAdaBoost is to adapt AdaBoost to perform transfer learning, in which learning in a source domain is used to improve learning in a different but related target domain. It assumes that some labeled training data is available in both the source and the target domains. Typically, one assumes that there is a fairly large amount of source data but relatively little target data, and the goal of transfer learning is to improve learning from the small amount of target data by exploiting data for a related source task. TrAdaBoost assumes that the source and target data use exactly the same set of features and categories, but that the distributions of the data in the two domains are different. What is called Ts in the pseudocode in the paper is what I will refer to as "target data," and Td is what I will call "source data," following the normal use of the terms "target" and "source" in transfer learning.
TrAdaBoost assumes that, due to the difference in distribution between source and target, some of the source training examples will be useful for learning in the target domain while others will not be useful and could even be harmful. Therefore, it adapts the basic idea of iterative example reweighting from AdaBoost to down-weight the "bad" source examples during training while allowing the "good" source examples to improve learning in the target domain. It combines the source and target data during learning, but treats them somewhat differently. It repeatedly trains the base classifier on the union of the weighted source and target examples. However, when measuring the accuracy of a learned base classifier, it uses only the target data to determine the error, ε. Also, when reweighting examples for the next iteration, it up-weights the incorrectly classified examples in the target data using the normal AdaBoost approach, but it down-weights the incorrectly classified examples in the source data, since they appear to be "bad" examples that are misleading learning of the target domain. A different factor, motivated by a theoretical analysis included in the paper, is used to down-weight the incorrect source examples. A detailed understanding of this theoretical analysis is not required to complete the assignment. The pseudocode for TrAdaBoost in the paper is for binary concepts; however, it is easily generalized to the multi-category case by simply up-weighting incorrect target examples and down-weighting incorrect source examples, as in the sketch below.
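To make the update concrete, here is a hedged sketch of the core reweighting step under the assumptions above. The class, method, and variable names are illustrative only (they are not those of the provided skeleton), and ε is assumed to have been computed on the target data alone:

    import weka.classifiers.Classifier;
    import weka.core.Instance;
    import weka.core.Instances;

    // Illustrative sketch of TrAdaBoost's weight update (multi-class version).
    public class TrAdaBoostUpdateSketch {
      static void update(Instances source, Instances target, Classifier h,
                         double epsilon, int numIterations) throws Exception {
        // Standard AdaBoost up-weight factor for misclassified TARGET examples
        // (epsilon is the weighted error on the target data, assumed < 0.5).
        double targetReweight = (1.0 - epsilon) / epsilon;
        // Down-weight factor for misclassified SOURCE examples, from the
        // paper's theoretical analysis: beta = 1 / (1 + sqrt(2 ln n / N)),
        // where n = number of source examples and N = number of iterations.
        double beta = 1.0 / (1.0 + Math.sqrt(2.0 * Math.log(source.numInstances())
                                             / numIterations));
        for (int i = 0; i < target.numInstances(); i++) {
          Instance inst = target.instance(i);
          if (h.classifyInstance(inst) != inst.classValue())
            inst.setWeight(inst.weight() * targetReweight); // up-weight wrong target
        }
        for (int i = 0; i < source.numInstances(); i++) {
          Instance inst = source.instance(i);
          if (h.classifyInstance(inst) != inst.classValue())
            inst.setWeight(inst.weight() * beta);           // down-weight wrong source
        }
        // Renormalization of the combined weight sum, as in AdaBoostM1,
        // would follow here.
      }
    }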
Your first task is to produce an implementation of TrAdaBoost by copying and editing the Weka version of AdaBoostM1. The skeleton of such an implementation is already provided for you in /u/mooney/cs391L-code/weka/weka/classifiers/meta/TrAdaBoost.java. You must complete this skeleton by implementing the core methods buildClassifierWithWeights and setWeights by properly modifying those from AdaBoostM1. As input, TrAdaBoost takes a "-S" option that takes the name of an ARFF data file that it should use as the source data during training. The existing skeleton code provides the methods needed for TrAdaBoost to work properly with the Weka OptionHandler as well as the option handling used in the Weka Experimenter GUI.
TrAdaBoost also takes an option flag "-N", called the NormalSource option. When this flag is specified (and m_NormalSource is true), source data should be treated exactly the same as the target data (i.e., both source and target data should be used for determining the error, ε, and reweighted in the normal AdaBoost way). When "-N" is used, the result should be the same as running regular AdaBoostM1 on the union of the source and target data. This approach serves as a baseline when comparing to TrAdaBoost, which treats target and source data differently. By running TrAdaBoost with the options "-N" and "-I 1" (just one iteration of boosting), one can also obtain the results of just running the base learner on the union of the source and target data. This is another useful baseline for comparison. Since the "bad" examples in the source data effectively act as noise when learning in the target domain, AdaBoost may have a tendency to over-fit this "noise", so it is useful to see whether the base learner actually performs better or worse than AdaBoostM1 when both are trained on the union of source and target data.
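For quick sanity checks outside the Experimenter, and assuming the skeleton keeps AdaBoostM1's standard command-line conventions ("-t" for the training/evaluation file, "-I" for the number of iterations, "-W" for the base classifier), an invocation might look like "java weka.classifiers.meta.TrAdaBoost -t soybean-b.arff -S soybean-a-changed.arff -I 30 -W weka.classifiers.trees.j48.J48"; adding "-N" (and "-I 1") should then reproduce the two baselines described above.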
Unfortunately, we do not have good, natural source and target data for evaluating transfer learning. Therefore, I have artificially manipulated the soybean data for testing TrAdaBoost; the resulting files are in /u/mooney/cs391L-code/weka/data/. First, I split the soybean data into two random halves, soybean-a.arff and soybean-b.arff. We will use soybean-b as the target test domain (i.e., as a data file providing both training and test data for evaluation using the Weka Experimenter). I have created two corrupted versions of soybean-a to use as source data. In soybean-a-changed.arff, I changed the distribution by removing examples of three categories and reassigning them to other related categories (all rhizoctonia-root-rot's relabeled as charcoal-rot, all powdery-mildew's relabeled as downy-mildew, all phytophthora-rot's relabeled as brown-stem-rot). In soybean-a-swapped.arff, I changed the distribution by swapping examples between two pairs of categories (all phyllosticta-leaf-spot labels changed to alternarialeaf-spot and vice versa, all phytophthora-rot labels changed to brown-stem-rot and vice versa).
Note that this models exactly the sort of situation that TrAdaBoost assumes: some of the source training examples are "bad" and some are "good". Therefore, experiments with this data only test the hypothesis that TrAdaBoost handles such a difference between target and source distributions well; they do not test the hypothesis that this situation is a common one that actually occurs in real applications. Our resulting evaluation is therefore somewhat artificial, but so is the one in the original paper (although it also uses more natural text data to simulate a change in distribution).
You should use the Weka Experimenter to generate learning curves comparing your TrAdaBoost to AdaBoostM1 on the soybean-b data, using both soybean-a-changed and soybean-a-swapped as two different source data sets. Always use J48 as the base learner and 30 boosting iterations. You should also compare to just running AdaBoostM1 on the combined source and target data (by running TrAdaBoost with the "-N" option) and to just running the base learner on the combined source and target data (by running TrAdaBoost with the "-N" and "-I 1" options).
Submit a hardcopy report that presents your results and explains and discusses them. What are the primary differences between the accuracy of the different methods? Are they statistically significant? How are these differences explained by the properties of the underlying algorithms? How do the differences in accuracy change as the number of training examples increases? Why do they change this way? What about relative training times and testing times?
Include color graphs of your final learning curves in the report, one for the "soybean-a-changed" source and one for the "soybean-a-swapped" source; each should have 4 curves (AdaBoostM1, TrAdaBoost, TrAdaBoost-N, TrAdaBoost-N-I1). Edit the .gplot source files to make the legends and graphs more readable, as suggested below.
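For example, standard gnuplot settings such as "set key bottom right" (to reposition the legend), "set title" with a more descriptive string, and "set terminal postscript color" (so the curves actually print in color) can be added or edited near the top of the ".gplot" file. The exact lines that Grapher emits may differ, so adapt these to whatever appears in the generated file.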
Electronically submit your commented code and detailed ARFF result files following the instructions on handing in homeworks.