java.lang.Object
  weka.classifiers.Classifier
    weka.classifiers.DistributionClassifier
      weka.classifiers.bayes.SemiSupEM
Semi-supervised learner that initializes EM with the labeled data and then runs EM iterations on the unlabeled data to improve the model. See: Kamal Nigam, Andrew McCallum, Sebastian Thrun and Tom Mitchell. Text Classification from Labeled and Unlabeled Documents using EM. Machine Learning, 39(2/3), pp. 103-134, 2000. Assumes a base classifier that is a SoftClassifier, i.e. one that accepts training data with a soft class distribution (SoftClassifiedInstances) rather than a hard class assignment. Sample soft classifiers are NaiveBayesSimpleSoft and NaiveBayesSimpleSparseSoft.
Field Summary | |
protected SoftClassifiedInstances |
m_AllInstances
Complete set of labeled and unlabeled instances for EM |
protected SoftClassifier |
m_Classifier
Base classifier that supports soft classified instances |
protected Instances |
m_LabeledInstances
Hard Labeled data |
protected double |
m_Lambda
Weight of unlabeled examples during EM training versus labeled examples (see Nigam et al.) |
protected int |
m_max_iterations
Maximum number of iterations to perform |
protected double[] |
m_MaxArray
The maximum values for numeric attributes. |
protected double[] |
m_MinArray
The minimum values for numeric attributes. |
protected static double |
m_minLogLikelihoodIncr
|
protected java.util.Random |
m_Random
random numbers and seed |
protected int |
m_rseed
|
protected boolean |
m_seedUnseenClasses
Create soft labeled Seed for unseen classes |
protected Instances |
m_UnlabeledData
Original set of unlabeled Instances |
protected SoftClassifiedInstances |
m_UnlabeledInstances
Soft labeled version of unlabeled data |
protected boolean |
m_verbose
Verbose? |
Constructor Summary | |
SemiSupEM()
Simple constructor, must set options using command line or GUI |
Method Summary | |
void |
buildClassifier(Instances data)
Generates the classifier. |
protected java.lang.String |
classDistributionString(SoftClassifiedInstance inst)
|
java.lang.String |
classifierTipText()
|
protected double |
distance(Instance first,
Instance second)
Calculates the distance between two instances |
double[] |
distributionForInstance(Instance instance)
Calculates the class membership probabilities for the given test instance. |
protected double |
eStep()
|
protected Instance |
farthestInstance(Instances candidateInsts,
Instances insts)
Return the instance in candidateInsts that is farthest from any instance in insts |
SoftClassifier |
getClassifier()
Get the base classifier |
boolean |
getDebug()
Get debug mode |
double |
getLambda()
|
int |
getMaxIterations()
Get the maximum number of iterations |
java.lang.String[] |
getOptions()
Gets the current settings of EM. |
int |
getSeed()
Get the random number seed |
boolean |
getSeedUnseenClasses()
|
java.lang.String |
globalInfo()
Returns a string describing this classifier |
protected void |
initModel()
Initialize the model using the appropriate set of data |
protected void |
iterate()
Run EM iterations until likelihood stops increasing significantly or max iterations exhausted |
java.lang.String |
lambdaTipText()
|
java.util.Enumeration |
listOptions()
Returns an enumeration describing the available options. |
double |
logSum(double[] logProbs)
Sums log of probabilities using special method for summing in log space |
static void |
main(java.lang.String[] argv)
Main method for testing this class. |
java.lang.String |
maxIterationsTipText()
Returns the tip text for this property |
protected double |
minimumDistance(Instance inst,
Instances insts)
Return the distance from inst to the closest instance in insts |
protected void |
mStep()
|
protected double |
norm(double x,
int i)
Normalizes a given value of a numeric attribute. |
protected void |
resetOptions()
Reset to default options |
java.lang.String |
seedTipText()
Returns the tip text for this property |
java.lang.String |
seedUnseenClassesTipText()
|
void |
setClassifier(SoftClassifier newClassifier)
Set the base classifier. |
void |
setDebug(boolean v)
Set debug mode - verbose output |
void |
setLambda(double v)
|
void |
setMaxIterations(int i)
Set the maximum number of iterations to perform |
protected void |
setMinMax(Instances insts)
Compute and store min max values for each numeric feature |
void |
setOptions(java.lang.String[] options)
Parses a given list of options. |
void |
setSeed(int s)
Set the random number seed |
void |
setSeedUnseenClasses(boolean v)
|
void |
setUnlabeled(Instances unlabeled)
Provide unlabeled data to the classifier. |
protected void |
softLabelClasses(SoftClassifiedInstance inst,
java.util.List classes)
Soft label inst as being equally likely to be in any of the given classes |
protected java.util.ArrayList |
unseenClasses(Instances insts)
Return a list of class values for which there are no instances in insts |
protected void |
updateMinMax(Instance instance)
Updates the minimum and maximum values for all the attributes based on a new instance. |
protected void |
weightInstances(Instances insts,
double weight)
Weight all given instances with the given weight |
Methods inherited from class weka.classifiers.DistributionClassifier |
calculateEntropy, calculateLabeledInstanceMargin, calculateMargin, classifyInstance |
Methods inherited from class weka.classifiers.Classifier |
forName, makeCopies |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
protected Instances m_UnlabeledData
protected SoftClassifiedInstances m_UnlabeledInstances
protected Instances m_LabeledInstances
protected SoftClassifiedInstances m_AllInstances
protected SoftClassifier m_Classifier
protected double m_Lambda
protected java.util.Random m_Random
protected int m_rseed
protected int m_max_iterations
protected boolean m_seedUnseenClasses
protected boolean m_verbose
protected static double m_minLogLikelihoodIncr
protected double[] m_MinArray
protected double[] m_MaxArray
Constructor Detail |
public SemiSupEM()
Method Detail |
public java.lang.String globalInfo()
public java.util.Enumeration listOptions()
Valid options are:
-V
Verbose.
-I
Terminate after this many iterations if EM has not converged.
-S
Specify random number seed.
-M
Set the minimum allowable standard deviation for normal density calculation.
Specified by:
listOptions in interface OptionHandler
public void setOptions(java.lang.String[] options) throws java.lang.Exception
Specified by:
setOptions in interface OptionHandler
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported
protected void resetOptions()
public java.lang.String seedTipText()
public void setSeed(int s)
Parameters:
s - the seed
public int getSeed()
public java.lang.String maxIterationsTipText()
public void setMaxIterations(int i) throws java.lang.Exception
Parameters:
i - the number of iterations
Throws:
java.lang.Exception - if i is less than 1
public int getMaxIterations()
public void setDebug(boolean v)
Parameters:
v - true for verbose output
public boolean getDebug()
public void setSeedUnseenClasses(boolean v)
public boolean getSeedUnseenClasses()
public java.lang.String seedUnseenClassesTipText()
public void setLambda(double v)
public double getLambda()
public java.lang.String lambdaTipText()
public void setClassifier(SoftClassifier newClassifier)
Parameters:
newClassifier - the Classifier to use.
public SoftClassifier getClassifier()
public java.lang.String classifierTipText()
public java.lang.String[] getOptions()
Specified by:
getOptions in interface OptionHandler
public void setUnlabeled(Instances unlabeled)
Specified by:
setUnlabeled in interface SemiSupClassifier
public void buildClassifier(Instances data) throws java.lang.Exception
buildClassifier in class Classifier
Parameters:
data - set of instances serving as training data
Throws:
java.lang.Exception - if the classifier has not been generated successfully
protected void weightInstances(Instances insts, double weight)
protected void initModel() throws java.lang.Exception
Throws:
java.lang.Exception
protected java.util.ArrayList unseenClasses(Instances insts)
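The idea behind unseenClasses can be illustrated without Weka. The following is only a sketch under stated assumptions: labels are plain int class indices in the range 0 to numClasses - 1, and the class name UnseenClassesSketch and its signature are hypothetical, not the actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

final class UnseenClassesSketch {
    /**
     * Returns the class indices in [0, numClasses) that never occur in labels.
     * Hypothetical stand-in for unseenClasses(Instances): labels replaces the
     * class values of the labeled instances.
     */
    static List<Integer> unseenClasses(int[] labels, int numClasses) {
        boolean[] seen = new boolean[numClasses];
        for (int label : labels) {
            seen[label] = true; // mark every class that has at least one instance
        }
        List<Integer> unseen = new ArrayList<>();
        for (int c = 0; c < numClasses; c++) {
            if (!seen[c]) {
                unseen.add(c);
            }
        }
        return unseen;
    }
}
```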
protected Instance farthestInstance(Instances candidateInsts, Instances insts)
protected double minimumDistance(Instance inst, Instances insts)
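One plausible reading of farthestInstance and minimumDistance together is a farthest-first selection: pick the candidate whose distance to its closest instance is largest. The sketch below assumes that reading, represents instances as double[] vectors, and uses plain Euclidean distance; all of these are assumptions for illustration, and FarthestFirstSketch is a hypothetical name.

```java
final class FarthestFirstSketch {
    /** Euclidean distance between two equal-length vectors (assumed metric). */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Distance from inst to the closest vector in insts. */
    static double minimumDistance(double[] inst, double[][] insts) {
        double best = Double.POSITIVE_INFINITY;
        for (double[] other : insts) {
            best = Math.min(best, distance(inst, other));
        }
        return best;
    }

    /** Candidate whose minimum distance to insts is largest, or null if there are no candidates. */
    static double[] farthestInstance(double[][] candidates, double[][] insts) {
        double[] farthest = null;
        double bestDist = Double.NEGATIVE_INFINITY;
        for (double[] candidate : candidates) {
            double d = minimumDistance(candidate, insts);
            if (d > bestDist) {
                bestDist = d;
                farthest = candidate;
            }
        }
        return farthest;
    }
}
```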
protected void softLabelClasses(SoftClassifiedInstance inst, java.util.List classes) throws java.lang.Exception
Throws:
java.lang.Exception
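Soft labeling an instance as equally likely to belong to any of a set of classes amounts to building a uniform distribution over those classes. A minimal sketch, assuming the class distribution is a double[] indexed by class value (SoftLabelSketch and its signature are hypothetical):

```java
import java.util.List;

final class SoftLabelSketch {
    /**
     * Returns a distribution over numClasses classes that puts equal mass on
     * each class index in classes and zero mass elsewhere.
     * Assumes classes contains distinct, valid indices.
     */
    static double[] softLabel(int numClasses, List<Integer> classes) {
        if (classes.isEmpty()) {
            throw new IllegalArgumentException("need at least one class");
        }
        double[] dist = new double[numClasses];
        double p = 1.0 / classes.size();
        for (int c : classes) {
            dist[c] = p;
        }
        return dist;
    }
}
```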
protected void iterate() throws java.lang.Exception
Throws:
java.lang.Exception
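The stopping rule described for iterate (stop when the likelihood no longer increases significantly, or when the iteration budget is exhausted) might look roughly like the loop below. This is a sketch, not the class's code: it assumes eStep returns the current log likelihood, that the threshold plays the role of m_minLogLikelihoodIncr, and the functional-interface parameters are hypothetical stand-ins for the protected eStep() and mStep() methods.

```java
import java.util.function.DoubleSupplier;

final class EmLoopSketch {
    /**
     * Alternates E and M steps. Stops when the log likelihood returned by the
     * E step improves by less than minLogLikelihoodIncr, or after
     * maxIterations E steps. Returns the number of E steps performed.
     */
    static int iterate(DoubleSupplier eStep, Runnable mStep,
                       int maxIterations, double minLogLikelihoodIncr) {
        double previous = Double.NEGATIVE_INFINITY;
        int iterations = 0;
        while (iterations < maxIterations) {
            double current = eStep.getAsDouble(); // assumed: log likelihood under the current model
            iterations++;
            if (current - previous < minLogLikelihoodIncr) {
                break; // no significant improvement
            }
            mStep.run(); // re-estimate the model from the soft labels
            previous = current;
        }
        return iterations;
    }
}
```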
protected double eStep() throws java.lang.Exception
Throws:
java.lang.Exception
public double logSum(double[] logProbs)
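Summing probabilities that are stored as logs is commonly done with the log-sum-exp trick, which factors out the largest term so that the exponentials do not underflow. The page does not say which method logSum uses, so the following is only a sketch of that standard technique under a hypothetical class name:

```java
final class LogSumSketch {
    /** Returns log(sum_i exp(logProbs[i])) computed stably in log space. */
    static double logSum(double[] logProbs) {
        double max = Double.NEGATIVE_INFINITY;
        for (double lp : logProbs) {
            max = Math.max(max, lp);
        }
        if (max == Double.NEGATIVE_INFINITY) {
            return Double.NEGATIVE_INFINITY; // empty input or all probabilities zero
        }
        double sum = 0.0;
        for (double lp : logProbs) {
            sum += Math.exp(lp - max); // every term is in (0, 1], so no overflow
        }
        return max + Math.log(sum);
    }
}
```

With inputs such as {-1000.0, -1000.0}, exponentiating directly would underflow to zero, while this form returns -1000 + log(2).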
protected java.lang.String classDistributionString(SoftClassifiedInstance inst)
protected void mStep() throws java.lang.Exception
Throws:
java.lang.Exception
public double[] distributionForInstance(Instance instance) throws java.lang.Exception
distributionForInstance in class DistributionClassifier
Parameters:
instance - the instance to be classified
Throws:
java.lang.Exception - if distribution can't be computed
protected double distance(Instance first, Instance second)
Parameters:
first - the first instance
second - the second instance
protected double norm(double x, int i)
Parameters:
x - the value to be normalized
i - the attribute's index
protected void setMinMax(Instances insts)
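The min/max bookkeeping and the normalization it supports can be sketched as plain min-max scaling. This is an illustration under assumptions: instances are double[] vectors of numeric attributes, a constant or unseen attribute normalizes to 0, and MinMaxNormSketch is a hypothetical class mirroring the roles of m_MinArray and m_MaxArray.

```java
import java.util.Arrays;

final class MinMaxNormSketch {
    private final double[] min; // plays the role of m_MinArray
    private final double[] max; // plays the role of m_MaxArray

    MinMaxNormSketch(int numAttributes) {
        min = new double[numAttributes];
        max = new double[numAttributes];
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);
    }

    /** Widens the stored range of each attribute to include one new instance. */
    void updateMinMax(double[] instance) {
        for (int i = 0; i < instance.length; i++) {
            min[i] = Math.min(min[i], instance[i]);
            max[i] = Math.max(max[i], instance[i]);
        }
    }

    /** Computes the ranges over a whole data set. */
    void setMinMax(double[][] insts) {
        for (double[] inst : insts) {
            updateMinMax(inst);
        }
    }

    /** Scales x into [0, 1] using the stored range of attribute i. */
    double norm(double x, int i) {
        if (!(max[i] > min[i])) {
            return 0.0; // assumption: constant or unseen attribute maps to 0
        }
        return (x - min[i]) / (max[i] - min[i]);
    }
}
```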
protected void updateMinMax(Instance instance)
Parameters:
instance - the new instance
public static void main(java.lang.String[] argv)
Parameters:
argv - the options