weka.extraction
Class ClusteringExtractor

java.lang.Object
  extended by weka.extraction.Extractor
      extended by weka.extraction.ClusteringExtractor
All Implemented Interfaces:
java.lang.Cloneable, OptionHandler

public class ClusteringExtractor
extends Extractor
implements OptionHandler

An extractor that combines a clusterer with a baseline extractor. It trains on a set of instances and can then be used for extraction on a test set.


Field Summary
protected  Clusterer m_clusterer
          The clusterer
protected  Extractor m_extractor
          The baseline extractor that is used
protected  int m_mode
           
protected  boolean m_verbose
          Verbose?
static int MODE_DOCUMENT_CLUSTERS
          Two fundamental modes.
static int MODE_MIXED
           
static int MODE_SEGMENT_CLUSTERS
           
static Tag[] TAGS_CLUSTERING_MODE
           
 
Fields inherited from class weka.extraction.Extractor
m_statistics
 
Constructor Summary
ClusteringExtractor()
          A default constructor
 
Method Summary
static java.lang.String concatStringArray(java.lang.String[] strings)
          A little helper to create a single String from an array of Strings
 Clusterer getClusterer()
          Get the clusterer
 Extractor getExtractor()
          Get the extractor
 SelectedTag getMode()
          Return the clustering mode
 java.lang.String[] getOptions()
          Gets the current settings of the ClusteringExtractor
 boolean getVerbose()
          Get the verbosity level of the clusterer
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options
 void setClusterer(Clusterer clusterer)
          Set the clusterer
 void setExtractor(Extractor extractor)
          Set the extractor
 void setMode(SelectedTag mode)
          Set the clustering mode
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 void setVerbose(boolean verbose)
          Set the verbosity level of the clusterer
 void testExtractor(Instances testData, java.util.HashMap docFillerMap)
          Perform extraction on a set of data.
 void trainExtractor(Instances labeledData, Instances unlabeledData)
          Given training data, train the extractor
 
Methods inherited from class weka.extraction.Extractor
forName, getStatistics
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

m_extractor

protected Extractor m_extractor
The baseline extractor that is used


m_clusterer

protected Clusterer m_clusterer
The clusterer


MODE_DOCUMENT_CLUSTERS

public static final int MODE_DOCUMENT_CLUSTERS
Two fundamental modes. We can either cluster documents and train separate extractors depending on what the document is like, or cluster text segments and train separate extractors for different segments. A mixed mode is also possible, but it is not addressed for now.

See Also:
Constant Field Values
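
The document- and segment-clustering modes described for MODE_DOCUMENT_CLUSTERS both come down to partitioning the training items by cluster and training one extractor per partition. The following is a minimal sketch of that partitioning step only, using plain java.util types. The class name, the groupByCluster method, and the toy length-based "clusterer" are hypothetical and are not part of weka.extraction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.ToIntFunction;

// Hypothetical illustration; not the ClusteringExtractor implementation.
final class PerClusterTrainingSketch {

    // Partitions items (documents or text segments) by the cluster id assigned to each.
    // A separate extractor would then be trained on each resulting group.
    static <T> Map<Integer, List<T>> groupByCluster(List<T> items, ToIntFunction<T> clusterOf) {
        Map<Integer, List<T>> groups = new TreeMap<>();
        for (T item : items) {
            groups.computeIfAbsent(clusterOf.applyAsInt(item), k -> new ArrayList<>()).add(item);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Toy stand-in for a clusterer: short texts go to cluster 0, longer ones to cluster 1.
        List<String> docs = Arrays.asList(
            "short memo", "a much longer report about seminars", "brief note");
        Map<Integer, List<String>> groups = groupByCluster(docs, d -> d.length() < 15 ? 0 : 1);
        groups.forEach((cluster, members) ->
            System.out.println("cluster " + cluster + ": train one extractor on " + members));
    }
}
```

In document mode the items would be whole documents; in segment mode they would be text segments. The grouping logic is the same in both cases.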

MODE_SEGMENT_CLUSTERS

public static final int MODE_SEGMENT_CLUSTERS
See Also:
Constant Field Values

MODE_MIXED

public static final int MODE_MIXED
See Also:
Constant Field Values

TAGS_CLUSTERING_MODE

public static final Tag[] TAGS_CLUSTERING_MODE

m_mode

protected int m_mode

m_verbose

protected boolean m_verbose
Verbose?

Constructor Detail

ClusteringExtractor

public ClusteringExtractor()
A default constructor

Method Detail

trainExtractor

public void trainExtractor(Instances labeledData,
                           Instances unlabeledData)
                    throws java.lang.Exception
Given training data, train the extractor

Specified by:
trainExtractor in class Extractor
Parameters:
labeledData - a set of training data
unlabeledData - a set of unlabeled data; currently unused, since transduction is not planned here for now
Throws:
java.lang.Exception

testExtractor

public void testExtractor(Instances testData,
                          java.util.HashMap docFillerMap)
                   throws java.lang.Exception
Perform extraction on a set of data.

Specified by:
testExtractor in class Extractor
Parameters:
testData - a set of instances on which to perform extraction
docFillerMap - a map where the uniqueID of an instance (document) is mapped to a HashMap, which maps fillers to a list of Integer positions
Throws:
java.lang.Exception
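
The nested layout described for docFillerMap (document ID, then filler, then list of positions) can be pictured with the sketch below. It uses only java.util. The String key types are an assumption, since the signature declares only a raw java.util.HashMap, and the class and the addFiller helper are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

// Hypothetical illustration of the docFillerMap layout; not part of weka.extraction.
final class DocFillerMapSketch {

    // Records that `filler` occurs at `position` in the document identified by `docId`.
    // Assumption: document IDs and fillers are Strings.
    static void addFiller(HashMap<String, HashMap<String, List<Integer>>> docFillerMap,
                          String docId, String filler, int position) {
        docFillerMap
            .computeIfAbsent(docId, id -> new HashMap<>())
            .computeIfAbsent(filler, f -> new ArrayList<>())
            .add(position);
    }

    public static void main(String[] args) {
        HashMap<String, HashMap<String, List<Integer>>> docFillerMap = new HashMap<>();
        addFiller(docFillerMap, "doc-1", "Acme Corp", 3);
        addFiller(docFillerMap, "doc-1", "Acme Corp", 17);
        addFiller(docFillerMap, "doc-2", "Austin", 8);
        // Look up every recorded position of one filler in one document.
        System.out.println(docFillerMap.get("doc-1").get("Acme Corp"));
    }
}
```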

setMode

public void setMode(SelectedTag mode)
Set the clustering mode

Parameters:
mode - one of MODE_DOCUMENT_CLUSTERS or MODE_SEGMENT_CLUSTERS

getMode

public SelectedTag getMode()
Return the clustering mode

Returns:
one of MODE_DOCUMENT_CLUSTERS or MODE_SEGMENT_CLUSTERS

setClusterer

public void setClusterer(Clusterer clusterer)
Set the clusterer

Parameters:
clusterer - the clusterer to be used

getClusterer

public Clusterer getClusterer()
Get the clusterer

Returns:
the clusterer that is used

setExtractor

public void setExtractor(Extractor extractor)
Set the extractor

Parameters:
extractor - the extractor to be used

getExtractor

public Extractor getExtractor()
Get the extractor

Returns:
the extractor that is used

setVerbose

public void setVerbose(boolean verbose)
Set the verbosity level of the clusterer

Parameters:
verbose - messages on (true) or off (false)

getVerbose

public boolean getVerbose()
Get the verbosity level of the clusterer

Returns:
messages on (true) or off (false)

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options

Specified by:
listOptions in interface OptionHandler
Returns:
an enumeration of all the available options

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options. Valid options are:

-D
document-clustering mode
or
-S
segment-clustering mode

-E extractor-name extractor-options
extractor and its options

-C clusterer-name clusterer-options
clusterer and its options

Specified by:
setOptions in interface OptionHandler
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported
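
The -D and -S flags listed for setOptions select the clustering mode. The sketch below shows one way such a flag pair could be read from an options array. It is a standalone illustration, not the class's actual parsing code. The constant values, the parseMode name, and the rule that exactly one of the two flags must be present are all assumptions. The sketch also ignores the possibility that nested extractor or clusterer options reuse the same letters.

```java
// Hypothetical illustration of mode-flag parsing; not the ClusteringExtractor implementation.
final class ModeOptionSketch {

    // Placeholder values: the real constants' values are not given in this document.
    static final int MODE_DOCUMENT_CLUSTERS = 1;
    static final int MODE_SEGMENT_CLUSTERS = 2;

    // Returns the mode selected by -D or -S.
    // Assumption: exactly one of the two flags must be supplied.
    static int parseMode(String[] options) {
        boolean documentMode = false;
        boolean segmentMode = false;
        for (String option : options) {
            if ("-D".equals(option)) documentMode = true;
            if ("-S".equals(option)) segmentMode = true;
        }
        if (documentMode == segmentMode) {
            throw new IllegalArgumentException("Specify exactly one of -D or -S");
        }
        return documentMode ? MODE_DOCUMENT_CLUSTERS : MODE_SEGMENT_CLUSTERS;
    }

    public static void main(String[] args) {
        int mode = parseMode(new String[] {"-S"});
        System.out.println(mode == MODE_SEGMENT_CLUSTERS ? "segment clusters" : "document clusters");
    }
}
```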

concatStringArray

public static java.lang.String concatStringArray(java.lang.String[] strings)
A little helper to create a single String from an array of Strings

Parameters:
strings - an array of strings
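
A helper of this kind typically looks like the sketch below. The documentation does not say which separator concatStringArray uses, so the single space here is an assumption, and the ConcatSketch class is a hypothetical stand-in rather than the actual implementation.

```java
// Hypothetical stand-in for the helper; not the weka.extraction implementation.
final class ConcatSketch {

    // Joins the array elements into one String.
    // Assumption: elements are separated by a single space.
    static String concatStringArray(String[] strings) {
        StringBuilder joined = new StringBuilder();
        for (int i = 0; i < strings.length; i++) {
            if (i > 0) {
                joined.append(' ');
            }
            joined.append(strings[i]);
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        System.out.println(concatStringArray(new String[] {"first", "second", "third"}));
    }
}
```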

getOptions

public java.lang.String[] getOptions()
Gets the current settings of the ClusteringExtractor

Specified by:
getOptions in interface OptionHandler
Returns:
an array of strings suitable for passing to setOptions()