java.lang.Object
  weka.filters.Filter
    weka.filters.unsupervised.attribute.StringToWordVector
Converts String attributes into a set of attributes representing word occurrence information from the text contained in the strings. The set of words (attributes) is determined by the first batch filtered (typically training data).
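The point that the first batch fixes the set of word attributes can be illustrated with a small stand-alone sketch. It does not use Weka and is not the filter's implementation; the class `WordVectorSketch` and its methods are hypothetical names, and the delimiter set is an assumption chosen for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

/** Toy illustration only; hypothetical class, not part of Weka. */
final class WordVectorSketch {
    // Maps each word to its column index; frozen after the first batch.
    private final Map<String, Integer> dictionary = new LinkedHashMap<>();
    private boolean dictionaryFixed = false;

    /** Builds the dictionary from the first batch, then vectorizes every batch with it. */
    int[][] filterBatch(List<String> texts) {
        if (!dictionaryFixed) {
            for (String text : texts) {
                for (String word : tokenize(text)) {
                    dictionary.putIfAbsent(word, dictionary.size());
                }
            }
            dictionaryFixed = true;
        }
        int[][] vectors = new int[texts.size()][dictionary.size()];
        for (int row = 0; row < texts.size(); row++) {
            for (String word : tokenize(texts.get(row))) {
                Integer column = dictionary.get(word);
                if (column != null) {
                    vectors[row][column] = 1; // 1 marks word presence
                }
                // Words unseen in the first batch have no column and are skipped.
            }
        }
        return vectors;
    }

    /** Returns the dictionary words in column order. */
    List<String> words() {
        return new ArrayList<>(dictionary.keySet());
    }

    private static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        // Assumed delimiter set for this example.
        StringTokenizer tokenizer = new StringTokenizer(text, " \n\t.,:'\"()?!");
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }
}
```

In this sketch a later batch containing a new word (for example a test set) still produces vectors with the columns learned from the first batch, which mirrors the behaviour described above.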
Field Summary | |
protected Range |
m_SelectedRange
Range of columns to convert to word vectors |
Fields inherited from class weka.filters.Filter |
m_NewBatch |
Constructor Summary | |
StringToWordVector()
Default constructor. |
|
StringToWordVector(int wordsToKeep)
Constructor that allows specification of the target number of words in the output. |
Method Summary | |
boolean |
batchFinished()
Signify that this batch of input to the filter is finished. |
java.lang.String |
getDelimiters()
Get the value of delimiters. |
java.lang.String[] |
getOptions()
Gets the current settings of the filter. |
boolean |
getOutputWordCounts()
Gets whether output instances contain 0 or 1 indicating word presence, or word counts. |
Range |
getSelectedRange()
Get the value of m_SelectedRange. |
int |
getWordsToKeep()
Gets the number of words (per class if there is a class attribute assigned) to attempt to keep. |
boolean |
input(Instance instance)
Input an instance for filtering. |
java.util.Enumeration |
listOptions()
Returns an enumeration describing the available options. |
static void |
main(java.lang.String[] argv)
Main method for testing this class. |
void |
setDelimiters(java.lang.String newDelimiters)
Set the value of delimiters. |
boolean |
setInputFormat(Instances instanceInfo)
Sets the format of the input instances. |
void |
setOptions(java.lang.String[] options)
Parses a given list of options controlling the behaviour of this object. |
void |
setOutputWordCounts(boolean outputWordCounts)
Sets whether output instances contain 0 or 1 indicating word presence, or word counts. |
void |
setSelectedRange(java.lang.String newSelectedRange)
Set the value of m_SelectedRange. |
void |
setWordsToKeep(int newWordsToKeep)
Sets the number of words (per class if there is a class attribute assigned) to attempt to keep. |
Methods inherited from class weka.filters.Filter |
batchFilterFile, bufferInput, copyStringValues, copyStringValues, filterFile, flushInput, getInputFormat, getInputStringIndex, getOutputFormat, getOutputStringIndex, getStringIndices, inputFormat, isOutputFormatDefined, numPendingOutput, output, outputFormat, outputFormatPeek, outputPeek, push, resetQueue, setOutputFormat, useFilter |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
protected Range m_SelectedRange
Constructor Detail |
public StringToWordVector()
public StringToWordVector(int wordsToKeep)
wordsToKeep
- the number of words in the output vector (per class if assigned).
Method Detail |
public java.util.Enumeration listOptions()
listOptions
in interface OptionHandler
public void setOptions(java.lang.String[] options) throws java.lang.Exception
-C
Output word counts rather than boolean word presence.
-D delimiter_characters
Specify set of delimiter characters
(default: " \n\t.,:'\\\"()?!")
-R index1,index2-index4,...
Specify list of string attributes to convert to words.
(default: all string attributes)
-W number_of_words_to_keep
Specify number of word fields to create.
Other, less useful words will be discarded.
(default: 1000)
setOptions
in interface OptionHandler
options
- the list of options as an array of strings
java.lang.Exception
- if an option is not supported
public java.lang.String[] getOptions()
getOptions
in interface OptionHandler
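As an illustration of the option flags documented for setOptions above, the following sketch assembles an options array from plain Java values. The helper class `OptionsSketch` and its method are hypothetical; the sketch assumes that each flag and its value occupy separate array elements, as they would on a command line.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; builds an options array without depending on Weka. */
final class OptionsSketch {
    static String[] buildOptions(boolean outputWordCounts, String range, int wordsToKeep) {
        List<String> options = new ArrayList<>();
        if (outputWordCounts) {
            options.add("-C"); // word counts instead of boolean presence
        }
        if (range != null) {
            options.add("-R"); // string attributes to convert
            options.add(range);
        }
        options.add("-W"); // number of word fields to create
        options.add(Integer.toString(wordsToKeep));
        return options.toArray(new String[0]);
    }
}
```

An array built this way, for example from `buildOptions(true, "1,3-5", 500)`, has the shape that setOptions(java.lang.String[]) is documented to accept; since that method declares java.lang.Exception, a caller would need to handle it.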
public boolean setInputFormat(Instances instanceInfo) throws java.lang.Exception
setInputFormat
in class Filter
instanceInfo
- an Instances object containing the input instance structure (any instances contained in the object are ignored - only the structure is required).
java.lang.Exception
- if the input format can't be set successfully
public boolean input(Instance instance)
input
in class Filter
instance
- the input instance.
java.lang.IllegalStateException
- if no input structure has been defined.
public boolean batchFinished()
batchFinished
in class Filter
java.lang.IllegalStateException
- if no input structure has been defined.
public boolean getOutputWordCounts()
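The two output modes selected by the outputWordCounts setting (0 or 1 for word presence, or word counts) can be contrasted in a short stand-alone sketch. `WordCountSketch` is a hypothetical class, it splits on spaces only, and it is not how the filter itself is implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringTokenizer;

/** Hypothetical illustration of presence versus count output. */
final class WordCountSketch {
    static Map<String, Integer> vectorize(String text, boolean outputWordCounts) {
        Map<String, Integer> values = new LinkedHashMap<>();
        StringTokenizer tokenizer = new StringTokenizer(text, " ");
        while (tokenizer.hasMoreTokens()) {
            String word = tokenizer.nextToken();
            if (outputWordCounts) {
                values.merge(word, 1, Integer::sum); // accumulate occurrences
            } else {
                values.put(word, 1); // record presence only
            }
        }
        return values;
    }
}
```

Words that do not occur in the text are simply absent from the returned map here; in a fixed-width vector they would hold 0 in either mode.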
public void setOutputWordCounts(boolean outputWordCounts)
outputWordCounts
- true if word counts should be output.
public java.lang.String getDelimiters()
public void setDelimiters(java.lang.String newDelimiters)
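To show what a set of delimiter characters means in practice, the sketch below splits text with java.util.StringTokenizer, which treats every character of its delimiter argument as a separator. Whether the filter tokenizes in exactly this way is an assumption; `DelimiterSketch` is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

/** Hypothetical illustration of splitting on a set of delimiter characters. */
final class DelimiterSketch {
    static List<String> split(String text, String delimiters) {
        List<String> tokens = new ArrayList<>();
        // Each character in 'delimiters' separates tokens; runs of delimiters yield no empty tokens.
        StringTokenizer tokenizer = new StringTokenizer(text, delimiters);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }
}
```

With this sketch, adding a character such as `-` to the delimiter string causes hyphenated terms to be split into separate words, which changes the resulting word attributes.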
public Range getSelectedRange()
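The range string accepted by setSelectedRange and by the -R option has the form index1,index2-index4,... as listed under setOptions. The sketch below expands such a string into individual indices. It is a stand-alone illustration rather than the weka Range class: `RangeSketch` is hypothetical and handles only plain numbers and number-number spans.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical expansion of a range string such as "1,3-5". */
final class RangeSketch {
    static List<Integer> expand(String range) {
        List<Integer> indices = new ArrayList<>();
        for (String part : range.split(",")) {
            String trimmed = part.trim();
            int dash = trimmed.indexOf('-');
            if (dash < 0) {
                indices.add(Integer.parseInt(trimmed)); // single index
            } else {
                int first = Integer.parseInt(trimmed.substring(0, dash).trim());
                int last = Integer.parseInt(trimmed.substring(dash + 1).trim());
                for (int i = first; i <= last; i++) {
                    indices.add(i); // inclusive span
                }
            }
        }
        return indices;
    }
}
```

The sketch returns the indices exactly as written in the string and makes no assumption about whether they are counted from 0 or 1.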
public void setSelectedRange(java.lang.String newSelectedRange)
newSelectedRange
- Value to assign to m_SelectedRange.
public int getWordsToKeep()
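The idea behind the wordsToKeep setting, keeping a target number of words and discarding less useful ones, can be sketched as a simple frequency cut-off. This is an assumption made for illustration: the sketch ranks words by raw frequency and ignores the per-class handling mentioned in this documentation. `WordsToKeepSketch` is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration of keeping only the most frequent words. */
final class WordsToKeepSketch {
    static List<String> keepMostFrequent(List<String> words, int wordsToKeep) {
        // TreeMap keeps words alphabetical, so ties stay in a predictable order.
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Stable sort by descending count.
        entries.sort((a, b) -> b.getValue().compareTo(a.getValue()));
        List<String> kept = new ArrayList<>();
        int limit = Math.min(wordsToKeep, entries.size());
        for (int i = 0; i < limit; i++) {
            kept.add(entries.get(i).getKey());
        }
        return kept;
    }
}
```

A strict cut-off like this always returns at most wordsToKeep words, whereas the documentation describes the value as a target that the filter attempts to keep.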
public void setWordsToKeep(int newWordsToKeep)
newWordsToKeep
- the target number of words in the output vector (per class if assigned).
public static void main(java.lang.String[] argv)
argv
- should contain arguments to the filter: use -h for help