weka.attributeSelection
Class MatlabPCA

java.lang.Object
  extended by weka.attributeSelection.ASEvaluation
      extended by weka.attributeSelection.AttributeEvaluator
          extended by weka.attributeSelection.MatlabPCA
All Implemented Interfaces:
AttributeTransformer, OptionHandler, java.io.Serializable

public class MatlabPCA
extends AttributeEvaluator
implements AttributeTransformer, OptionHandler

Class for performing principal components analysis/transformation.

Valid options are:

-N
Don't normalize the input data.

-R
Retain enough principal components (PCs) to account for this proportion of the variance.

-T
Transform through the PC space and back to the original space.

See Also:
Serialized Form

Field Summary
 java.lang.String m_eigenvalueFilename
          Name of the file where eigenvalues will be stored
 java.lang.String m_eigenvectorFilename
          Name of the file where eigenvectors will be stored
 java.lang.String m_eigenvectorFilenameBase
           
protected  java.lang.String m_PCAMFile
          Name of the Matlab program file that computes PCA
 
Constructor Summary
MatlabPCA()
           
 
Method Summary
 void buildEvaluator(Instances data)
          Initializes principal components and performs the analysis
 Instance convertInstance(Instance instance)
          Transform an instance in original (unnormalized) format.
static void dumpAttributeNames(Instances data, java.lang.String filename)
          Dump attribute names into a text file
 double evaluateAttribute(int att)
          Evaluates the merit of a transformed attribute.
static java.lang.String getLogTimestamp()
          Get a timestamp string as a weak unique ID
 boolean getNormalize()
          Gets whether or not input data is to be normalized
 java.lang.String[] getOptions()
          Gets the current settings of MatlabPCA
 boolean getTransformBackToOriginal()
          Gets whether the data is to be transformed back to the original space.
 double getVarianceCovered()
          Gets the proportion of total variance to account for when retaining principal components
 java.lang.String globalInfo()
          Returns a string describing this attribute transformer
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options.
static void main(java.lang.String[] argv)
          Main method for testing this class
 java.lang.String normalizeTipText()
          Returns the tip text for this property
 void prepareMatlab()
          Create the Matlab m-file for PCA
 double[][] readColumnVectors(java.lang.String name, int maxVectors)
          Read column vectors from a text file
 double[] readVector(java.lang.String name)
          Read a column vector from a text file
 void runMatlab(java.lang.String inFile, java.lang.String outFile)
          Run Matlab from the command line with the given input and output files
 void setNormalize(boolean n)
          Set whether input data will be normalized.
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 void setTransformBackToOriginal(boolean b)
          Sets whether the data should be transformed back to the original space
 void setVarianceCovered(double vc)
          Sets the amount of variance to account for when retaining principal components
 java.lang.String toString()
          Returns a description of this attribute transformer
 java.lang.String transformBackToOriginalTipText()
          Returns the tip text for this property
 Instances transformedData()
          Gets the transformed training data.
 Instances transformedHeader()
          Returns just the header for the transformed data (i.e., an empty set of instances).
protected  java.lang.String valsToString(double[] vals)
           
 java.lang.String varianceCoveredTipText()
          Returns the tip text for this property
 
Methods inherited from class weka.attributeSelection.ASEvaluation
forName, makeCopies, postProcess
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

m_PCAMFile

protected java.lang.String m_PCAMFile
Name of the Matlab program file that computes PCA


m_eigenvectorFilename

public java.lang.String m_eigenvectorFilename
Name of the file where eigenvectors will be stored


m_eigenvectorFilenameBase

public java.lang.String m_eigenvectorFilenameBase

m_eigenvalueFilename

public java.lang.String m_eigenvalueFilename
Name of the file where eigenvalues will be stored

Constructor Detail

MatlabPCA

public MatlabPCA()
Method Detail

globalInfo

public java.lang.String globalInfo()
Returns a string describing this attribute transformer

Returns:
a description of the evaluator suitable for displaying in the explorer/experimenter GUI

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface OptionHandler
Returns:
an enumeration of all the available options.

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options. Valid options are:

-N
Don't normalize the input data.

-R
Retain enough principal components (PCs) to account for this proportion of the variance.

-T
Transform through the PC space and back to the original space.

Specified by:
setOptions in interface OptionHandler
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported
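As a rough illustration of how the three flags above could be parsed, here is a minimal sketch in plain Java. It does not use Weka: the class PcaOptionsSketch, its field names, and its default values are hypothetical and are not taken from MatlabPCA. It also assumes that -R is followed by its numeric value as the next array element.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the option handling described above. */
final class PcaOptionsSketch {
    boolean normalize = true;                 // assumed default: normalize unless -N is given
    double varianceCovered = 0.95;            // assumed default, not taken from MatlabPCA
    boolean transformBackToOriginal = false;  // assumed default: stay in PC space

    /** Parses -N, -R <proportion>, and -T; rejects anything else. */
    static PcaOptionsSketch parse(String[] options) {
        PcaOptionsSketch parsed = new PcaOptionsSketch();
        for (int i = 0; i < options.length; i++) {
            String opt = options[i];
            if (opt.equals("-N")) {
                parsed.normalize = false;
            } else if (opt.equals("-T")) {
                parsed.transformBackToOriginal = true;
            } else if (opt.equals("-R")) {
                if (i + 1 >= options.length) {
                    throw new IllegalArgumentException("-R requires a proportion");
                }
                // Consume the next element as the value of -R.
                parsed.varianceCovered = Double.parseDouble(options[++i]);
            } else {
                throw new IllegalArgumentException("Unsupported option: " + opt);
            }
        }
        return parsed;
    }

    /** Rebuilds an option array that parse() accepts, in the spirit of getOptions(). */
    String[] toOptions() {
        List<String> out = new ArrayList<>();
        if (!normalize) {
            out.add("-N");
        }
        out.add("-R");
        out.add(Double.toString(varianceCovered));
        if (transformBackToOriginal) {
            out.add("-T");
        }
        return out.toArray(new String[0]);
    }
}
```

Round-tripping the result of toOptions() through parse() should reproduce the same settings, which is the contract that getOptions() and setOptions() describe.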

normalizeTipText

public java.lang.String normalizeTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter GUI

setNormalize

public void setNormalize(boolean n)
Set whether input data will be normalized.

Parameters:
n - true if input data is to be normalized

getNormalize

public boolean getNormalize()
Gets whether or not input data is to be normalized

Returns:
true if input data is to be normalized

varianceCoveredTipText

public java.lang.String varianceCoveredTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter GUI

setVarianceCovered

public void setVarianceCovered(double vc)
Sets the amount of variance to account for when retaining principal components

Parameters:
vc - the proportion of total variance to account for

getVarianceCovered

public double getVarianceCovered()
Gets the proportion of total variance to account for when retaining principal components

Returns:
the proportion of variance to account for
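To make the meaning of this proportion concrete, the following sketch counts how many leading components are needed to reach it. It is an illustration only: the class and method names are hypothetical, and it assumes the eigenvalues are supplied in descending order, which this page does not state.

```java
/** Hypothetical helper illustrating the variance-covered setting. */
final class RetainedComponentsSketch {
    /**
     * Returns the smallest number of leading eigenvalues whose share of the
     * total variance is at least varianceCovered.
     */
    static int componentsToRetain(double[] eigenvaluesDescending, double varianceCovered) {
        double total = 0.0;
        for (double value : eigenvaluesDescending) {
            total += value;
        }
        double cumulative = 0.0;
        for (int i = 0; i < eigenvaluesDescending.length; i++) {
            cumulative += eigenvaluesDescending[i];
            if (cumulative / total >= varianceCovered) {
                return i + 1;
            }
        }
        // The requested proportion was never reached, so keep everything.
        return eigenvaluesDescending.length;
    }
}
```

With eigenvalues 4, 3, 2, 1 (total 10), a proportion of 0.5 is first reached after two components, because the first covers 0.4 and the first two cover 0.7.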

transformBackToOriginalTipText

public java.lang.String transformBackToOriginalTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter GUI

setTransformBackToOriginal

public void setTransformBackToOriginal(boolean b)
Sets whether the data should be transformed back to the original space

Parameters:
b - true if the data should be transformed back to the original space

getTransformBackToOriginal

public boolean getTransformBackToOriginal()
Gets whether the data is to be transformed back to the original space.

Returns:
true if the data is to be transformed back to the original space

getOptions

public java.lang.String[] getOptions()
Gets the current settings of MatlabPCA

Specified by:
getOptions in interface OptionHandler
Returns:
an array of strings suitable for passing to setOptions()

buildEvaluator

public void buildEvaluator(Instances data)
                    throws java.lang.Exception
Initializes principal components and performs the analysis

Specified by:
buildEvaluator in class ASEvaluation
Parameters:
data - the instances to analyse/transform
Throws:
java.lang.Exception - if analysis fails

readColumnVectors

public double[][] readColumnVectors(java.lang.String name,
                                    int maxVectors)
                             throws java.lang.Exception
Read column vectors from a text file

Parameters:
name - file name
maxVectors - maximum number of vectors to read, or -1 to read all
Throws:
java.lang.Exception
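The file format is not documented here. The sketch below assumes a whitespace-separated numeric matrix in which each column is one vector, and shows how a maxVectors limit with -1 meaning "all" might be applied. The class name, the method name, and the orientation of the returned array (one row per vector) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser for a text matrix whose columns are vectors. */
final class ColumnVectorsSketch {
    /** Returns result[v][r]: element r of the v-th column vector. */
    static double[][] parseColumnVectors(List<String> lines, int maxVectors) {
        List<double[]> rows = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] tokens = trimmed.split("\\s+");
            double[] row = new double[tokens.length];
            for (int c = 0; c < tokens.length; c++) {
                row[c] = Double.parseDouble(tokens[c]);
            }
            if (!rows.isEmpty() && rows.get(0).length != row.length) {
                throw new IllegalArgumentException("Rows have different lengths");
            }
            rows.add(row);
        }
        if (rows.isEmpty()) {
            return new double[0][0];
        }
        int columns = rows.get(0).length;
        // A negative limit means "read every column".
        int count = (maxVectors < 0) ? columns : Math.min(maxVectors, columns);
        double[][] vectors = new double[count][rows.size()];
        for (int r = 0; r < rows.size(); r++) {
            for (int c = 0; c < count; c++) {
                vectors[c][r] = rows.get(r)[c];
            }
        }
        return vectors;
    }
}
```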

readVector

public double[] readVector(java.lang.String name)
                    throws java.lang.Exception
Read a column vector from a text file

Parameters:
name - file name
Throws:
java.lang.Exception
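A minimal sketch of reading a single column vector follows, assuming one number per non-blank line and UTF-8 text. Neither assumption is confirmed by this page, and the class VectorFileSketch is hypothetical. Unlike the documented method, this version wraps I/O failures in an unchecked exception.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader for a text file holding one number per line. */
final class VectorFileSketch {
    /** Parses one number from each non-blank line. */
    static double[] parseVector(List<String> lines) {
        List<Double> values = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                values.add(Double.parseDouble(trimmed));
            }
        }
        double[] vector = new double[values.size()];
        for (int i = 0; i < vector.length; i++) {
            vector[i] = values.get(i);
        }
        return vector;
    }

    /** Reads the named file and parses it with parseVector. */
    static double[] readVector(String name) {
        try {
            return parseVector(Files.readAllLines(Paths.get(name), StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```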

dumpAttributeNames

public static void dumpAttributeNames(Instances data,
                                      java.lang.String filename)
Dump attribute names into a text file

Parameters:
data - instances for which to dump attributes
filename - name of the file where the attribute column goes

transformedHeader

public Instances transformedHeader()
                            throws java.lang.Exception
Returns just the header for the transformed data (i.e., an empty set of instances). This lets AttributeSelection determine the structure of the transformed data without having to fetch all of the transformed data through transformedData().

Specified by:
transformedHeader in interface AttributeTransformer
Returns:
the header of the transformed data.
Throws:
java.lang.Exception - if the header of the transformed data can't be determined.

transformedData

public Instances transformedData()
                          throws java.lang.Exception
Gets the transformed training data.

Specified by:
transformedData in interface AttributeTransformer
Returns:
the transformed training data
Throws:
java.lang.Exception - if transformed data can't be returned

evaluateAttribute

public double evaluateAttribute(int att)
                         throws java.lang.Exception
Evaluates the merit of a transformed attribute. This is defined to be 1 minus the cumulative variance explained. Merit can't be meaningfully evaluated if the data is to be transformed back to the original space.

Specified by:
evaluateAttribute in class AttributeEvaluator
Parameters:
att - the attribute to be evaluated
Returns:
the merit of a transformed attribute
Throws:
java.lang.Exception - if attribute can't be evaluated
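The merit definition above can be written out as a short calculation. This sketch assumes that eigenvalues are in descending order, that att is a zero-based index into the transformed attributes, and that the cumulative variance includes the attribute being evaluated. These are assumptions; the page does not spell them out, and the class name is hypothetical.

```java
/** Hypothetical illustration of "1 minus the cumulative variance explained". */
final class MeritSketch {
    static double merit(double[] eigenvaluesDescending, int att) {
        if (att < 0 || att >= eigenvaluesDescending.length) {
            throw new IllegalArgumentException("Attribute index out of range: " + att);
        }
        double total = 0.0;
        for (double value : eigenvaluesDescending) {
            total += value;
        }
        double cumulative = 0.0;
        for (int i = 0; i <= att; i++) {
            cumulative += eigenvaluesDescending[i]; // includes component att itself
        }
        return 1.0 - cumulative / total;
    }
}
```

Under these assumptions, eigenvalues 4, 3, 2, 1 give a merit of about 0.6 for the first component, about 0.3 for the second, and 0 for the last, so merit falls as more of the variance has already been explained.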

prepareMatlab

public void prepareMatlab()
Create the Matlab m-file for PCA


runMatlab

public void runMatlab(java.lang.String inFile,
                      java.lang.String outFile)
Run Matlab from the command line with the given input and output files

Parameters:
inFile - file to be input to Matlab
outFile - file where results are stored
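How the class actually invokes Matlab is not shown on this page. One plausible arrangement, sketched below with the standard ProcessBuilder API, feeds inFile to Matlab on standard input and captures console output in outFile. The executable name matlab, the startup flags -nodisplay and -nosplash, and the use of redirection are all assumptions, as is the class name.

```java
import java.io.File;
import java.util.Arrays;

/** Hypothetical command construction; not the actual MatlabPCA implementation. */
final class MatlabCommandSketch {
    static ProcessBuilder buildCommand(String inFile, String outFile) {
        ProcessBuilder builder = new ProcessBuilder(
                Arrays.asList("matlab", "-nodisplay", "-nosplash"));
        builder.redirectInput(new File(inFile));    // Matlab reads its commands from inFile
        builder.redirectOutput(new File(outFile));  // console output is written to outFile
        builder.redirectErrorStream(true);          // merge stderr into the same file
        return builder;
    }
}
```

Calling start() and then waitFor() on the returned builder would launch the process. Both throw checked exceptions that the caller has to handle.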

toString

public java.lang.String toString()
Returns a description of this attribute transformer

Returns:
a String describing this attribute transformer

convertInstance

public Instance convertInstance(Instance instance)
                         throws java.lang.Exception
Transform an instance in original (unnormalized) format. Convert back to the original space if requested.

Specified by:
convertInstance in interface AttributeTransformer
Parameters:
instance - an instance in the original (unnormalized) format
Returns:
a transformed instance
Throws:
java.lang.Exception - if instance can't be transformed
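The two directions of the transformation can be illustrated with plain arrays. The sketch below projects an already normalized vector onto the retained eigenvectors and, optionally, maps the scores back to the original space. It assumes unit-length, mutually orthogonal eigenvectors stored one per row and leaves the normalization step to the caller. All names are hypothetical.

```java
/** Hypothetical array-based illustration of the PC transform and its inverse. */
final class ProjectionSketch {
    /** Projects x onto each retained eigenvector; eigenvectors[j] is the j-th one. */
    static double[] toPcSpace(double[] x, double[][] eigenvectors) {
        double[] scores = new double[eigenvectors.length];
        for (int j = 0; j < eigenvectors.length; j++) {
            for (int i = 0; i < x.length; i++) {
                scores[j] += eigenvectors[j][i] * x[i];
            }
        }
        return scores;
    }

    /** Rebuilds an approximation of x as a weighted sum of the eigenvectors. */
    static double[] backToOriginal(double[] scores, double[][] eigenvectors) {
        double[] x = new double[eigenvectors[0].length];
        for (int j = 0; j < eigenvectors.length; j++) {
            for (int i = 0; i < x.length; i++) {
                x[i] += scores[j] * eigenvectors[j][i];
            }
        }
        return x;
    }
}
```

When fewer eigenvectors are retained than there are original attributes, the round trip loses the variance carried by the dropped components, so the result is an approximation of the input rather than an exact copy.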

valsToString

protected java.lang.String valsToString(double[] vals)

getLogTimestamp

public static java.lang.String getLogTimestamp()
Get a timestamp string as a weak unique ID
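A timestamp of this kind might be produced as in the sketch below. The pattern yyyyMMdd_HHmmss_SSS and the class name are assumptions; the format that MatlabPCA really uses is not documented on this page.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical timestamp helper with millisecond resolution. */
final class LogTimestampSketch {
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss_SSS");

    /** Formats the given moment, which keeps the method easy to test. */
    static String format(LocalDateTime when) {
        return when.format(FORMAT);
    }

    /** Formats the current local time. */
    static String now() {
        return format(LocalDateTime.now());
    }
}
```

Such an ID is weak in the sense that two calls within the same millisecond return the same string, so it should not be relied on to separate files written by concurrent runs.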


main

public static void main(java.lang.String[] argv)
Main method for testing this class

Parameters:
argv - should contain the command line arguments to the evaluator/transformer (see AttributeSelection)