weka.clusterers
Class PCSoftKMeans

java.lang.Object
  extended by weka.clusterers.Clusterer
      extended by weka.clusterers.DistributionClusterer
          extended by weka.clusterers.PCSoftKMeans
All Implemented Interfaces:
ActiveLearningClusterer, java.lang.Cloneable, OptionHandler, SemiSupClusterer, java.io.Serializable

public class PCSoftKMeans
extends DistributionClusterer
implements OptionHandler, SemiSupClusterer, ActiveLearningClusterer

Pairwise constrained k-means clustering class. Valid options are:

-N
Specify the number of clusters to generate.

-R
Specify random number seed

-A
The algorithm can be "Simple" (simple KMeans) or "Spherical" (spherical KMeans)

-M
Specifies the name of the distance metric class that should be used .... etc.

See Also:
Clusterer, OptionHandler, Serialized Form

Field Summary
static int ALGORITHM_SIMPLE
          Define possible algorithms
static int ALGORITHM_SPHERICAL
           
protected  java.util.HashSet[] m_AdjacencyList
          adjacency list for neighborhoods
protected  int m_Algorithm
          algorithm, by default spherical
protected  double m_CannotLinkWeight
          weight to be given to each cannot-link constraint
protected  double[] m_checksumCoeffs
           
protected  java.util.HashMap m_checksumHash
          A hash where the instance checksums are hashed
protected  int[] m_ClusterAssignments
          temporary variable holding cluster assignments while iterating
protected  Instances m_ClusterCentroids
          holds the cluster centroids
protected  double[][] m_ClusterDistribution
          temporary variable holding posterior cluster distribution of points while iterating
protected  java.util.ArrayList m_Clusters
          holds the instances in the clusters
protected  java.util.HashMap m_ConstraintsHash
          holds the ([instance pair] -> [type of constraint]) mapping.
protected  double m_DefaultPerturb
          holds the default perturbation value for randomPerturbInit
protected  boolean m_FastMode
          m_FastMode = true => fast computation of meanOrMode in centroid calculation, useful for high-D data sets; m_FastMode = false => usual computation of meanOrMode in centroid calculation
protected  Instance m_GlobalCentroid
          holds the global centroid
protected  java.util.HashMap[] m_IndexClusters
          holds the instance indices in the clusters, mapped to their probabilities
protected  Instances m_Instances
          training instances
protected  boolean m_isSparseInstance
          indicates whether instances are sparse
protected  int m_Iterations
          keep track of the number of iterations completed before convergence
protected  double m_Kappa
          kappa value for vmf distribution
protected static int m_MaxConstraintsAllowed
          the maximum number of cannot-link constraints allowed
protected  double m_MaxKappaDist
           
protected  double m_MaxKappaSim
          max kappa value for vmf distribution
protected  double m_MergeThreshold
          holds the default merge threshold for matchMergeStep
protected  Metric m_metric
          distance Metric
protected  boolean m_metricBuilt
          Has the metric been constructed? A fix for multiple buildClusterer calls
protected  double m_MustLinkWeight
          weight to be given to each must-link constraint
protected  java.util.HashSet[] m_NeighborSets
          neighbor list: points in each neighborhood inferred from constraints
protected  int m_NumClusters
          number of clusters to generate, default is -1 to get it from labeled data
protected  int m_NumCurrentClusters
          Number of clusters in the process
protected  double m_Objective
          value of objective function
protected  double m_ObjFunConvergenceDifference
          min. absolute difference of objective function values for convergence
protected  boolean m_objFunDecreasing
          Is the objective function increasing or decreasing? Depends on type of metric used: for similarity-based metric - increasing, for distance-based - decreasing
protected  int m_RandomSeed
          holds the random Seed, useful for randomPerturbInit
protected  boolean m_Seedable
          Seedable or not (true by default)
protected  java.util.HashSet m_SeedHash
          holds the points involved in the constraints
protected  int m_StartingIndexOfTest
          test data -- required to make sure that test points are not selected during active learning
protected  Instance[] m_SumOfClusterInstances
          temporary variable holding cluster sums while iterating
protected  Instances m_TotalTrainWithLabels
          training instances with labels
protected  boolean m_verbose
          verbose?
static Tag[] TAGS_ALGORITHM
           
 
Constructor Summary
PCSoftKMeans()
           
PCSoftKMeans(Metric metric)
           
 
Method Summary
protected  void addMLAndCLTransitiveClosure(int[] indices)
          adding other inferred ML and CL links to m_ConstraintsHash, from m_NeighborSets
 int assignInstanceToCluster(Instance instance)
          Classifies the instance using the current clustering, without considering constraints
 double assignInstanceToClustersWithConstraints(int instIdx)
          Classifies the instance using the current clustering considering constraints, updates cluster assignment probs
 int[] bestInstancesForActiveLearning(int numActive)
          Dummy: not implemented for PCSoftKMeans
 InstancePair[] bestPairsForActiveLearning(int numActive)
          Dummy: not implemented for PCSoftKMeans
 void buildClusterer(java.util.ArrayList labeledPair, Instances unlabeledData, Instances labeledTrain, int startingIndexOfTest)
          Clusters unlabeledData and labeledData (with labels removed), using labeledData as seeds
 void buildClusterer(Instances data)
          Generates a clusterer.
 void buildClusterer(Instances labeledData, Instances unlabeledData, int classIndex, int numClusters)
          Clusters unlabeledData and labeledData (with labels removed), using labeledData as seeds -- NOT USED FOR PCSoftKMeans!!!
 void buildClusterer(Instances labeledData, Instances unlabeledData, int classIndex, int numClusters, int startingIndexOfTest)
          Clusters unlabeledData and labeledData (with labels removed), using labeledData as seeds
 void buildClusterer(Instances data, int num_clusters)
          Cluster given instances to form the specified number of clusters.
protected  void createCentroids()
          Creates the global cluster centroid
 double densityForInstance(Instance inst)
          Computes the density for a given instance.
protected  void DFS_VISIT(int u, int[] vertexColor)
          Recursive subroutine for DFS
protected  void DFS()
          Main Depth First Search routine
 double[] distributionForInstance(Instance instance)
          Checks if instance has to be normalized and returns the distribution of the instance using the current clustering
protected  double findAssignments()
          E-step of the KMeans clustering algorithm -- find new cluster assignments and new objective function
 SelectedTag getAlgorithm()
          Get the KMeans algorithm type.
 double getCannotLinkWeight()
          Return the cannot link constraint weight
 java.util.ArrayList getClusters()
          Computes the clusters from the cluster assignments, for external access
 double getDefaultPerturb()
          Get default perturbation value
 java.util.HashMap[] getIndexClusters()
          Computes the clusters from the cluster assignments, for external access
 Instances getInstances()
          Return training instances
 Metric getMetric()
          Get the distance metric
 double getMustLinkWeight()
          Return the must link constraint weight
 int getNumClusters()
          Return the number of clusters
 double getObjFunConvergenceDifference()
          Get the minimum value of the objective function difference required for convergence
 java.lang.String[] getOptions()
          Gets the current option settings for the OptionHandler.
 int getRandomSeed()
          Return the random number seed
 boolean getSeedable()
          Is seeding performed?
 Clusterer getThisClusterer()
          We always want to implement SemiSupClusterer from a class extending Clusterer.
static java.lang.Double getTimeStamp()
          Gets a Double representing the current date and time.
 boolean getVerbose()
          get the verbosity level of the clusterer
 java.util.Enumeration listOptions()
          Returns an enumeration of all the available options.
protected  int lookupInstanceCluster(Instance instance)
          lookup the instance in the checksum hash
static void main(java.lang.String[] args)
          Main method for testing this class.
protected  double[] meanOrMode(Instances insts)
          Fast version of meanOrMode, streamlined from Instances.meanOrMode for efficiency. Does not check for missing attributes, assumes numeric attributes, assumes sparse instances
protected  void nonActivePairwiseInit()
          Initialization routine for non-active algorithm
 void normalize(Instance inst)
          Normalizes Instance or SparseInstance
protected  void normalizeByWeight(Instance inst)
          This function divides every attribute value in an instance by the instance weight -- useful to find the mean of a cluster in Euclidean space
 void normalizeInstance(Instance inst)
          Normalizes the values of a normal Instance in L2 norm
 void normalizeSparseInstance(Instance inst)
          Normalizes the values of a SparseInstance in L2 norm
 int numberOfClusters()
          A duplicate function to conform to Clusterer abstract class.
 double objectiveFunction()
          returns objective function
 void printClusters()
          Prints clusters
 void printIndexClusters()
          Outputs the current clustering
 void resetClusterer()
          Reset all values that have been learned
protected  void runEM()
          Actual KMeans function
 boolean seedable()
          We can have clusterers that don't utilize seeding
 void seedClusterer(java.util.HashMap seedHash)
          Reads the seeds from a hashtable, where every key is an instance and every value is the cluster assignment of that instance
 void setAlgorithm(SelectedTag algo)
          Set the KMeans algorithm.
 void setCannotLinkWeight(double w)
          Set the cannot link constraint weight
 void setDefaultPerturb(double p)
          Set default perturbation value
 void setInstances(Instances instances)
          Sets training instances
 void setMetric(Metric m)
          Set the distance metric
 void setMustLinkWeight(double w)
          Set the must link constraint weight
 void setNumClusters(int n)
          Set the number of clusters to generate
 void setObjFunConvergenceDifference(double objFunConvergenceDifference)
          Set the minimum value of the objective function difference required for convergence
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 void setRandomSeed(int s)
          Set the random number seed
 void setSeedable(boolean seedable)
          Turn seeding on and off
 void setSeedHash(java.util.HashMap seedhash)
          Set the m_SeedHash
 void setVerbose(boolean verbose)
          set the verbosity level of the clusterer
protected  Instance sumInstances(Instance inst1, Instance inst2)
          Finds sum of 2 instances (handles sparse and non-sparse)
protected static void testCase()
           
 java.lang.String toString()
          return a string describing this clusterer
 void trainClusterer(Instances instances)
          Train the clusterer using specified parameters
protected  void updateClusterCentroids()
          M-step of the KMeans clustering algorithm -- updates cluster centroids
 
Methods inherited from class weka.clusterers.DistributionClusterer
clusterInstance
 
Methods inherited from class weka.clusterers.Clusterer
forName, makeCopies
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

m_Clusters

protected java.util.ArrayList m_Clusters
holds the instances in the clusters


m_IndexClusters

protected java.util.HashMap[] m_IndexClusters
holds the instance indices in the clusters, mapped to their probabilities


m_ConstraintsHash

protected java.util.HashMap m_ConstraintsHash
holds the ([instance pair] -> [type of constraint]) mapping. Note that the instance pairs stored in the hash always have constraint type InstancePair.DONT_CARE_LINK, the actual link type is stored in the hashed value
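
As a rough illustration of the mapping described above, the following self-contained sketch keys a HashMap on a pair of instance indices and keeps the link type only in the hashed value, so a pair can be looked up without knowing its type. PairKey, the link-type codes, and the order-insensitive equality are assumptions made for this example; they are not the real InstancePair API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for an instance pair; not the real InstancePair class. */
final class PairKey {
    final int first;
    final int second;

    PairKey(int a, int b) {
        // Assumption: store the smaller index first so (a, b) and (b, a) hash alike.
        this.first = Math.min(a, b);
        this.second = Math.max(a, b);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof PairKey)) return false;
        PairKey p = (PairKey) o;
        return first == p.first && second == p.second;
    }

    @Override
    public int hashCode() {
        return 31 * first + second;
    }
}

final class ConstraintTableSketch {
    // Hypothetical link-type codes, stored as the hashed value.
    static final int MUST_LINK = 1;
    static final int CANNOT_LINK = -1;

    private final Map<PairKey, Integer> constraints = new HashMap<>();

    void add(int i, int j, int linkType) {
        constraints.put(new PairKey(i, j), linkType);
    }

    /** Returns the stored link type, or null if the pair is unconstrained. */
    Integer lookup(int i, int j) {
        return constraints.get(new PairKey(i, j));
    }
}
```

Because the key carries no link type, the same key object works for must-link and cannot-link entries alike.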


m_AdjacencyList

protected java.util.HashSet[] m_AdjacencyList
adjacency list for neighborhoods


m_SeedHash

protected java.util.HashSet m_SeedHash
holds the points involved in the constraints


m_CannotLinkWeight

protected double m_CannotLinkWeight
weight to be given to each cannot-link constraint


m_MustLinkWeight

protected double m_MustLinkWeight
weight to be given to each must-link constraint


m_Kappa

protected double m_Kappa
kappa value for vmf distribution


m_MaxKappaSim

protected double m_MaxKappaSim
max kappa value for vmf distribution


m_MaxKappaDist

protected double m_MaxKappaDist

m_MaxConstraintsAllowed

protected static final int m_MaxConstraintsAllowed
the maximum number of cannot-link constraints allowed

See Also:
Constant Field Values

m_verbose

protected boolean m_verbose
verbose?


m_metric

protected Metric m_metric
distance Metric


m_metricBuilt

protected boolean m_metricBuilt
Has the metric been constructed? A fix for multiple buildClusterer calls


m_isSparseInstance

protected boolean m_isSparseInstance
indicates whether instances are sparse


m_objFunDecreasing

protected boolean m_objFunDecreasing
Is the objective function increasing or decreasing? Depends on type of metric used: for similarity-based metric - increasing, for distance-based - decreasing


m_Seedable

protected boolean m_Seedable
Seedable or not (true by default)


m_Iterations

protected int m_Iterations
keep track of the number of iterations completed before convergence


ALGORITHM_SIMPLE

public static final int ALGORITHM_SIMPLE
Define possible algorithms

See Also:
Constant Field Values

ALGORITHM_SPHERICAL

public static final int ALGORITHM_SPHERICAL
See Also:
Constant Field Values

TAGS_ALGORITHM

public static final Tag[] TAGS_ALGORITHM

m_Algorithm

protected int m_Algorithm
algorithm, by default spherical


m_ObjFunConvergenceDifference

protected double m_ObjFunConvergenceDifference
min. absolute difference of objective function values for convergence


m_Objective

protected double m_Objective
value of objective function


m_TotalTrainWithLabels

protected Instances m_TotalTrainWithLabels
training instances with labels


m_Instances

protected Instances m_Instances
training instances


m_checksumHash

protected java.util.HashMap m_checksumHash
A hash where the instance checksums are hashed


m_checksumCoeffs

protected double[] m_checksumCoeffs

m_StartingIndexOfTest

protected int m_StartingIndexOfTest
test data -- required to make sure that test points are not selected during active learning


m_NumClusters

protected int m_NumClusters
number of clusters to generate, default is -1 to get it from labeled data


m_NumCurrentClusters

protected int m_NumCurrentClusters
Number of clusters in the process


m_FastMode

protected boolean m_FastMode
m_FastMode = true => fast computation of meanOrMode in centroid calculation, useful for high-D data sets; m_FastMode = false => usual computation of meanOrMode in centroid calculation


m_ClusterCentroids

protected Instances m_ClusterCentroids
holds the cluster centroids


m_GlobalCentroid

protected Instance m_GlobalCentroid
holds the global centroid


m_DefaultPerturb

protected double m_DefaultPerturb
holds the default perturbation value for randomPerturbInit


m_MergeThreshold

protected double m_MergeThreshold
holds the default merge threshold for matchMergeStep


m_ClusterDistribution

protected double[][] m_ClusterDistribution
temporary variable holding posterior cluster distribution of points while iterating


m_ClusterAssignments

protected int[] m_ClusterAssignments
temporary variable holding cluster assignments while iterating


m_SumOfClusterInstances

protected Instance[] m_SumOfClusterInstances
temporary variable holding cluster sums while iterating


m_RandomSeed

protected int m_RandomSeed
holds the random Seed, useful for randomPerturbInit


m_NeighborSets

protected java.util.HashSet[] m_NeighborSets
neighbor list: points in each neighborhood inferred from constraints

Constructor Detail

PCSoftKMeans

public PCSoftKMeans()

PCSoftKMeans

public PCSoftKMeans(Metric metric)
Method Detail

objectiveFunction

public double objectiveFunction()
returns objective function

Specified by:
objectiveFunction in interface SemiSupClusterer

getThisClusterer

public Clusterer getThisClusterer()
We always want to implement SemiSupClusterer from a class extending Clusterer. We want to be able to return the underlying parent class.

Specified by:
getThisClusterer in interface SemiSupClusterer
Returns:
parent Clusterer class

buildClusterer

public void buildClusterer(Instances data)
                    throws java.lang.Exception
Generates a clusterer. Instances in data have to be either all sparse or all non-sparse

Specified by:
buildClusterer in interface SemiSupClusterer
Specified by:
buildClusterer in class Clusterer
Parameters:
data - set of instances serving as training data
Throws:
java.lang.Exception - if the clusterer has not been generated successfully

buildClusterer

public void buildClusterer(Instances labeledData,
                           Instances unlabeledData,
                           int classIndex,
                           int numClusters,
                           int startingIndexOfTest)
                    throws java.lang.Exception
Clusters unlabeledData and labeledData (with labels removed), using labeledData as seeds

Specified by:
buildClusterer in interface SemiSupClusterer
Parameters:
labeledData - labeled instances to be used as seeds
unlabeledData - unlabeled instances
classIndex - attribute index in labeledData which holds class info
numClusters - number of clusters
startingIndexOfTest - from where test data starts in unlabeledData, useful if clustering is transductive
Throws:
java.lang.Exception - if something goes wrong.

buildClusterer

public void buildClusterer(Instances data,
                           int num_clusters)
                    throws java.lang.Exception
Cluster given instances to form the specified number of clusters.

Parameters:
data - instances to be clustered
num_clusters - number of clusters to create
Throws:
java.lang.Exception - if something goes wrong.

buildClusterer

public void buildClusterer(java.util.ArrayList labeledPair,
                           Instances unlabeledData,
                           Instances labeledTrain,
                           int startingIndexOfTest)
                    throws java.lang.Exception
Clusters unlabeledData and labeledData (with labels removed), using labeledData as seeds

Parameters:
unlabeledData - unlabeled training (+ test for transductive) instances
labeledTrain - labeled training instances
startingIndexOfTest - starting index of test set in unlabeled data
Throws:
java.lang.Exception - if something goes wrong.

buildClusterer

public void buildClusterer(Instances labeledData,
                           Instances unlabeledData,
                           int classIndex,
                           int numClusters)
                    throws java.lang.Exception
Clusters unlabeledData and labeledData (with labels removed), using labeledData as seeds -- NOT USED FOR PCSoftKMeans!!!

Parameters:
labeledData - labeled instances to be used as seeds
unlabeledData - unlabeled instances
classIndex - attribute index in labeledData which holds class info
numClusters - number of clusters
Throws:
java.lang.Exception - if something goes wrong.

resetClusterer

public void resetClusterer()
                    throws java.lang.Exception
Reset all values that have been learned

Specified by:
resetClusterer in interface SemiSupClusterer
Throws:
java.lang.Exception

setDefaultPerturb

public void setDefaultPerturb(double p)
Set default perturbation value

Parameters:
p - perturbation fraction

getDefaultPerturb

public double getDefaultPerturb()
Get default perturbation value

Returns:
perturbation fraction

setSeedable

public void setSeedable(boolean seedable)
Turn seeding on and off

Parameters:
seedable - should seeding be done?

getSeedable

public boolean getSeedable()
Is seeding performed?

Returns:
is seeding being done?

seedable

public boolean seedable()
We can have clusterers that don't utilize seeding


createCentroids

protected void createCentroids()
                        throws java.lang.Exception
Creates the global cluster centroid

Throws:
java.lang.Exception

addMLAndCLTransitiveClosure

protected void addMLAndCLTransitiveClosure(int[] indices)
                                    throws java.lang.Exception
adding other inferred ML and CL links to m_ConstraintsHash, from m_NeighborSets

Throws:
java.lang.Exception
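
One plausible reading of this step is sketched below with plain collections: every pair inside a neighborhood becomes a must-link, and a single cannot-link between two neighborhoods is extended to every cross pair. The class name, the link-type codes, and the List-based pair key are assumptions for illustration, and the sketch assumes the input constraints are consistent.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TransitiveClosureSketch {
    static final int MUST_LINK = 1;    // hypothetical link-type codes
    static final int CANNOT_LINK = -1;

    private static List<Integer> key(int i, int j) {
        return Arrays.asList(Math.min(i, j), Math.max(i, j));
    }

    /**
     * neighborhoods: groups of point indices already joined by must-links.
     * cannotLinks:   explicit cannot-link pairs, each an int[2] of point indices.
     */
    static Map<List<Integer>, Integer> close(List<Set<Integer>> neighborhoods,
                                             List<int[]> cannotLinks) {
        Map<List<Integer>, Integer> constraints = new HashMap<>();
        Map<Integer, Integer> owner = new HashMap<>(); // point index -> neighborhood
        for (int n = 0; n < neighborhoods.size(); n++) {
            for (int i : neighborhoods.get(n)) {
                owner.put(i, n);
                // Must-link is transitive: link every pair inside the neighborhood.
                for (int j : neighborhoods.get(n)) {
                    if (i < j) constraints.put(key(i, j), MUST_LINK);
                }
            }
        }
        for (int[] cl : cannotLinks) {
            Integer a = owner.get(cl[0]);
            Integer b = owner.get(cl[1]);
            if (a == null || b == null) {
                // A point outside every neighborhood: keep only the explicit pair.
                constraints.put(key(cl[0], cl[1]), CANNOT_LINK);
                continue;
            }
            if (a.equals(b)) continue; // inconsistent input; ignored in this sketch
            // A cannot-link between two neighborhoods separates all cross pairs.
            for (int i : neighborhoods.get(a)) {
                for (int j : neighborhoods.get(b)) {
                    constraints.put(key(i, j), CANNOT_LINK);
                }
            }
        }
        return constraints;
    }
}
```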

DFS

protected void DFS()
            throws java.lang.Exception
Main Depth First Search routine

Throws:
java.lang.Exception

DFS_VISIT

protected void DFS_VISIT(int u,
                         int[] vertexColor)
                  throws java.lang.Exception
Recursive subroutine for DFS

Throws:
java.lang.Exception
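
A minimal sketch of a depth-first search with vertex colouring over an adjacency list, which is one way to group constrained points into neighborhoods. How PCSoftKMeans uses the search internally is not documented on this page, so the class name, the colour constants, and the returned component list are assumptions.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class DfsNeighborhoodSketch {
    static final int WHITE = 0; // not yet visited
    static final int GRAY = 1;  // visit in progress
    static final int BLACK = 2; // visit finished

    /** Runs DFS from every unvisited vertex; each start yields one neighborhood. */
    static List<Set<Integer>> neighborhoods(List<Set<Integer>> adjacency) {
        int[] vertexColor = new int[adjacency.size()]; // all WHITE initially
        List<Set<Integer>> result = new ArrayList<>();
        for (int u = 0; u < adjacency.size(); u++) {
            if (vertexColor[u] == WHITE) {
                Set<Integer> component = new HashSet<>();
                visit(u, vertexColor, adjacency, component);
                result.add(component);
            }
        }
        return result;
    }

    /** Recursive subroutine: marks u, then descends into unvisited neighbours. */
    private static void visit(int u, int[] vertexColor,
                              List<Set<Integer>> adjacency, Set<Integer> component) {
        vertexColor[u] = GRAY;
        component.add(u);
        for (int v : adjacency.get(u)) {
            if (vertexColor[v] == WHITE) {
                visit(v, vertexColor, adjacency, component);
            }
        }
        vertexColor[u] = BLACK;
    }
}
```

The recursion depth equals the longest search path, so very large neighborhoods may call for an explicit stack instead.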

nonActivePairwiseInit

protected void nonActivePairwiseInit()
                              throws java.lang.Exception
Initialization routine for non-active algorithm

Throws:
java.lang.Exception

normalizeByWeight

protected void normalizeByWeight(Instance inst)
This function divides every attribute value in an instance by the instance weight -- useful to find the mean of a cluster in Euclidean space

Parameters:
inst - Instance passed in for normalization (destructive update)
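
The idea can be sketched on a plain array: if a vector holds the sum of a cluster's members and its weight is the number of members, dividing by the weight yields the cluster mean. The array representation and the zero-weight guard are assumptions of this example, not documented behavior.

```java
final class NormalizeByWeightSketch {
    /** Divides each value in place by the given weight (destructive update). */
    static void normalizeByWeight(double[] values, double weight) {
        if (weight == 0.0) {
            return; // assumption: leave the vector untouched rather than divide by zero
        }
        for (int i = 0; i < values.length; i++) {
            values[i] /= weight;
        }
    }
}
```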

sumInstances

protected Instance sumInstances(Instance inst1,
                                Instance inst2)
                         throws java.lang.Exception
Finds sum of 2 instances (handles sparse and non-sparse)

Throws:
java.lang.Exception
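
To illustrate the difference between the two cases, the sketch below sums dense vectors element-wise and sums sparse vectors by merging their non-zero entries. Representing a sparse instance as a map from attribute index to value is an assumption of this example; the real method operates on Instance objects.

```java
import java.util.Map;
import java.util.TreeMap;

final class SumInstancesSketch {
    /** Dense case: element-wise sum of two equal-length vectors. */
    static double[] sumDense(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        double[] sum = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            sum[i] = a[i] + b[i];
        }
        return sum;
    }

    /** Sparse case: only non-zero entries are stored, keyed by attribute index. */
    static Map<Integer, Double> sumSparse(Map<Integer, Double> a, Map<Integer, Double> b) {
        Map<Integer, Double> sum = new TreeMap<>(a);
        for (Map.Entry<Integer, Double> e : b.entrySet()) {
            // Add to an existing entry, or insert the entry if the index is new.
            sum.merge(e.getKey(), e.getValue(), Double::sum);
        }
        return sum;
    }
}
```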

printIndexClusters

public void printIndexClusters()
                        throws java.lang.Exception
Outputs the current clustering

Throws:
java.lang.Exception - if something goes wrong

findAssignments

protected double findAssignments()
                          throws java.lang.Exception
E-step of the KMeans clustering algorithm -- find new cluster assignments and new objective function

Throws:
java.lang.Exception

assignInstanceToClustersWithConstraints

public double assignInstanceToClustersWithConstraints(int instIdx)
                                               throws java.lang.Exception
Classifies the instance using the current clustering considering constraints, updates cluster assignment probs

Throws:
java.lang.Exception - if instance could not be assigned to clusters successfully
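
The exact update used by PCSoftKMeans is not specified on this page. As a hedged sketch of what a constraint-aware soft assignment can look like, the example below adds a penalty to each cluster's cost for must-linked partners that are unlikely to be in that cluster and for cannot-linked partners that are likely to be in it, then turns the costs into probabilities with a softmax. The cost formula, the class name, and all parameter names are assumptions.

```java
final class SoftConstrainedAssignmentSketch {
    /**
     * distances[h]  : distance from the point to centroid h
     * mlPosteriors  : current cluster posteriors of must-linked partners (one row each)
     * clPosteriors  : current cluster posteriors of cannot-linked partners (one row each)
     * Returns the point's posterior over clusters; the entries sum to 1.
     */
    static double[] posterior(double[] distances,
                              double[][] mlPosteriors, double mustLinkWeight,
                              double[][] clPosteriors, double cannotLinkWeight) {
        int k = distances.length;
        double[] cost = new double[k];
        double minCost = Double.POSITIVE_INFINITY;
        for (int h = 0; h < k; h++) {
            cost[h] = distances[h];
            // Penalise cluster h when a must-linked partner is probably elsewhere.
            for (double[] p : mlPosteriors) cost[h] += mustLinkWeight * (1.0 - p[h]);
            // Penalise cluster h when a cannot-linked partner is probably in it.
            for (double[] p : clPosteriors) cost[h] += cannotLinkWeight * p[h];
            minCost = Math.min(minCost, cost[h]);
        }
        double[] post = new double[k];
        double total = 0.0;
        for (int h = 0; h < k; h++) {
            post[h] = Math.exp(-(cost[h] - minCost)); // shifted for numerical stability
            total += post[h];
        }
        for (int h = 0; h < k; h++) post[h] /= total;
        return post;
    }
}
```

With equal distances, a must-linked partner pulls the point toward the partner's cluster and a cannot-linked partner pushes it away; larger weights make the effect stronger.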

updateClusterCentroids

protected void updateClusterCentroids()
                               throws java.lang.Exception
M-step of the KMeans clustering algorithm -- updates cluster centroids

Throws:
java.lang.Exception

runEM

protected void runEM()
              throws java.lang.Exception
Actual KMeans function

Throws:
java.lang.Exception
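
The E-step, M-step, and convergence test described for findAssignments, updateClusterCentroids, and runEM can be sketched as a plain soft k-means loop. This is an illustration on one-dimensional data under stated assumptions: squared Euclidean distance, a decreasing objective, no pairwise constraints, and a hypothetical maxIterations cap; it is not the class's actual code.

```java
final class SoftKMeansLoopSketch {
    /** Runs soft k-means on 1-D data and returns the final centroids. */
    static double[] run(double[] data, double[] initialCentroids,
                        double convergenceDifference, int maxIterations) {
        double[] centroids = initialCentroids.clone();
        int k = centroids.length;
        double[][] posterior = new double[data.length][k];
        double oldObjective = Double.POSITIVE_INFINITY;
        for (int iter = 0; iter < maxIterations; iter++) {
            // E-step: posterior of each cluster for each point, plus the objective.
            double objective = 0.0;
            for (int i = 0; i < data.length; i++) {
                double[] sq = new double[k];
                double minSq = Double.POSITIVE_INFINITY;
                for (int h = 0; h < k; h++) {
                    double d = data[i] - centroids[h];
                    sq[h] = d * d;
                    minSq = Math.min(minSq, sq[h]);
                }
                double total = 0.0;
                for (int h = 0; h < k; h++) {
                    posterior[i][h] = Math.exp(-(sq[h] - minSq)); // shift avoids underflow
                    total += posterior[i][h];
                }
                for (int h = 0; h < k; h++) {
                    posterior[i][h] /= total;
                    objective += posterior[i][h] * sq[h];
                }
            }
            // Converged once the objective changes by less than the threshold.
            if (Math.abs(oldObjective - objective) < convergenceDifference) break;
            oldObjective = objective;
            // M-step: each centroid becomes the posterior-weighted mean of the data.
            for (int h = 0; h < k; h++) {
                double weightedSum = 0.0;
                double weight = 0.0;
                for (int i = 0; i < data.length; i++) {
                    weightedSum += posterior[i][h] * data[i];
                    weight += posterior[i][h];
                }
                if (weight > 0.0) centroids[h] = weightedSum / weight;
            }
        }
        return centroids;
    }
}
```

The threshold plays the role that the objective-function convergence difference plays in this class: a smaller value runs more iterations before stopping.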

bestInstancesForActiveLearning

public int[] bestInstancesForActiveLearning(int numActive)
                                     throws java.lang.Exception
Dummy: not implemented for PCSoftKMeans

Specified by:
bestInstancesForActiveLearning in interface ActiveLearningClusterer
Throws:
java.lang.Exception

bestPairsForActiveLearning

public InstancePair[] bestPairsForActiveLearning(int numActive)
                                          throws java.lang.Exception
Dummy: not implemented for PCSoftKMeans

Specified by:
bestPairsForActiveLearning in interface ActiveLearningClusterer
Throws:
java.lang.Exception

distributionForInstance

public double[] distributionForInstance(Instance instance)
                                 throws java.lang.Exception
Checks if instance has to be normalized and returns the distribution of the instance using the current clustering

Specified by:
distributionForInstance in class DistributionClusterer
Parameters:
instance - the instance under consideration
Returns:
an array containing the estimated membership probabilities of the test instance in each cluster (this should sum to at most 1)
Throws:
java.lang.Exception - if distribution could not be computed successfully

densityForInstance

public double densityForInstance(Instance inst)
                          throws java.lang.Exception
Computes the density for a given instance.

Specified by:
densityForInstance in class DistributionClusterer
Parameters:
inst - the instance to compute the density for
Returns:
the density.
Throws:
java.lang.Exception - if the density could not be computed successfully

lookupInstanceCluster

protected int lookupInstanceCluster(Instance instance)
lookup the instance in the checksum hash

Parameters:
instance - instance to be looked up
Returns:
the index of the cluster to which the instance was assigned, -1 if the instance has not been clustered
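
How the checksum is computed is not documented here. The sketch below assumes, based only on the field names m_checksumHash and m_checksumCoeffs, that an instance's checksum is the dot product of its attribute values with a fixed coefficient vector, and that the checksum is the key under which the cluster index is stored. All names in the example are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

final class ChecksumLookupSketch {
    private final double[] coeffs;
    private final Map<Double, Integer> checksumToCluster = new HashMap<>();

    ChecksumLookupSketch(int numAttributes, long seed) {
        Random random = new Random(seed);
        coeffs = new double[numAttributes];
        for (int i = 0; i < numAttributes; i++) {
            coeffs[i] = random.nextDouble();
        }
    }

    /** Assumed checksum: dot product of the values with fixed random coefficients. */
    private double checksum(double[] values) {
        double sum = 0.0;
        for (int i = 0; i < values.length; i++) {
            sum += coeffs[i] * values[i];
        }
        return sum;
    }

    /** Remembers which cluster the given attribute vector was assigned to. */
    void record(double[] values, int cluster) {
        checksumToCluster.put(checksum(values), cluster);
    }

    /** Returns the recorded cluster index, or -1 if the instance was never clustered. */
    int lookup(double[] values) {
        Integer cluster = checksumToCluster.get(checksum(values));
        return cluster == null ? -1 : cluster;
    }
}
```

A scheme like this trades a small risk of checksum collisions for a lookup that avoids comparing full attribute vectors.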

assignInstanceToCluster

public int assignInstanceToCluster(Instance instance)
                            throws java.lang.Exception
Classifies the instance using the current clustering, without considering constraints

Parameters:
instance - the instance to be assigned to a cluster
Returns:
the number of the assigned cluster as an integer if the class is enumerated, otherwise the predicted value
Throws:
java.lang.Exception - if instance could not be classified successfully

setCannotLinkWeight

public void setCannotLinkWeight(double w)
Set the cannot link constraint weight


getCannotLinkWeight

public double getCannotLinkWeight()
Return the cannot link constraint weight


setMustLinkWeight

public void setMustLinkWeight(double w)
Set the must link constraint weight


getMustLinkWeight

public double getMustLinkWeight()
Return the must link constraint weight


getNumClusters

public int getNumClusters()
Return the number of clusters

Specified by:
getNumClusters in interface SemiSupClusterer

numberOfClusters

public int numberOfClusters()
A duplicate function to conform to Clusterer abstract class.

Specified by:
numberOfClusters in class Clusterer
Returns:
the number of clusters generated for a training dataset.

setSeedHash

public void setSeedHash(java.util.HashMap seedhash)
Set the m_SeedHash


setRandomSeed

public void setRandomSeed(int s)
Set the random number seed

Parameters:
s - the seed

getRandomSeed

public int getRandomSeed()
Return the random number seed


setObjFunConvergenceDifference

public void setObjFunConvergenceDifference(double objFunConvergenceDifference)
Set the minimum value of the objective function difference required for convergence

Parameters:
objFunConvergenceDifference - the minimum value of the objective function difference required for convergence

getObjFunConvergenceDifference

public double getObjFunConvergenceDifference()
Get the minimum value of the objective function difference required for convergence


setInstances

public void setInstances(Instances instances)
Sets training instances


getInstances

public Instances getInstances()
Return training instances

Specified by:
getInstances in interface SemiSupClusterer
Returns:
Instances used for clustering, or null

setNumClusters

public void setNumClusters(int n)
Set the number of clusters to generate

Specified by:
setNumClusters in interface SemiSupClusterer
Parameters:
n - the number of clusters to generate

setMetric

public void setMetric(Metric m)
Set the distance metric

Specified by:
setMetric in interface SemiSupClusterer

getMetric

public Metric getMetric()
Get the distance metric


setAlgorithm

public void setAlgorithm(SelectedTag algo)
Set the KMeans algorithm. Values other than ALGORITHM_SIMPLE or ALGORITHM_SPHERICAL will be ignored

Parameters:
algo - algorithm type

getAlgorithm

public SelectedTag getAlgorithm()
Get the KMeans algorithm type. Will be one of ALGORITHM_SIMPLE or ALGORITHM_SPHERICAL


seedClusterer

public void seedClusterer(java.util.HashMap seedHash)
Reads the seeds from a hashtable, where every key is an instance and every value is the cluster assignment of that instance

Specified by:
seedClusterer in interface SemiSupClusterer
Parameters:
seedHash - HashMap of seeding parameters

printClusters

public void printClusters()
                   throws java.lang.Exception
Prints clusters

Throws:
java.lang.Exception

getClusters

public java.util.ArrayList getClusters()
                                throws java.lang.Exception
Computes the clusters from the cluster assignments, for external access

Specified by:
getClusters in interface SemiSupClusterer
Throws:
java.lang.Exception - if clusters could not be computed successfully

getIndexClusters

public java.util.HashMap[] getIndexClusters()
                                     throws java.lang.Exception
Computes the clusters from the cluster assignments, for external access

Throws:
java.lang.Exception - if clusters could not be computed successfully

listOptions

public java.util.Enumeration listOptions()
Description copied from interface: OptionHandler
Returns an enumeration of all the available options.

Specified by:
listOptions in interface OptionHandler
Returns:
an enumeration of all available options.

getOptions

public java.lang.String[] getOptions()
Description copied from interface: OptionHandler
Gets the current option settings for the OptionHandler.

Specified by:
getOptions in interface OptionHandler
Returns:
the list of current option settings as an array of strings

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options.

Specified by:
setOptions in interface OptionHandler
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported

toString

public java.lang.String toString()
return a string describing this clusterer

Returns:
a description of the clusterer as a string

setVerbose

public void setVerbose(boolean verbose)
set the verbosity level of the clusterer

Specified by:
setVerbose in interface SemiSupClusterer
Parameters:
verbose - messages on (true) or off (false)

getVerbose

public boolean getVerbose()
get the verbosity level of the clusterer

Returns:
messages on (true) or off (false)

trainClusterer

public void trainClusterer(Instances instances)
                    throws java.lang.Exception
Train the clusterer using specified parameters

Specified by:
trainClusterer in interface SemiSupClusterer
Parameters:
instances - Instances to be used for training
Throws:
java.lang.Exception

normalize

public void normalize(Instance inst)
               throws java.lang.Exception
Normalizes Instance or SparseInstance

Parameters:
inst - Instance to be normalized
Throws:
java.lang.Exception

normalizeInstance

public void normalizeInstance(Instance inst)
                       throws java.lang.Exception
Normalizes the values of a normal Instance in L2 norm

Parameters:
inst - Instance to be normalized
Throws:
java.lang.Exception
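
L2 normalization scales a vector so that its Euclidean length is 1, which is the usual preprocessing for spherical k-means. A minimal sketch on a plain array follows; the array representation and the handling of an all-zero vector are assumptions of this example.

```java
final class L2NormalizeSketch {
    /** Scales the vector in place so that its Euclidean (L2) norm is 1. */
    static void normalize(double[] values) {
        double sumOfSquares = 0.0;
        for (double v : values) {
            sumOfSquares += v * v;
        }
        double norm = Math.sqrt(sumOfSquares);
        if (norm == 0.0) {
            return; // assumption: an all-zero vector is left unchanged
        }
        for (int i = 0; i < values.length; i++) {
            values[i] /= norm;
        }
    }
}
```

For example, the vector (3, 4) has norm 5 and becomes (0.6, 0.8).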

normalizeSparseInstance

public void normalizeSparseInstance(Instance inst)
                             throws java.lang.Exception
Normalizes the values of a SparseInstance in L2 norm

Parameters:
inst - SparseInstance to be normalized
Throws:
java.lang.Exception

meanOrMode

protected double[] meanOrMode(Instances insts)
Fast version of meanOrMode, streamlined from Instances.meanOrMode for efficiency. Does not check for missing attributes, assumes numeric attributes, assumes sparse instances


getTimeStamp

public static java.lang.Double getTimeStamp()
Gets a Double representing the current date and time, e.g. 1:46pm on 20/5/1999 -> 19990520.1346

Returns:
a value of type Double
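
The encoding in the example above packs the date into the integer part (yyyyMMdd) and the time into the fraction (HHmm). A sketch of that arithmetic, using java.time for clarity, is shown below; the use of LocalDateTime and the method name toTimeStamp are assumptions, and the original method takes no argument and reads the current time.

```java
import java.time.LocalDateTime;

final class TimeStampSketch {
    /** Encodes a date-time as yyyyMMdd.HHmm, for example 19990520.1346. */
    static Double toTimeStamp(LocalDateTime t) {
        // Integer part: year, month, and day packed as yyyyMMdd.
        double datePart = t.getYear() * 10000 + t.getMonthValue() * 100 + t.getDayOfMonth();
        // Fractional part: hour and minute packed as .HHmm (24-hour clock).
        double timePart = (t.getHour() * 100 + t.getMinute()) / 10000.0;
        return datePart + timePart;
    }
}
```

Calling toTimeStamp(LocalDateTime.now()) would give the stamp for the current moment.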

main

public static void main(java.lang.String[] args)
Main method for testing this class.


testCase

protected static void testCase()