weka.clusterers
Class PCKMeans

java.lang.Object
  extended by weka.clusterers.Clusterer
      extended by weka.clusterers.PCKMeans
All Implemented Interfaces:
ActiveLearningClusterer, java.lang.Cloneable, OptionHandler, SemiSupClusterer, java.io.Serializable

public class PCKMeans
extends Clusterer
implements OptionHandler, SemiSupClusterer, ActiveLearningClusterer

Pairwise constrained k-means clustering class. Valid options are:

-N
Specify the number of clusters to generate.

-R
Specify the random number seed.

-A
The algorithm can be "Simple" (simple KMeans) or "Spherical" (spherical KMeans).

-M
Specifies the name of the distance metric class that should be used .... etc.
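As a rough illustration of how the first three flags map onto settings, here is a plain-Java parsing sketch. It is not the class's own setOptions (which presumably relies on Weka's option utilities). The class name PCKMeansFlags and the default random seed of 0 are hypothetical; the -1 default for -N and the spherical default for -A follow the descriptions of m_NumClusters and m_Algorithm. -M and the elided options are not handled.

```java
/** Hypothetical holder for the -N, -R and -A settings; not part of Weka. */
final class PCKMeansFlags {
    int numClusters = -1;            // -N; -1 means "infer from labeled data"
    int randomSeed = 0;              // -R; this default is an assumption
    String algorithm = "Spherical";  // -A; m_Algorithm defaults to spherical

    static PCKMeansFlags parse(String[] args) {
        PCKMeansFlags f = new PCKMeansFlags();
        for (int i = 0; i < args.length; i += 2) {
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("Missing value for " + args[i]);
            }
            String flag = args[i], value = args[i + 1];
            switch (flag) {
                case "-N": f.numClusters = Integer.parseInt(value); break;
                case "-R": f.randomSeed = Integer.parseInt(value); break;
                case "-A":
                    // Mirror setAlgorithm: values other than Simple/Spherical are ignored.
                    if (value.equals("Simple") || value.equals("Spherical")) f.algorithm = value;
                    break;
                default:
                    throw new IllegalArgumentException("Flag not handled in this sketch: " + flag);
            }
        }
        return f;
    }
}
```

Ignoring an unknown -A value, rather than rejecting it, keeps the sketch consistent with the documented behavior of setAlgorithm.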

See Also:
Clusterer, OptionHandler, Serialized Form

Field Summary
static int ALGORITHM_SIMPLE
          Define possible algorithms
static int ALGORITHM_SPHERICAL
           
protected  boolean m_Active
          active mode?
protected  java.util.HashSet[] m_AdjacencyList
          adjacency list for neighborhoods
protected  int m_Algorithm
          algorithm, by default spherical
protected  boolean m_AllExplore
          Two-phase active learning or All Explore
protected  double m_CannotLinkWeight
          weight to be given to each cannot-link constraint
protected  double[] m_checksumCoeffs
           
protected  java.util.HashMap m_checksumHash
          A hash where the instance checksums are hashed
protected  int[] m_ClusterAssignments
          temporary variable holding cluster assignments while iterating
protected  Instances m_ClusterCentroids
          holds the cluster centroids
protected  java.util.ArrayList m_Clusters
          holds the instances in the clusters
protected  java.util.HashMap m_ConstraintsHash
          holds the ([instance pair] -> [type of constraint]) mapping.
protected  double m_DefaultPerturb
          holds the default perturbation value for randomPerturbInit
protected  boolean m_FastMode
          m_FastMode = true => fast computation of meanOrMode in centroid calculation, useful for high-D data sets; m_FastMode = false => usual computation of meanOrMode in centroid calculation
protected  Instance m_GlobalCentroid
          holds the global centroid
protected  java.util.HashSet[] m_IndexClusters
          holds the instance indices in the clusters
protected  java.util.HashMap m_instanceConstraintHash
          holds the ([instance i] -> [Arraylist of constraints involving i]) mapping.
protected  int m_InstanceOrdering
           
protected  Instances m_Instances
          training instances
protected  boolean m_isSparseInstance
          indicates whether instances are sparse
protected  int m_Iterations
          keep track of the number of iterations completed before convergence
protected static int m_MaxConstraintsAllowed
          the maximum number of cannot-link constraints allowed
protected  double m_MergeThreshold
          holds the default merge threshold for matchMergeStep
protected  Metric m_metric
          distance Metric
protected  boolean m_metricBuilt
          has the metric been constructed? (a fix for multiple buildClusterer calls)
protected  boolean m_MovePointsTillAssignmentStabilizes
          Move points in assignment step till stabilization?
protected  double m_MustLinkWeight
          weight to be given to each must-link constraint
protected  java.util.HashSet[] m_NeighborSets
          neighbor list for active learning: points in each cluster neighborhood
protected  int m_NumActive
          number of pairs to seed with
protected  int m_NumClusters
          number of clusters to generate, default is -1 to get it from labeled data
protected  int m_NumCurrentClusters
          Number of clusters in the process
protected  double m_Objective
          value of objective function
protected  double m_ObjFunConvergenceDifference
          min difference of objective function values for convergence
protected  boolean m_objFunDecreasing
          Is the objective function increasing or decreasing? Depends on type of metric used: for similarity-based metric - increasing, for distance-based - decreasing
protected  boolean m_PhaseTwoRandom
          Round robin or Random in active Phase Two
protected  java.util.Random m_RandomNumberGenerator
          holds the random number generator used in various parts of the code
protected  int m_RandomSeed
          holds the random Seed used to seed the random number generator
protected  boolean m_Seedable
          Seedable or not (true by default)
protected  java.util.HashSet m_SeedHash
          holds the points involved in the constraints
protected  int m_StartingIndexOfTest
          test data -- required to make sure that test points are not selected during active learning
protected  Instance[] m_SumOfClusterInstances
          temporary variable holding cluster sums while iterating
protected  Instances m_TotalTrainWithLabels
          training instances with labels
protected  boolean m_verbose
          verbose?
static int ORDERING_DEFAULT
          Define possible orderings
static int ORDERING_RANDOM
           
static int ORDERING_SORTED
           
static Tag[] TAGS_ALGORITHM
           
static Tag[] TAGS_ORDERING
           
 
Constructor Summary
PCKMeans()
           
PCKMeans(Metric metric)
           
 
Method Summary
protected  int activePhaseOne(int numQueries)
          Phase 1 code for active learning
protected  void activePhaseTwoRandom(int numQueries)
          Phase 2 code for active learning, random
protected  void activePhaseTwoRoundRobin(int numQueries)
          Phase 2 code for active learning, with round robin
protected  void addMLAndCLTransitiveClosure(int[] indices)
          Adds other inferred must-link (ML) and cannot-link (CL) links to m_ConstraintsHash, from m_NeighborSets
protected  int askOracle(int X, int Y)
           
 int assignInstanceToCluster(Instance instance)
          Classifies the instance using the current clustering, without considering constraints
 int assignInstanceToClusterWithConstraints(int instIdx)
          Classifies the instance using the current clustering considering constraints, updates cluster assignments
 int[] bestInstancesForActiveLearning(int numActive)
          Dummy: not implemented for PCKMeans
 InstancePair[] bestPairsForActiveLearning(int numActive)
          Returns the best numActive instance pairs for active learning
 void buildClusterer(java.util.ArrayList labeledPair, Instances unlabeledData, Instances labeledTrain, int startingIndexOfTest)
          Clusters unlabeledData and labeledData (with labels removed), using labeledData as seeds
 void buildClusterer(Instances data)
          Generates a clusterer.
 void buildClusterer(Instances labeledData, Instances unlabeledData, int classIndex, int numClusters)
          Clusters unlabeledData and labeledData (with labels removed), using labeledData as seeds (not used for PCKMeans)
 void buildClusterer(Instances labeledData, Instances unlabeledData, int classIndex, int numClusters, int startingIndexOfTest)
          Clusters unlabeledData and labeledData (with labels removed), using labeledData as seeds
 void buildClusterer(Instances data, int num_clusters)
          Cluster given instances to form the specified number of clusters.
protected  void calculateObjectiveFunction()
          calculates objective function
 int clusterInstance(Instance instance)
          Checks if instance has to be normalized and classifies the instance using the current clustering
protected  void createCentroids()
          Creates the global cluster centroid
protected  void DFS_VISIT(int u, int[] vertexColor)
          Recursive subroutine for DFS
protected  void DFS()
          Main Depth First Search routine
protected  void findBestAssignments()
          E-step of the KMeans clustering algorithm -- find best cluster assignments
 boolean getActive()
          get the active level of clusterer
 SelectedTag getAlgorithm()
          Get the KMeans algorithm type.
 boolean getAllExplore()
          Return m_AllExplore
 double getCannotLinkWeight()
          Return the cannot link constraint weight
 java.util.ArrayList getClusters()
          Computes the clusters from the cluster assignments, for external access
 double getDefaultPerturb()
          Get default perturbation value
 java.util.HashSet[] getIndexClusters()
          Computes the clusters from the cluster assignments, for external access
 SelectedTag getInstanceOrdering()
          Get the instance ordering
 Instances getInstances()
          Return training instances
 Metric getMetric()
          Get the distance metric
 boolean getMovePointsTillAssignmentStabilizes()
          Return m_MovePointsTillAssignmentStabilizes
 double getMustLinkWeight()
          Return the must link constraint weight
 int getNumClusters()
          Return the number of clusters
 double getObjFunConvergenceDifference()
          Get the minimum value of the objective function difference required for convergence
 java.lang.String[] getOptions()
          Gets the current option settings for the OptionHandler.
 boolean getPhaseTwoRandom()
          Return m_PhaseTwoRandom
 int getRandomSeed()
          Return the random number seed
 boolean getSeedable()
          Is seeding performed?
 Clusterer getThisClusterer()
          We always want to implement SemiSupClusterer from a class extending Clusterer.
static java.lang.Double getTimeStamp()
          Gets a Double representing the current date and time.
 boolean getVerbose()
          get the verbosity level of the clusterer
 java.util.Enumeration listOptions()
          Returns an enumeration of all the available options.
protected  int lookupInstanceCluster(Instance instance)
          lookup the instance in the checksum hash
static void main(java.lang.String[] args)
          Main method for testing this class.
protected  double[] meanOrMode(Instances insts)
          Fast version of meanOrMode, streamlined from Instances.meanOrMode for efficiency. Does not check for missing attributes; assumes numeric attributes and sparse instances
protected  void nonActivePairwiseInit()
          Initialization routine for non-active algorithm
 void normalize(Instance inst)
          Normalizes Instance or SparseInstance
protected  void normalizeByWeight(Instance inst)
          This function divides every attribute value in an instance by the instance weight -- useful to find the mean of a cluster in Euclidean space
 void normalizeInstance(Instance inst)
          Normalizes the values of a normal Instance in L2 norm
 void normalizeSparseInstance(Instance inst)
          Normalizes the values of a SparseInstance in L2 norm
 int numberOfClusters()
          A duplicate function to conform to Clusterer abstract class.
 double objectiveFunction()
          returns objective function
 void printClusters()
          Prints clusters
 void printIndexClusters()
          Outputs the current clustering
 void resetClusterer()
          Reset all values that have been learned
protected  void runKMeans()
          Actual KMeans function
 boolean seedable()
          We can have clusterers that don't utilize seeding
 void seedClusterer(java.util.HashMap seedHash)
          Read the seeds from a hashtable, where every key is an instance and every value is the cluster assignment of that instance (seedVector: vector containing seeds)
 void setActive(boolean active)
          set the active level of the clusterer
 void setAlgorithm(SelectedTag algo)
          Set the KMeans algorithm.
 void setAllExplore(boolean b)
          Set m_AllExplore
 void setCannotLinkWeight(double w)
          Set the cannot link constraint weight
 void setDefaultPerturb(double p)
          Set default perturbation value
 void setInstanceOrdering(SelectedTag order)
          Set the instance ordering
 void setInstances(Instances instances)
          Sets training instances
 void setMetric(Metric m)
          Set the distance metric
 void setMovePointsTillAssignmentStabilizes(boolean b)
          Set m_MovePointsTillAssignmentStabilizes
 void setMustLinkWeight(double w)
          Set the must link constraint weight
 void setNumClusters(int n)
          Set the number of clusters to generate
 void setObjFunConvergenceDifference(double objFunConvergenceDifference)
          Set the minimum value of the objective function difference required for convergence
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 void setPhaseTwoRandom(boolean w)
          Set m_PhaseTwoRandom
 void setRandomSeed(int s)
          Set the random number seed
 void setSeedable(boolean seedable)
          Turn seeding on and off
 void setSeedHash(java.util.HashMap seedhash)
          Set the m_SeedHash
 void setVerbose(boolean verbose)
          set the verbosity level of the clusterer
protected  Instance sumInstances(Instance inst1, Instance inst2)
          Finds sum of 2 instances (handles sparse and non-sparse)
protected static void testCase()
           
 java.lang.String toString()
          return a string describing this clusterer
 void trainClusterer(Instances instances)
          Train the clusterer using specified parameters
protected  void updateClusterAssignments()
          Updates the clusterAssignments for all points after clustering.
protected  void updateClusterCentroids()
          M-step of the KMeans clustering algorithm -- updates cluster centroids
 
Methods inherited from class weka.clusterers.Clusterer
forName, makeCopies
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

m_Clusters

protected java.util.ArrayList m_Clusters
holds the instances in the clusters


m_IndexClusters

protected java.util.HashSet[] m_IndexClusters
holds the instance indices in the clusters


m_ConstraintsHash

protected java.util.HashMap m_ConstraintsHash
holds the ([instance pair] -> [type of constraint]) mapping. Note that the instance pairs stored in the hash always have constraint type InstancePair.DONT_CARE_LINK, the actual link type is stored in the hashed value


m_instanceConstraintHash

protected java.util.HashMap m_instanceConstraintHash
holds the ([instance i] -> [Arraylist of constraints involving i]) mapping. Note that the instance pairs stored in the Arraylist have the actual link type
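The two constraint hashes can be pictured as one index keyed by the unordered pair (so (i, j) and (j, i) hit the same entry) and one per-instance index listing every constraint that touches an instance. This is a sketch with integer instance indices and hypothetical link-type codes; the real class stores InstancePair objects, whose API is not reproduced here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical mirror of m_ConstraintsHash and m_instanceConstraintHash. */
final class ConstraintIndex {
    static final int MUST_LINK = 1, CANNOT_LINK = 2; // illustrative codes, not InstancePair's

    /** Order-independent key for the pair {i, j}. */
    static long key(int i, int j) {
        int a = Math.min(i, j), b = Math.max(i, j);
        return ((long) a << 32) | (b & 0xffffffffL);
    }

    final Map<Long, Integer> pairToType = new HashMap<>();                   // pair -> link type
    final Map<Integer, List<int[]>> instanceToConstraints = new HashMap<>(); // i -> {i, j, type}

    void add(int i, int j, int type) {
        // Record each unordered pair only once.
        if (pairToType.putIfAbsent(key(i, j), type) != null) return;
        int[] c = {i, j, type};
        instanceToConstraints.computeIfAbsent(i, k -> new ArrayList<>()).add(c);
        instanceToConstraints.computeIfAbsent(j, k -> new ArrayList<>()).add(c);
    }

    /** Link type for {i, j}, or null if the pair is unconstrained. */
    Integer typeOf(int i, int j) {
        return pairToType.get(key(i, j));
    }
}
```

As in the documented design, the pair index answers "is this pair constrained, and how?" in one lookup, while the per-instance lists carry the actual link type for iterating over a single point's constraints.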


m_AdjacencyList

protected java.util.HashSet[] m_AdjacencyList
adjacency list for neighborhoods


m_SeedHash

protected java.util.HashSet m_SeedHash
holds the points involved in the constraints


m_CannotLinkWeight

protected double m_CannotLinkWeight
weight to be given to each cannot-link constraint


m_MustLinkWeight

protected double m_MustLinkWeight
weight to be given to each must-link constraint


m_MaxConstraintsAllowed

protected static final int m_MaxConstraintsAllowed
the maximum number of cannot-link constraints allowed

See Also:
Constant Field Values

m_verbose

protected boolean m_verbose
verbose?


m_metric

protected Metric m_metric
distance Metric


m_metricBuilt

protected boolean m_metricBuilt
has the metric been constructed? (a fix for multiple buildClusterer calls)


m_isSparseInstance

protected boolean m_isSparseInstance
indicates whether instances are sparse


m_objFunDecreasing

protected boolean m_objFunDecreasing
Is the objective function increasing or decreasing? Depends on type of metric used: for similarity-based metric - increasing, for distance-based - decreasing


m_Seedable

protected boolean m_Seedable
Seedable or not (true by default)


m_PhaseTwoRandom

protected boolean m_PhaseTwoRandom
Round robin or Random in active Phase Two


m_AllExplore

protected boolean m_AllExplore
Two-phase active learning or All Explore


m_Iterations

protected int m_Iterations
keep track of the number of iterations completed before convergence


ALGORITHM_SIMPLE

public static final int ALGORITHM_SIMPLE
Define possible algorithms

See Also:
Constant Field Values

ALGORITHM_SPHERICAL

public static final int ALGORITHM_SPHERICAL
See Also:
Constant Field Values

TAGS_ALGORITHM

public static final Tag[] TAGS_ALGORITHM

m_Algorithm

protected int m_Algorithm
algorithm, by default spherical


m_ObjFunConvergenceDifference

protected double m_ObjFunConvergenceDifference
min difference of objective function values for convergence


m_Objective

protected double m_Objective
value of objective function


m_TotalTrainWithLabels

protected Instances m_TotalTrainWithLabels
training instances with labels


m_Instances

protected Instances m_Instances
training instances


m_checksumHash

protected java.util.HashMap m_checksumHash
A hash where the instance checksums are hashed


m_checksumCoeffs

protected double[] m_checksumCoeffs

m_StartingIndexOfTest

protected int m_StartingIndexOfTest
test data -- required to make sure that test points are not selected during active learning


m_NumActive

protected int m_NumActive
number of pairs to seed with


m_Active

protected boolean m_Active
active mode?


m_NumClusters

protected int m_NumClusters
number of clusters to generate, default is -1 to get it from labeled data


m_NumCurrentClusters

protected int m_NumCurrentClusters
Number of clusters in the process


m_FastMode

protected boolean m_FastMode
m_FastMode = true => fast computation of meanOrMode in centroid calculation, useful for high-D data sets; m_FastMode = false => usual computation of meanOrMode in centroid calculation


m_ClusterCentroids

protected Instances m_ClusterCentroids
holds the cluster centroids


m_GlobalCentroid

protected Instance m_GlobalCentroid
holds the global centroid


m_DefaultPerturb

protected double m_DefaultPerturb
holds the default perturbation value for randomPerturbInit


m_MergeThreshold

protected double m_MergeThreshold
holds the default merge threshold for matchMergeStep


m_ClusterAssignments

protected int[] m_ClusterAssignments
temporary variable holding cluster assignments while iterating


m_SumOfClusterInstances

protected Instance[] m_SumOfClusterInstances
temporary variable holding cluster sums while iterating


m_RandomSeed

protected int m_RandomSeed
holds the random Seed used to seed the random number generator


m_RandomNumberGenerator

protected java.util.Random m_RandomNumberGenerator
holds the random number generator used in various parts of the code


ORDERING_DEFAULT

public static final int ORDERING_DEFAULT
Define possible orderings

See Also:
Constant Field Values

ORDERING_RANDOM

public static final int ORDERING_RANDOM
See Also:
Constant Field Values

ORDERING_SORTED

public static final int ORDERING_SORTED
See Also:
Constant Field Values

TAGS_ORDERING

public static final Tag[] TAGS_ORDERING

m_InstanceOrdering

protected int m_InstanceOrdering

m_MovePointsTillAssignmentStabilizes

protected boolean m_MovePointsTillAssignmentStabilizes
Move points in assignment step till stabilization?


m_NeighborSets

protected java.util.HashSet[] m_NeighborSets
neighbor list for active learning: points in each cluster neighborhood

Constructor Detail

PCKMeans

public PCKMeans()

PCKMeans

public PCKMeans(Metric metric)
Method Detail

objectiveFunction

public double objectiveFunction()
returns objective function

Specified by:
objectiveFunction in interface SemiSupClusterer

getThisClusterer

public Clusterer getThisClusterer()
We always want to implement SemiSupClusterer from a class extending Clusterer. We want to be able to return the underlying parent class.

Specified by:
getThisClusterer in interface SemiSupClusterer
Returns:
parent Clusterer class

buildClusterer

public void buildClusterer(Instances labeledData,
                           Instances unlabeledData,
                           int classIndex,
                           int numClusters,
                           int startingIndexOfTest)
                    throws java.lang.Exception
Clusters unlabeledData and labeledData (with labels removed), using labeledData as seeds

Specified by:
buildClusterer in interface SemiSupClusterer
Parameters:
labeledData - labeled instances to be used as seeds
unlabeledData - unlabeled instances
classIndex - attribute index in labeledData which holds class info
numClusters - number of clusters
startingIndexOfTest - from where test data starts in unlabeledData, useful if clustering is transductive
Throws:
java.lang.Exception - if something goes wrong.

buildClusterer

public void buildClusterer(Instances data,
                           int num_clusters)
                    throws java.lang.Exception
Cluster given instances to form the specified number of clusters.

Parameters:
data - instances to be clustered
num_clusters - number of clusters to create
Throws:
java.lang.Exception - if something goes wrong.

buildClusterer

public void buildClusterer(java.util.ArrayList labeledPair,
                           Instances unlabeledData,
                           Instances labeledTrain,
                           int startingIndexOfTest)
                    throws java.lang.Exception
Clusters unlabeledData and labeledData (with labels removed), using labeledData as seeds

Parameters:
unlabeledData - unlabeled training (+ test for transductive) instances
labeledTrain - labeled training instances
startingIndexOfTest - starting index of test set in unlabeled data
Throws:
java.lang.Exception - if something goes wrong.

buildClusterer

public void buildClusterer(Instances labeledData,
                           Instances unlabeledData,
                           int classIndex,
                           int numClusters)
                    throws java.lang.Exception
Clusters unlabeledData and labeledData (with labels removed), using labeledData as seeds. Not used for PCKMeans.

Parameters:
labeledData - labeled instances to be used as seeds
unlabeledData - unlabeled instances
classIndex - attribute index in labeledData which holds class info
numClusters - number of clusters
Throws:
java.lang.Exception - if something goes wrong.

resetClusterer

public void resetClusterer()
                    throws java.lang.Exception
Reset all values that have been learned

Specified by:
resetClusterer in interface SemiSupClusterer
Throws:
java.lang.Exception

setDefaultPerturb

public void setDefaultPerturb(double p)
Set default perturbation value

Parameters:
p - perturbation fraction

getDefaultPerturb

public double getDefaultPerturb()
Get default perturbation value

Returns:
perturbation fraction

setSeedable

public void setSeedable(boolean seedable)
Turn seeding on and off

Parameters:
seedable - should seeding be done?

getSeedable

public boolean getSeedable()
Is seeding performed?

Returns:
is seeding being done?

seedable

public boolean seedable()
We can have clusterers that don't utilize seeding


activePhaseOne

protected int activePhaseOne(int numQueries)
                      throws java.lang.Exception
Phase 1 code for active learning

Throws:
java.lang.Exception
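activePhaseOne is documented only as Phase 1 of active learning. A common design for such a phase, assumed here rather than confirmed by this page, is farthest-first exploration: visit points in farthest-first order, query each new point against one member of every existing neighborhood, and open a new neighborhood only when every answer is cannot-link. In this sketch the oracle is a predicate, the traversal starts at point 0 instead of a random point, and squared Euclidean distance stands in for m_metric; all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

final class ExplorePhase {
    static double dist2(double[] a, double[] b) {
        double s = 0;
        for (int d = 0; d < a.length; d++) { double t = a[d] - b[d]; s += t * t; }
        return s;
    }

    /** Builds up to k neighborhoods using at most maxQueries oracle calls. */
    static List<List<Integer>> explore(double[][] x, int k, int maxQueries,
                                       BiPredicate<Integer, Integer> mustLink) {
        int n = x.length;
        List<List<Integer>> hoods = new ArrayList<>();
        double[] minDist = new double[n];           // distance to the nearest visited point
        Arrays.fill(minDist, Double.POSITIVE_INFINITY);
        boolean[] visited = new boolean[n];
        int queries = 0;
        int next = 0;                               // deterministic start for the sketch
        while (next >= 0 && hoods.size() < k && queries < maxQueries) {
            visited[next] = true;
            int answered = 0;
            boolean placed = false;
            for (List<Integer> h : hoods) {
                if (queries >= maxQueries) break;
                queries++;
                answered++;
                if (mustLink.test(next, h.get(0))) { h.add(next); placed = true; break; }
            }
            // Open a new neighborhood only if every existing one answered cannot-link.
            if (!placed && answered == hoods.size()) {
                List<Integer> h = new ArrayList<>();
                h.add(next);
                hoods.add(h);
            }
            // Farthest-first: choose the unvisited point farthest from all visited ones.
            int far = -1;
            double best = -1;
            for (int i = 0; i < n; i++) {
                if (visited[i]) continue;
                minDist[i] = Math.min(minDist[i], dist2(x[i], x[next]));
                if (minDist[i] > best) { best = minDist[i]; far = i; }
            }
            next = far;
        }
        return hoods;
    }
}
```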

activePhaseTwoRoundRobin

protected void activePhaseTwoRoundRobin(int numQueries)
                                 throws java.lang.Exception
Phase 2 code for active learning, with round robin

Throws:
java.lang.Exception
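Phase 2 grows the neighborhoods found so far. Assuming a typical consolidate step (not confirmed by this page), a point is queried against neighborhoods in order of increasing distance from their centroids until the oracle answers must-link; round robin versus random then only decides which point is processed next, which this sketch leaves to the caller. Names are hypothetical and squared Euclidean distance stands in for m_metric.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.BiPredicate;

final class ConsolidateStep {
    static double dist2(double[] a, double[] b) {
        double s = 0;
        for (int d = 0; d < a.length; d++) { double t = a[d] - b[d]; s += t * t; }
        return s;
    }

    static double[] centroid(double[][] x, List<Integer> hood) {
        double[] c = new double[x[0].length];
        for (int i : hood) for (int d = 0; d < c.length; d++) c[d] += x[i][d];
        for (int d = 0; d < c.length; d++) c[d] /= hood.size();
        return c;
    }

    /** Adds point p to the first neighborhood (nearest centroid first) that must-links it.
     *  Returns {neighborhood index or -1, number of queries used}. */
    static int[] place(double[][] x, List<List<Integer>> hoods, int p,
                       BiPredicate<Integer, Integer> mustLink) {
        List<double[]> cents = new ArrayList<>();
        List<Integer> order = new ArrayList<>();
        for (int h = 0; h < hoods.size(); h++) {
            cents.add(centroid(x, hoods.get(h)));
            order.add(h);
        }
        order.sort(Comparator.comparingDouble(h -> dist2(x[p], cents.get(h))));
        int queries = 0;
        for (int h : order) {
            queries++;
            if (mustLink.test(p, hoods.get(h).get(0))) {
                hoods.get(h).add(p);
                return new int[]{h, queries};
            }
        }
        return new int[]{-1, queries};
    }
}
```

Querying the nearest neighborhood first tends to reach the must-link answer in fewer queries when clusters are well separated.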

activePhaseTwoRandom

protected void activePhaseTwoRandom(int numQueries)
                             throws java.lang.Exception
Phase 2 code for active learning, random

Throws:
java.lang.Exception

createCentroids

protected void createCentroids()
                        throws java.lang.Exception
Creates the global cluster centroid

Throws:
java.lang.Exception

addMLAndCLTransitiveClosure

protected void addMLAndCLTransitiveClosure(int[] indices)
                                    throws java.lang.Exception
Adds other inferred must-link (ML) and cannot-link (CL) links to m_ConstraintsHash, from m_NeighborSets

Throws:
java.lang.Exception
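One way to read this method, assuming distinct neighborhoods are known to be mutually cannot-linked: every pair inside a neighborhood is an inferred must-link, and every pair spanning two neighborhoods is an inferred cannot-link. The sketch only enumerates those pairs; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class NeighborhoodClosure {
    final List<int[]> mustLinks = new ArrayList<>();
    final List<int[]> cannotLinks = new ArrayList<>();

    static NeighborhoodClosure infer(List<List<Integer>> hoods) {
        NeighborhoodClosure c = new NeighborhoodClosure();
        for (int h = 0; h < hoods.size(); h++) {
            List<Integer> a = hoods.get(h);
            // Must-link every pair inside this neighborhood.
            for (int i = 0; i < a.size(); i++)
                for (int j = i + 1; j < a.size(); j++)
                    c.mustLinks.add(new int[]{a.get(i), a.get(j)});
            // Cannot-link every pair across this and each later neighborhood.
            for (int g = h + 1; g < hoods.size(); g++)
                for (int p : a)
                    for (int q : hoods.get(g))
                        c.cannotLinks.add(new int[]{p, q});
        }
        return c;
    }
}
```

For neighborhoods of sizes 3 and 2 this yields 3 + 1 = 4 inferred must-links and 3 × 2 = 6 inferred cannot-links, which shows how a few oracle answers can expand into many constraints.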

DFS

protected void DFS()
            throws java.lang.Exception
Main Depth First Search routine

Throws:
java.lang.Exception

DFS_VISIT

protected void DFS_VISIT(int u,
                         int[] vertexColor)
                  throws java.lang.Exception
Recursive subroutine for DFS

Throws:
java.lang.Exception
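DFS and DFS_VISIT(u, vertexColor) follow the textbook depth-first search with vertex colors. A likely use in this setting, assumed here, is finding the connected components of the must-link graph held in an adjacency list such as m_AdjacencyList, since must-link is transitive. The sketch uses integer vertices and a List of sets instead of a raw HashSet[]; names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class MustLinkComponents {
    static final int WHITE = 0, GRAY = 1, BLACK = 2; // unvisited, in progress, finished

    static List<Set<Integer>> fromEdges(int n, int[][] edges) {
        List<Set<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new HashSet<>());
        for (int[] e : edges) { adj.get(e[0]).add(e[1]); adj.get(e[1]).add(e[0]); }
        return adj;
    }

    /** Labels each vertex with the id of its connected component. */
    static int[] components(List<Set<Integer>> adj) {
        int n = adj.size();
        int[] color = new int[n];   // all WHITE initially
        int[] comp = new int[n];
        int nextId = 0;
        for (int u = 0; u < n; u++) {
            if (color[u] == WHITE) visit(adj, u, color, comp, nextId++);
        }
        return comp;
    }

    /** Recursive visit, in the spirit of DFS_VISIT(u, vertexColor). */
    static void visit(List<Set<Integer>> adj, int u, int[] color, int[] comp, int id) {
        color[u] = GRAY;
        comp[u] = id;
        for (int v : adj.get(u)) {
            if (color[v] == WHITE) visit(adj, v, color, comp, id);
        }
        color[u] = BLACK;
    }
}
```

Recursion depth grows with component size, so very large components would call for an explicit stack instead.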

nonActivePairwiseInit

protected void nonActivePairwiseInit()
                              throws java.lang.Exception
Initialization routine for non-active algorithm

Throws:
java.lang.Exception

askOracle

protected int askOracle(int X,
                        int Y)

normalizeByWeight

protected void normalizeByWeight(Instance inst)
This function divides every attribute value in an instance by the instance weight -- useful to find the mean of a cluster in Euclidean space

Parameters:
inst - Instance passed in for normalization (destructive update)

sumInstances

protected Instance sumInstances(Instance inst1,
                                Instance inst2)
                         throws java.lang.Exception
Finds sum of 2 instances (handles sparse and non-sparse)

Throws:
java.lang.Exception

updateClusterAssignments

protected void updateClusterAssignments()
                                 throws java.lang.Exception
Updates the clusterAssignments for all points after clustering. Maps assignments from [0,numInstances-1] to [0,numClusters-1], e.g. from [0 2 2 0 6 6 2] -> [0 1 1 0 2 2 1]. NOTE: this function is no longer used.

Throws:
java.lang.Exception
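Numbering cluster ids in order of first appearance gives this kind of compaction; for [0 2 2 0 6 6 2] it yields [0 1 1 0 2 2 1]. The sketch below is only meant to make the mapping concrete (the page notes the function is no longer used), and the class name is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class AssignmentRelabel {
    /** Renumbers cluster ids 0, 1, 2, ... in order of first appearance. */
    static int[] relabel(int[] assignments) {
        Map<Integer, Integer> newId = new HashMap<>();
        int[] out = new int[assignments.length];
        for (int i = 0; i < assignments.length; i++) {
            Integer id = newId.get(assignments[i]);
            if (id == null) {
                id = newId.size();             // next unused compact id
                newId.put(assignments[i], id);
            }
            out[i] = id;
        }
        return out;
    }
}
```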

printIndexClusters

public void printIndexClusters()
                        throws java.lang.Exception
Outputs the current clustering

Throws:
java.lang.Exception - if something goes wrong

findBestAssignments

protected void findBestAssignments()
                            throws java.lang.Exception
E-step of the KMeans clustering algorithm -- find best cluster assignments

Throws:
java.lang.Exception

assignInstanceToClusterWithConstraints

public int assignInstanceToClusterWithConstraints(int instIdx)
                                           throws java.lang.Exception
Classifies the instance using the current clustering considering constraints, updates cluster assignments

Returns:
1 if the point is moved, 0 otherwise
Throws:
java.lang.Exception - if instance could not be classified successfully
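A sketch of a constraint-aware assignment, assuming the usual PCKMeans cost: squared distance to the centroid, plus a fixed penalty (playing the role of m_MustLinkWeight) for each must-linked point currently in another cluster, plus a fixed penalty (m_CannotLinkWeight) for each cannot-linked point already in the candidate cluster. The actual method may weight violations differently or use m_metric; all names here are hypothetical.

```java
/** Hypothetical constraint-aware assignment of one point (PCKMeans-style cost). */
final class ConstrainedAssign {
    /**
     * @param assign       current cluster of every point (-1 = not yet assigned)
     * @param mustLinked   indices of points must-linked to x
     * @param cannotLinked indices of points cannot-linked to x
     */
    static int best(double[] x, double[][] centroids, int[] assign,
                    int[] mustLinked, int[] cannotLinked, double wML, double wCL) {
        int bestCluster = -1;
        double bestCost = Double.POSITIVE_INFINITY;
        for (int h = 0; h < centroids.length; h++) {
            double cost = 0;
            for (int d = 0; d < x.length; d++) {
                double t = x[d] - centroids[h][d];
                cost += t * t;                                  // squared Euclidean distance
            }
            for (int j : mustLinked) if (assign[j] >= 0 && assign[j] != h) cost += wML;
            for (int j : cannotLinked) if (assign[j] == h) cost += wCL;
            if (cost < bestCost) { bestCost = cost; bestCluster = h; }
        }
        return bestCluster;
    }
}
```

Reporting whether the point moved, as the documented 1/0 return value does, is then a comparison of the old and new cluster.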

updateClusterCentroids

protected void updateClusterCentroids()
                               throws java.lang.Exception
M-step of the KMeans clustering algorithm -- updates cluster centroids

Throws:
java.lang.Exception
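The M-step can be pictured as summing each cluster's points and dividing by the cluster size, which is what sumInstances and normalizeByWeight suggest when every instance has weight 1. For the spherical variant the mean is then rescaled to unit L2 norm, the usual spherical k-means centroid; whether updateClusterCentroids does exactly this is an assumption. Dense double arrays stand in for Instance and SparseInstance, and the class name is hypothetical.

```java
/** Hypothetical M-step: cluster means, optionally L2-normalized (spherical k-means). */
final class CentroidUpdate {
    static double[][] update(double[][] x, int[] assign, int k, boolean spherical) {
        int dim = x[0].length;
        double[][] centroids = new double[k][dim];
        int[] counts = new int[k];
        // Sum the points of each cluster (cf. sumInstances).
        for (int i = 0; i < x.length; i++) {
            counts[assign[i]]++;
            for (int d = 0; d < dim; d++) centroids[assign[i]][d] += x[i][d];
        }
        for (int h = 0; h < k; h++) {
            if (counts[h] == 0) continue; // an empty cluster stays the zero vector here
            // Divide by the cluster size (cf. normalizeByWeight with unit weights).
            for (int d = 0; d < dim; d++) centroids[h][d] /= counts[h];
            if (spherical) {
                double norm = 0;
                for (double v : centroids[h]) norm += v * v;
                norm = Math.sqrt(norm);
                if (norm > 0) for (int d = 0; d < dim; d++) centroids[h][d] /= norm;
            }
        }
        return centroids;
    }
}
```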

calculateObjectiveFunction

protected void calculateObjectiveFunction()
                                   throws java.lang.Exception
calculates objective function

Throws:
java.lang.Exception
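Assuming the standard PCKMeans objective for a distance-based metric (lower is better, matching the decreasing case described under m_objFunDecreasing), each point contributes its squared distance to its centroid and each violated constraint adds its weight. Unlike the per-point cost used during assignment, each constraint is counted once per pair here. The sketch uses plain arrays and hypothetical names; calculateObjectiveFunction may differ in detail.

```java
/** Hypothetical PCKMeans-style objective: squared distances plus constraint penalties. */
final class PCKMeansObjective {
    static double objective(double[][] x, int[] assign, double[][] centroids,
                            int[][] mustLinks, int[][] cannotLinks, double wML, double wCL) {
        double obj = 0;
        for (int i = 0; i < x.length; i++) {
            double[] c = centroids[assign[i]];
            for (int d = 0; d < c.length; d++) { double t = x[i][d] - c[d]; obj += t * t; }
        }
        for (int[] p : mustLinks) if (assign[p[0]] != assign[p[1]]) obj += wML;   // split must-link
        for (int[] p : cannotLinks) if (assign[p[0]] == assign[p[1]]) obj += wCL; // joined cannot-link
        return obj;
    }
}
```

For points 0, 2 and 10 with centroids 1 and 10, the distance part is 1 + 1 + 0 = 2; one violated must-link of weight 5 and one violated cannot-link of weight 3 bring the total to 10.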

buildClusterer

public void buildClusterer(Instances data)
                    throws java.lang.Exception
Generates a clusterer. Instances in data must be either all sparse or all non-sparse.

Specified by:
buildClusterer in interface SemiSupClusterer
Specified by:
buildClusterer in class Clusterer
Parameters:
data - set of instances serving as training data
Throws:
java.lang.Exception - if the clusterer has not been generated successfully

runKMeans

protected void runKMeans()
                  throws java.lang.Exception
Actual KMeans function

Throws:
java.lang.Exception
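runKMeans presumably alternates findBestAssignments (E-step) and updateClusterCentroids (M-step) until the objective changes by less than m_ObjFunConvergenceDifference, counting rounds in m_Iterations. The loop below sketches that stopping rule only: one E/M round is passed in as a function returning the new objective, and the iteration cap is an added safeguard not mentioned on this page. Names are hypothetical.

```java
import java.util.function.DoubleSupplier;

/** Hypothetical driver: repeat E-step + M-step until the objective stabilizes. */
final class ConvergenceLoop {
    /** iteration runs one E/M round and returns the new objective value. */
    static int run(DoubleSupplier iteration, double minDifference, int maxIterations) {
        double previous = Double.POSITIVE_INFINITY;
        int done = 0;
        while (done < maxIterations) {
            double current = iteration.getAsDouble();
            done++;
            // abs() covers both decreasing (distance) and increasing (similarity) objectives.
            if (Math.abs(previous - current) < minDifference) break;
            previous = current;
        }
        return done;
    }
}
```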

bestInstancesForActiveLearning

public int[] bestInstancesForActiveLearning(int numActive)
                                     throws java.lang.Exception
Dummy: not implemented for PCKMeans

Specified by:
bestInstancesForActiveLearning in interface ActiveLearningClusterer
Throws:
java.lang.Exception

bestPairsForActiveLearning

public InstancePair[] bestPairsForActiveLearning(int numActive)
                                          throws java.lang.Exception
Returns the best numActive instance pairs for active learning

Specified by:
bestPairsForActiveLearning in interface ActiveLearningClusterer
Throws:
java.lang.Exception

clusterInstance

public int clusterInstance(Instance instance)
                    throws java.lang.Exception
Checks if instance has to be normalized and classifies the instance using the current clustering

Specified by:
clusterInstance in class Clusterer
Parameters:
instance - the instance to be assigned to a cluster
Returns:
the index of the assigned cluster
Throws:
java.lang.Exception - if instance could not be classified successfully

lookupInstanceCluster

protected int lookupInstanceCluster(Instance instance)
lookup the instance in the checksum hash

Parameters:
instance - instance to be looked up
Returns:
the index of the cluster to which the instance was assigned, or -1 if the instance has not been clustered
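The field names m_checksumCoeffs and m_checksumHash suggest that each instance is reduced to a checksum (a weighted sum of its attribute values) that serves as a hash key. That reading is an assumption; the sketch below shows the idea with random coefficients and hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

/** Hypothetical checksum index: instance values -> cluster, via a random linear checksum. */
final class ChecksumLookup {
    private final double[] coeffs;                              // role of m_checksumCoeffs
    private final Map<Double, Integer> hash = new HashMap<>();  // role of m_checksumHash

    ChecksumLookup(int numAttributes, long seed) {
        Random rng = new Random(seed);
        coeffs = new double[numAttributes];
        for (int d = 0; d < numAttributes; d++) coeffs[d] = rng.nextDouble();
    }

    double checksum(double[] values) {
        double s = 0;
        for (int d = 0; d < values.length; d++) s += coeffs[d] * values[d];
        return s;
    }

    void put(double[] values, int cluster) { hash.put(checksum(values), cluster); }

    /** Cluster index for these values, or -1 if they were never clustered. */
    int lookup(double[] values) {
        Integer c = hash.get(checksum(values));
        return c == null ? -1 : c;
    }
}
```

Two instances with identical values, or an unlucky collision, share a key; that is the price of looking instances up by checksum rather than by identity.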

assignInstanceToCluster

public int assignInstanceToCluster(Instance instance)
                            throws java.lang.Exception
Classifies the instance using the current clustering, without considering constraints

Parameters:
instance - the instance to be assigned to a cluster
Returns:
the index of the assigned cluster
Throws:
java.lang.Exception - if instance could not be classified successfully
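Without constraints, assignment reduces to picking the best centroid under the metric. Which direction counts as best depends on the metric type, as m_objFunDecreasing notes: higher is better for similarity-based metrics and lower for distance-based ones. The sketch uses a dot product (cosine similarity for unit-length vectors, as in spherical k-means) and squared Euclidean distance as stand-ins for m_metric; names are hypothetical.

```java
/** Hypothetical unconstrained assignment for the two metric families. */
final class NearestCentroid {
    /** similarity = true: maximize the dot product (cosine if vectors are unit-length);
     *  similarity = false: minimize squared Euclidean distance. */
    static int assign(double[] x, double[][] centroids, boolean similarity) {
        int best = -1;
        double bestScore = similarity ? Double.NEGATIVE_INFINITY : Double.POSITIVE_INFINITY;
        for (int h = 0; h < centroids.length; h++) {
            double score = 0;
            for (int d = 0; d < x.length; d++) {
                double c = centroids[h][d];
                score += similarity ? x[d] * c : (x[d] - c) * (x[d] - c);
            }
            if (similarity ? score > bestScore : score < bestScore) { bestScore = score; best = h; }
        }
        return best;
    }
}
```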

setCannotLinkWeight

public void setCannotLinkWeight(double w)
Set the cannot link constraint weight


getCannotLinkWeight

public double getCannotLinkWeight()
Return the cannot link constraint weight


setMustLinkWeight

public void setMustLinkWeight(double w)
Set the must link constraint weight


getMustLinkWeight

public double getMustLinkWeight()
Return the must link constraint weight


getPhaseTwoRandom

public boolean getPhaseTwoRandom()
Return m_PhaseTwoRandom


setPhaseTwoRandom

public void setPhaseTwoRandom(boolean w)
Set m_PhaseTwoRandom


getAllExplore

public boolean getAllExplore()
Return m_AllExplore


setAllExplore

public void setAllExplore(boolean b)
Set m_AllExplore


getNumClusters

public int getNumClusters()
Return the number of clusters

Specified by:
getNumClusters in interface SemiSupClusterer

numberOfClusters

public int numberOfClusters()
A duplicate function to conform to Clusterer abstract class.

Specified by:
numberOfClusters in class Clusterer
Returns:
the number of clusters generated for a training dataset.

setSeedHash

public void setSeedHash(java.util.HashMap seedhash)
Set the m_SeedHash


setRandomSeed

public void setRandomSeed(int s)
Set the random number seed

Parameters:
s - the seed

getRandomSeed

public int getRandomSeed()
Return the random number seed


setMovePointsTillAssignmentStabilizes

public void setMovePointsTillAssignmentStabilizes(boolean b)
Set m_MovePointsTillAssignmentStabilizes

Parameters:
b - truth value

getMovePointsTillAssignmentStabilizes

public boolean getMovePointsTillAssignmentStabilizes()
Return m_MovePointsTillAssignmentStabilizes


setObjFunConvergenceDifference

public void setObjFunConvergenceDifference(double objFunConvergenceDifference)
Set the minimum value of the objective function difference required for convergence

Parameters:
objFunConvergenceDifference - the minimum value of the objective function difference required for convergence

getObjFunConvergenceDifference

public double getObjFunConvergenceDifference()
Get the minimum value of the objective function difference required for convergence


setInstances

public void setInstances(Instances instances)
Sets training instances


getInstances

public Instances getInstances()
Return training instances

Specified by:
getInstances in interface SemiSupClusterer
Returns:
Instances used for clustering, or null

setNumClusters

public void setNumClusters(int n)
Set the number of clusters to generate

Specified by:
setNumClusters in interface SemiSupClusterer
Parameters:
n - the number of clusters to generate

setMetric

public void setMetric(Metric m)
Set the distance metric

Specified by:
setMetric in interface SemiSupClusterer

getMetric

public Metric getMetric()
Get the distance metric


setAlgorithm

public void setAlgorithm(SelectedTag algo)
Set the KMeans algorithm. Values other than ALGORITHM_SIMPLE or ALGORITHM_SPHERICAL will be ignored

Parameters:
algo - algorithm type

getAlgorithm

public SelectedTag getAlgorithm()
Get the KMeans algorithm type. Will be one of ALGORITHM_SIMPLE or ALGORITHM_SPHERICAL


setInstanceOrdering

public void setInstanceOrdering(SelectedTag order)
Set the instance ordering

Parameters:
order - instance ordering

getInstanceOrdering

public SelectedTag getInstanceOrdering()
Get the instance ordering


seedClusterer

public void seedClusterer(java.util.HashMap seedHash)
Read the seeds from a hash table, where every key is an instance and every value is the cluster assignment of that instance

Specified by:
seedClusterer in interface SemiSupClusterer
Parameters:
seedHash - HashMap of seeding parameters

printClusters

public void printClusters()
                   throws java.lang.Exception
Prints clusters

Throws:
java.lang.Exception

getClusters

public java.util.ArrayList getClusters()
                                throws java.lang.Exception
Computes the clusters from the cluster assignments, for external access

Specified by:
getClusters in interface SemiSupClusterer
Throws:
java.lang.Exception - if clusters could not be computed successfully

getIndexClusters

public java.util.HashSet[] getIndexClusters()
                                     throws java.lang.Exception
Computes the clusters from the cluster assignments, for external access

Throws:
java.lang.Exception - if clusters could not be computed successfully
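The following is a minimal sketch that turns cluster assignments, such as those held in m_ClusterAssignments, into one set of instance indices per cluster. It assumes assignments are integers in the range 0 to numClusters - 1. IndexClustersSketch and indexClusters are hypothetical names.

```java
import java.util.HashSet;

/** Hypothetical sketch: group instance indices by their assigned cluster. */
final class IndexClustersSketch {
    private IndexClustersSketch() {}

    @SuppressWarnings("unchecked") // generic array creation, as with the raw HashSet[] return type
    static HashSet<Integer>[] indexClusters(int[] clusterAssignments, int numClusters) {
        HashSet<Integer>[] clusters = new HashSet[numClusters];
        for (int c = 0; c < numClusters; c++) {
            clusters[c] = new HashSet<>();
        }
        for (int i = 0; i < clusterAssignments.length; i++) {
            int c = clusterAssignments[i];
            if (c < 0 || c >= numClusters) {
                throw new IllegalArgumentException("Instance " + i + " has invalid cluster " + c);
            }
            clusters[c].add(i); // store the instance index, not the instance itself
        }
        return clusters;
    }
}
```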

listOptions

public java.util.Enumeration listOptions()
Description copied from interface: OptionHandler
Returns an enumeration of all the available options.

Specified by:
listOptions in interface OptionHandler
Returns:
an enumeration of all available options.

getOptions

public java.lang.String[] getOptions()
Description copied from interface: OptionHandler
Gets the current option settings for the OptionHandler.

Specified by:
getOptions in interface OptionHandler
Returns:
the list of current option settings as an array of strings

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options.

Specified by:
setOptions in interface OptionHandler
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported
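Options are passed as flag/value pairs in the usual Weka command-line style. The sketch below parses the -N, -R and -A options listed in the class description and does not use Weka's own option utilities. OptionsSketch, its fields and the numeric defaults are hypothetical. Only the spherical default comes from the field summary.

```java
/** Hypothetical sketch of parsing -N, -R and -A as flag/value pairs. */
final class OptionsSketch {
    int numClusters = 2;            // hypothetical default
    int randomSeed = 1;             // hypothetical default
    String algorithm = "Spherical"; // the field summary says spherical is the default

    /** Parses flag/value pairs; throws if a flag is unknown or lacks a value. */
    void setOptions(String[] options) throws Exception {
        for (int i = 0; i < options.length; i++) {
            String flag = options[i];
            if (i + 1 >= options.length) {
                throw new Exception("Missing value for option " + flag);
            }
            String value = options[++i];
            switch (flag) {
                case "-N":
                    numClusters = Integer.parseInt(value);
                    break;
                case "-R":
                    randomSeed = Integer.parseInt(value);
                    break;
                case "-A":
                    if (!value.equals("Simple") && !value.equals("Spherical")) {
                        throw new Exception("Unknown algorithm: " + value);
                    }
                    algorithm = value;
                    break;
                default:
                    throw new Exception("Option not supported: " + flag);
            }
        }
    }

    /** Returns the current settings in the same flag/value form. */
    String[] getOptions() {
        return new String[] {"-N", Integer.toString(numClusters),
                             "-R", Integer.toString(randomSeed),
                             "-A", algorithm};
    }
}
```

For example, calling setOptions(new String[] {"-N", "3", "-R", "42", "-A", "Simple"}) and then getOptions() returns the same six strings.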

toString

public java.lang.String toString()
Returns a string describing this clusterer

Returns:
a description of the clusterer as a string

setActive

public void setActive(boolean active)
Set whether the clusterer runs in active learning mode

Parameters:
active - true to enable active learning mode, false to disable it

getActive

public boolean getActive()
Get whether the clusterer runs in active learning mode

Returns:
true if active learning mode is enabled

setVerbose

public void setVerbose(boolean verbose)
Set the verbosity of the clusterer

Specified by:
setVerbose in interface SemiSupClusterer
Parameters:
verbose - messages on (true) or off (false)

getVerbose

public boolean getVerbose()
Get the verbosity of the clusterer

Returns:
messages on (true) or off (false)

trainClusterer

public void trainClusterer(Instances instances)
                    throws java.lang.Exception
Train the clusterer on the given instances using the current parameter settings

Specified by:
trainClusterer in interface SemiSupClusterer
Parameters:
instances - Instances to be used for training
Throws:
java.lang.Exception

normalize

public void normalize(Instance inst)
               throws java.lang.Exception
Normalizes an Instance or a SparseInstance

Parameters:
inst - Instance to be normalized
Throws:
java.lang.Exception

normalizeInstance

public void normalizeInstance(Instance inst)
                       throws java.lang.Exception
Normalizes the values of a regular (non-sparse) Instance in the L2 norm

Parameters:
inst - Instance to be normalized
Throws:
java.lang.Exception
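L2 normalization divides every value by the Euclidean norm of the vector, so the result has unit length. The following is a minimal sketch for a dense vector, with a plain double[] standing in for the Instance. It makes these assumptions:

- All values are numeric.
- Any class-attribute or missing-value handling the real method may do is ignored.
- An all-zero vector is left unchanged.

DenseL2 is a hypothetical class.

```java
/** Hypothetical sketch: scale a dense vector to unit L2 norm in place. */
final class DenseL2 {
    private DenseL2() {}

    static void normalize(double[] values) {
        double sumSq = 0.0;
        for (double v : values) {
            sumSq += v * v;
        }
        double norm = Math.sqrt(sumSq);
        if (norm == 0.0) {
            return; // an all-zero vector has no direction; leave it unchanged
        }
        for (int i = 0; i < values.length; i++) {
            values[i] /= norm;
        }
    }
}
```

For example, {3.0, 4.0} has norm 5 and becomes {0.6, 0.8}.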

normalizeSparseInstance

public void normalizeSparseInstance(Instance inst)
                             throws java.lang.Exception
Normalizes the values of a SparseInstance in the L2 norm

Parameters:
inst - SparseInstance to be normalized
Throws:
java.lang.Exception
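For a sparse vector, only the stored non-zero values contribute to the norm. The work is therefore proportional to the number of non-zeros rather than the number of attributes, and the attribute indices stay untouched. The following is a minimal sketch. It assumes the sparse instance is represented as a map from attribute index to non-zero value. SparseL2 is a hypothetical class.

```java
import java.util.Map;

/** Hypothetical sketch: L2-normalize a sparse vector stored as index -> non-zero value. */
final class SparseL2 {
    private SparseL2() {}

    static void normalize(Map<Integer, Double> nonZeros) {
        double sumSq = 0.0;
        for (double v : nonZeros.values()) {
            sumSq += v * v; // implicit zeros contribute nothing
        }
        double norm = Math.sqrt(sumSq);
        if (norm == 0.0) {
            return; // nothing to scale
        }
        // Rescale the stored values in place; the keys (attribute indices) are unchanged.
        nonZeros.replaceAll((index, value) -> value / norm);
    }
}
```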

meanOrMode

protected double[] meanOrMode(Instances insts)
Fast version of meanOrMode, streamlined from Instances.meanOrMode for efficiency. Does not check for missing attribute values, assumes numeric attributes, and assumes sparse instances.
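Under these assumptions (numeric attributes, no missing values, sparse instances), the centroid is the per-attribute mean, and each instance only needs to contribute its non-zero values. The following is a minimal sketch. It assumes each sparse instance is a map from attribute index to non-zero value. SparseMean is a hypothetical class.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: per-attribute mean of sparse numeric instances. */
final class SparseMean {
    private SparseMean() {}

    static double[] mean(List<Map<Integer, Double>> instances, int numAttributes) {
        double[] sums = new double[numAttributes];
        if (instances.isEmpty()) {
            return sums; // no instances: return all zeros rather than dividing by zero
        }
        for (Map<Integer, Double> inst : instances) {
            // Only stored (non-zero) entries are visited; no missing-value or type checks.
            for (Map.Entry<Integer, Double> e : inst.entrySet()) {
                sums[e.getKey()] += e.getValue();
            }
        }
        for (int j = 0; j < numAttributes; j++) {
            sums[j] /= instances.size();
        }
        return sums;
    }
}
```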


getTimeStamp

public static java.lang.Double getTimeStamp()
Gets a Double representing the current date and time, e.g. 1:46pm on 20/5/1999 -> 19990520.1346

Returns:
a value of type Double
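The encoding packs the date into the integer part as yyyyMMdd and the time of day into the fraction as HHmm / 10000. The following is a minimal sketch using java.time. This is an assumption, since the original implementation is not shown. TimeStampSketch is a hypothetical class.

```java
import java.time.LocalDateTime;

/** Hypothetical sketch of the yyyyMMdd.HHmm time-stamp encoding. */
final class TimeStampSketch {
    private TimeStampSketch() {}

    /** Encodes t as yyyyMMdd + HHmm / 10000, e.g. 1999-05-20 13:46 -> ~19990520.1346. */
    static Double timeStamp(LocalDateTime t) {
        long datePart = t.getYear() * 10000L + t.getMonthValue() * 100L + t.getDayOfMonth();
        int timePart = t.getHour() * 100 + t.getMinute();
        return datePart + timePart / 10000.0;
    }

    /** Time stamp for the current local date and time. */
    static Double now() {
        return timeStamp(LocalDateTime.now());
    }
}
```

For 1:46pm on 20/5/1999, LocalDateTime.of(1999, 5, 20, 13, 46) yields approximately 19990520.1346. Because the value is a double, the fraction is only approximately representable, so compare with a tolerance rather than exact equality.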

main

public static void main(java.lang.String[] args)
Main method for testing this class.


testCase

protected static void testCase()