Specifies the name of the distance metric class that should be used
- See Also:
Clusterer, OptionHandler, Serialized Form
Method Summary |
protected int |
activePhaseOne(int numQueries)
Phase 1 code for active learning |
protected void |
activePhaseTwoRandom(int numQueries)
Phase 2 code for active learning, random |
protected void |
activePhaseTwoRoundRobin(int numQueries)
Phase 2 code for active learning, with round robin |
protected void |
addMLAndCLTransitiveClosure(int[] indices)
adding other inferred ML and CL links to m_ConstraintsHash, from
m_NeighborSets |
protected int |
askOracle(int X,
int Y)
Query: oracle replies on link |
int |
assignAllInstancesToClusters()
Classifies the instances using the current clustering, moves
must-linked points together (Xing's approach) |
int |
assignInstanceToCluster(Instance instance)
Classifies the instance using the current clustering, without considering constraints |
int |
assignInstanceToClusterWithConstraints(int instIdx)
Classifies the instance using the current clustering, considering constraints |
int[] |
bestInstancesForActiveLearning(int numActive)
Dummy: not implemented for MPCKMeans |
InstancePair[] |
bestPairsForActiveLearning(int numActive)
Returns the indices of the best numActive instances for active learning |
void |
buildClusterer(java.util.ArrayList labeledPair,
Instances unlabeledData,
Instances labeledData,
int numClusters,
int startingIndexOfTest)
Clusters unlabeledData and labeledData (with labels removed),
using labeledData as seeds |
void |
buildClusterer(Instances data)
Generates a clusterer. |
void |
buildClusterer(Instances labeledData,
Instances unlabeledData,
int classIndex,
int numClusters)
Clusters unlabeledData and labeledData (with labels removed),
using labeledData as seeds |
void |
buildClusterer(Instances labeledData,
Instances unlabeledData,
int classIndex,
int numClusters,
int startingIndexOfTest)
Clusters unlabeledData and labeledData (with labels removed),
using labeledData as seeds |
void |
buildClusterer(Instances data,
int num_clusters)
Cluster given instances to form the specified number of clusters. |
protected void |
calculateConstraintPenalties()
Go through all the constraints and accumulate the penalties from violated constraints |
protected void |
calculateConstraintPenaltiesDotP()
Go through all the constraints and accumulate the penalties from violated constraints
using distance-squared penalties |
protected void |
calculateConstraintPenaltiesEuclidean()
Go through all the constraints and accumulate the penalties from violated constraints
using distance-squared penalties |
protected void |
calculateConstraintPenaltiesKL()
Go through all the constraints and accumulate the penalties from violated constraints
using distance penalties |
protected void |
calculateMaxCannotLinkDistanceAndSimilarity()
Go through the cannot-link constraints and find the current maximum distance |
protected double[] |
calculateMaxCannotLinkDistances()
Go through the cannot-link constraints and find the current maximum distance |
protected double |
calculateObjectiveFunction()
calculates objective function |
int |
clusterInstance(Instance instance)
Checks if instance has to be normalized and classifies the
instance using the current clustering |
protected boolean |
convergenceCheck(double oldObjective,
double newObjective,
boolean checkOscillation)
checks for convergence of NR iteration |
protected void |
createGlobalCentroids()
Creates the global cluster centroid |
protected void |
DFS_VISIT(int u,
int[] vertexColor)
Recursive subroutine for DFS |
protected void |
DFS()
Main Depth First Search routine |
double |
distancePenaltyInCombinedModel(int instIdx,
int centroidIdx)
Delegate the distance calculation to the method appropriate for the current metric |
double |
distancePenaltyInPottsModel(int instIdx,
int centroidIdx)
Delegate the distance calculation to the method appropriate for the current metric |
protected int |
findBestAssignments()
E-step of the KMeans clustering algorithm -- find best cluster assignments
Returns the number of points moved in this step |
boolean |
getActive()
get the active level of clusterer |
SelectedTag |
getAlgorithm()
Get the KMeans algorithm type. |
MPCKMeansAssigner |
getAssigner()
Set/get the assigner |
double |
getCannotLinkWeight()
Return the cannot link constraint weight |
int[] |
getClusterAssignments()
|
Instances |
getClusterCentroids()
Accessor |
java.util.ArrayList |
getClusters()
Computes the clusters from the cluster assignments, for external access |
java.util.HashMap |
getConstraintsHash()
|
double |
getDefaultPerturb()
Get default perturbation value |
double |
getEta()
Get the initial value of gradient descent eta |
double |
getEtaDecayRate()
Get the initial value of the decay rate of gradient descent eta |
java.util.HashSet[] |
getIndexClusters()
Computes the clusters from the cluster assignments, for external access |
java.util.HashMap |
getInstanceConstraintsHash()
|
Instances |
getInstances()
Return training instances |
boolean |
getIsRandomNeighborhoods()
Get value of m_IsRandomNeighborhoods |
double |
getLogTermWeight()
Get the value of the weight assigned to log term in the objective function |
int |
getMaxBlankIterations()
Get the maximum number of blank iterations |
int |
getMaxIterations()
Get the maximum number of iterations |
Metric |
getMetric()
get the distance metric |
LearnableMetric[] |
getMetrics()
get the array of metrics |
double |
getMustLinkWeight()
Return the must link constraint weight |
int |
getNumClusters()
Return the number of clusters |
double |
getObjFunConvergenceDifference()
Get the minimum value of the objective function difference required for convergence |
java.lang.String[] |
getOptions()
Gets the current option settings for the OptionHandler. |
boolean |
getPhaseTwoRandom()
Return m_PhaseTwoRandom |
int |
getRandomSeed()
Return the random number seed |
boolean |
getSeedable()
Is seeding performed? |
Clusterer |
getThisClusterer()
We always want to implement SemiSupClusterer from a class extending Clusterer. |
static java.lang.Double |
getTimeStamp()
Gets a Double representing the current date and time. |
SelectedTag |
getTrainable()
Is metric learning performed? |
boolean |
getUseCombinedObjectiveFunction()
Is combined objective function being used |
boolean |
getUseMultipleMetrics()
See if individual per-cluster metrics are used |
boolean |
getVerbose()
get the verbosity level of the clusterer |
boolean |
isClassAttributeString()
|
boolean |
isObjFunDecreasing()
Is the objective function decreasing or increasing? |
java.util.Enumeration |
listOptions()
Returns an enumeration of all the available options. |
protected int |
lookupInstanceCluster(Instance instance)
lookup the instance in the checksum hash |
static void |
main(java.lang.String[] args)
Main method for testing this class. |
protected double[] |
meanOrMode(Instances insts)
Fast version of meanOrMode - streamlined from Instances.meanOrMode for efficiency
Does not check for missing attributes, assumes numeric attributes, assumes Sparse instances |
protected void |
nonActivePairwiseInit()
Initialization routine for non-active algorithm |
static double[] |
normalize(double[] weights)
Normalize an array of doubles |
void |
normalize(Instance inst)
Normalizes Instance or SparseInstance |
protected void |
normalizeByWeight(Instance inst)
This function divides every attribute value in an instance by
the instance weight -- useful to find the mean of a cluster in
Euclidean space |
void |
normalizeInstance(Instance inst)
Normalizes the values of a normal Instance in L2 norm |
void |
normalizeSparseInstance(Instance inst)
Normalizes the values of a SparseInstance in L2 norm |
protected double[] |
nrWithLineSearchForAlpha(double[] currAttrWeights,
double[] invUnconstrainedAttrWeights)
Does one NR step, calculates the alpha (using line search) that
does not violate positivity constraint of each attribute weight,
returns new values of attribute weights |
int |
numberOfClusters()
A duplicate function to conform to Clusterer abstract class. |
double |
objectiveFunction()
returns objective function |
void |
printClusters()
Prints clusters |
void |
printIndexClusters()
Outputs the current clustering |
void |
resetClusterer()
Reset all values that have been learned |
void |
resetObjective()
reset the value of the objective function and all of its components |
protected void |
runKMeans()
Actual KMeans function |
boolean |
seedable()
We can have clusterers that don't utilize seeding |
void |
seedClusterer(java.util.HashMap seedHash)
Read the seeds from a hashtable, where every key is an instance and every value is
the cluster assignment of that instance |
void |
setActive(boolean active)
set the active level of the clusterer |
void |
setAlgorithm(SelectedTag algo)
Set the KMeans algorithm. |
void |
setAssigner(MPCKMeansAssigner assigner)
|
void |
setCannotLinkWeight(double w)
Set the cannot link constraint weight |
void |
setDefaultPerturb(double p)
Set default perturbation value |
void |
setEta(double eta)
Set the initial value of gradient descent eta |
void |
setEtaDecayRate(double etaDecayRate)
Set the initial value of the decay rate of gradient descent eta |
void |
setInstances(Instances instances)
Sets training instances |
void |
setIsRandomNeighborhoods(boolean b)
Set value of m_IsRandomNeighborhoods |
void |
setLogTermWeight(double logTermWeight)
Set the value of the weight assigned to log term in the objective function |
void |
setMaxBlankIterations(int maxBlankIterations)
Set the maximum number of blank iterations (those where no points are moved) |
void |
setMaxIterations(int maxIterations)
Set the maximum number of iterations |
void |
setMetric(Metric m)
Set the distance metric |
void |
setMustLinkWeight(double w)
Set the must link constraint weight |
void |
setNumClusters(int n)
Set the number of clusters to generate |
void |
setObjFunConvergenceDifference(double objFunConvergenceDifference)
Set the minimum value of the objective function difference required for convergence |
void |
setOptions(java.lang.String[] options)
Parses a given list of options. |
void |
setPhaseTwoRandom(boolean w)
Set m_PhaseTwoRandom |
void |
setRandomSeed(int s)
Set the random number seed |
void |
setSeedable(boolean seedable)
Turn seeding on and off |
void |
setSeedHash(java.util.HashMap seedhash)
Set the m_SeedHash |
void |
setTrainable(SelectedTag trainable)
Turn metric learning on and off |
void |
setUseCombinedObjectiveFunction(boolean useCombined)
Use combined objective function (if true) or Potts Model (if false) |
void |
setUseMultipleMetrics(boolean useMultipleMetrics)
Turn on/off the use of per-cluster metrics |
void |
setVerbose(boolean verbose)
set the verbosity level of the clusterer |
double |
similarityInCombinedModel(int instIdx,
int centroidIdx)
finds similarity between instance and centroid in Combined Objective Model |
double |
similarityInPottsModel(int instIdx,
int centroidIdx)
finds similarity between instance and centroid in Potts Model |
protected Instance |
sumInstances(Instance inst1,
Instance inst2)
Finds sum of 2 instances (handles sparse and non-sparse) |
java.lang.String |
toString()
return a string describing this clusterer |
void |
trainClusterer(Instances instances)
Train the clusterer using specified parameters |
protected void |
updateClusterAssignments()
Updates the clusterAssignments for all points after clustering. |
protected void |
updateClusterCentroids()
M-step of the KMeans clustering algorithm -- updates cluster centroids |
protected void |
updateMetricWeights()
M-step of the KMeans clustering algorithm -- updates metric
weights. |
protected void |
updateMetricWeightsDotPGD()
M-step of the KMeans clustering algorithm -- updates metric
weights. |
protected void |
updateMetricWeightsEuclidean()
M-step of the KMeans clustering algorithm -- updates metric
weights. |
protected void |
updateMetricWeightsKL()
M-step of the KMeans clustering algorithm -- updates metric
weights. |
protected void |
updateMetricWeightsKLGD()
M-step of the KMeans clustering algorithm -- updates metric
weights. |
protected void |
updateMetricWeightsMahalanobis()
M-step of the KMeans clustering algorithm -- updates metric
weights. |
protected boolean |
updateMultipleMetricWeights()
M-step of the KMeans clustering algorithm -- updates metric
weights for the individual metrics. |
protected boolean |
updateMultipleMetricWeightsDotPGD()
M-step of the KMeans clustering algorithm -- updates metric
weights for the individual metrics. |
protected boolean |
updateMultipleMetricWeightsEuclidean()
M-step of the KMeans clustering algorithm -- updates metric
weights for the individual metrics. |
protected boolean |
updateMultipleMetricWeightsKLGD()
M-step of the KMeans clustering algorithm -- updates metric
weights for the individual metrics. |
protected boolean |
updateMultipleMetricWeightsMahalanobis()
M-step of the KMeans clustering algorithm -- updates metric
weights. |
protected double[] |
updateWeightsUsingNewtonRaphson(double[] currentAttrWeights,
double[] invUnconstrainedAttrWeights)
calculates weights using Newton Raphson, to satisfy the
positivity constraint of each attribute weight, returns learned
attribute weights. |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
m_Clusters
protected java.util.ArrayList m_Clusters
- holds the instances in the clusters
m_IndexClusters
protected java.util.HashSet[] m_IndexClusters
- holds the instance indices in the clusters
m_ConstraintsHash
protected java.util.HashMap m_ConstraintsHash
- holds the ([instance pair] -> [type of constraint]) mapping
m_instanceConstraintHash
protected java.util.HashMap m_instanceConstraintHash
m_AdjacencyList
protected java.util.HashSet[] m_AdjacencyList
- adjacency list for random
m_SeedHash
protected java.util.HashSet m_SeedHash
- holds the points involved in the constraints
m_CannotLinkWeight
protected double m_CannotLinkWeight
- weight to be given to each constraint
m_MustLinkWeight
protected double m_MustLinkWeight
- weight to be given to each constraint
m_isClassAttributeString
protected boolean m_isClassAttributeString
- checks to see if class is a string
m_isOfflineMetric
protected boolean m_isOfflineMetric
- is it an offline metric (BarHillelMetric or XingMetric)?
m_MaxCannotLinkDistance
protected double m_MaxCannotLinkDistance
- the maximum distance between cannot-link constraints
m_MaxCannotLinkSimilarity
protected double m_MaxCannotLinkSimilarity
- the min similarity between cannot-link constraints
m_MaxCannotLinkDistances
protected double[] m_MaxCannotLinkDistances
- the maximum distance between cannot-link constraints
m_MaxCannotLinkPoints
public Instance[][] m_MaxCannotLinkPoints
m_verbose
protected boolean m_verbose
- verbose?
m_metric
protected Metric m_metric
- distance Metric
m_useMultipleMetrics
protected boolean m_useMultipleMetrics
- Individual metrics for each cluster can be used
m_metrics
protected LearnableMetric[] m_metrics
m_logTermWeight
protected double m_logTermWeight
- Relative importance of the log-term for the weights in the objective function
m_logTerms
protected double[] m_logTerms
- We will hash log terms to avoid recomputing every time TODO: implement for Euclidean
m_metricBuilt
protected boolean m_metricBuilt
- has the metric been constructed? a fix for multiple buildClusterer calls
m_isSparseInstance
protected boolean m_isSparseInstance
- indicates whether instances are sparse
m_IsRandomNeighborhoods
protected boolean m_IsRandomNeighborhoods
- indicates whether initialization is using random neighborhoods
-- in the case of offline metric
m_objFunDecreasing
protected boolean m_objFunDecreasing
- Is the objective function increasing or decreasing? Depends on type
of metric used: for similarity-based metric, increasing, for distance-based - decreasing
m_Seedable
protected boolean m_Seedable
- Seedable or not (true by default)
TRAINING_NONE
public static final int TRAINING_NONE
- Possible metric training
- See Also:
- Constant Field Values
TRAINING_EXTERNAL
public static final int TRAINING_EXTERNAL
- See Also:
- Constant Field Values
TRAINING_INTERNAL
public static final int TRAINING_INTERNAL
- See Also:
- Constant Field Values
TAGS_TRAINING
public static final Tag[] TAGS_TRAINING
m_Trainable
protected int m_Trainable
m_UseCombinedObjectiveFunction
protected boolean m_UseCombinedObjectiveFunction
- Using Combined Objective function (if true) or Potts Model (if false)
m_PhaseTwoRandom
protected boolean m_PhaseTwoRandom
- Round robin or Random in active Phase Two
m_Iterations
protected int m_Iterations
- keep track of the number of iterations completed before convergence
m_numBlankIterations
protected int m_numBlankIterations
- keep track of the number of iterations when no points were moved
m_maxIterations
protected int m_maxIterations
- the maximum number of iterations
m_maxBlankIterations
protected int m_maxBlankIterations
- the maximum number of iterations with no points moved
ALGORITHM_SIMPLE
public static final int ALGORITHM_SIMPLE
- Define possible algorithms
- See Also:
- Constant Field Values
ALGORITHM_SPHERICAL
public static final int ALGORITHM_SPHERICAL
- See Also:
- Constant Field Values
TAGS_ALGORITHM
public static final Tag[] TAGS_ALGORITHM
m_Algorithm
protected int m_Algorithm
- algorithm, by default spherical
m_ObjFunConvergenceDifference
protected double m_ObjFunConvergenceDifference
- min difference of objective function values for convergence
m_NRConvergenceDifference
protected double m_NRConvergenceDifference
- min difference of NR values for convergence
m_Objective
protected double m_Objective
- value of current objective function
m_OldObjective
protected double m_OldObjective
- value of last objective function
m_objVariance
protected double m_objVariance
- Variables to track components of the objective function
m_objCannotLinks
protected double m_objCannotLinks
m_objMustLinks
protected double m_objMustLinks
m_objNormalizer
protected double m_objNormalizer
m_objVarianceCurrPoint
protected double m_objVarianceCurrPoint
- Variable to track the contribution of the currently considered point
m_objCannotLinksCurrPoint
protected double m_objCannotLinksCurrPoint
m_objMustLinksCurrPoint
protected double m_objMustLinksCurrPoint
m_objNormalizerCurrPoint
protected double m_objNormalizerCurrPoint
m_objVarianceCurrPointBest
protected double m_objVarianceCurrPointBest
m_objCannotLinksCurrPointBest
protected double m_objCannotLinksCurrPointBest
m_objMustLinksCurrPointBest
protected double m_objMustLinksCurrPointBest
m_objNormalizerCurrPointBest
protected double m_objNormalizerCurrPointBest
m_eta
protected double m_eta
- gradient descent parameters
m_currEta
protected double m_currEta
m_etaDecayRate
protected double m_etaDecayRate
m_TotalTrainWithLabels
protected Instances m_TotalTrainWithLabels
- training instances with labels
m_Instances
protected Instances m_Instances
- training instances
m_checksumHash
protected java.util.HashMap m_checksumHash
- A hash where the instance checksums are hashed
m_checksumCoeffs
protected double[] m_checksumCoeffs
m_StartingIndexOfTest
protected int m_StartingIndexOfTest
- test data -- required to make sure that test points are not
selected during active learning
m_NumActive
protected int m_NumActive
- number of pairs to seed with
m_Active
protected boolean m_Active
- active mode?
m_NumClusters
protected int m_NumClusters
- number of clusters to generate, default is -1 to get it from labeled data
m_NumCurrentClusters
protected int m_NumCurrentClusters
- Number of clusters in the process
m_FastMode
protected boolean m_FastMode
- m_FastMode = true => fast computation of meanOrMode in centroid calculation, useful for high-D data sets
m_FastMode = false => usual computation of meanOrMode in centroid calculation
m_ClusterCentroids
protected Instances m_ClusterCentroids
- holds the cluster centroids
m_GlobalCentroid
protected Instance m_GlobalCentroid
- holds the global centroids
m_DefaultPerturb
protected double m_DefaultPerturb
- holds the default perturbation value for randomPerturbInit
m_MergeThreshold
protected double m_MergeThreshold
- holds the default merge threshold for matchMergeStep
m_ClusterAssignments
protected int[] m_ClusterAssignments
- temporary variable holding cluster assignments while iterating
m_SumOfClusterInstances
protected Instance[] m_SumOfClusterInstances
- temporary variable holding cluster sums while iterating
m_RandomSeed
protected int m_RandomSeed
- holds the random Seed, useful for randomPerturbInit
m_RandomNumberGenerator
protected java.util.Random m_RandomNumberGenerator
- holds the random number generator used in various parts of the code
m_Assigner
protected MPCKMeansAssigner m_Assigner
- Define possible assignment strategies
m_NeighborSets
protected java.util.HashSet[] m_NeighborSets
- neighbor list for active learning: points in each cluster neighborhood
m_numNeighborhoods
protected int m_numNeighborhoods
- number of neighborhood sets
MPCKMeans
public MPCKMeans()
MPCKMeans
public MPCKMeans(Metric metric)
getConstraintsHash
public java.util.HashMap getConstraintsHash()
getInstanceConstraintsHash
public java.util.HashMap getInstanceConstraintsHash()
isClassAttributeString
public boolean isClassAttributeString()
objectiveFunction
public double objectiveFunction()
- returns objective function
- Specified by:
objectiveFunction
in interface SemiSupClusterer
getClusterCentroids
public Instances getClusterCentroids()
- Accessor
getClusterAssignments
public int[] getClusterAssignments()
getThisClusterer
public Clusterer getThisClusterer()
- We always want to implement SemiSupClusterer from a class extending Clusterer.
We want to be able to return the underlying parent class.
- Specified by:
getThisClusterer
in interface SemiSupClusterer
- Returns:
- parent Clusterer class
buildClusterer
public void buildClusterer(Instances labeledData,
Instances unlabeledData,
int classIndex,
int numClusters,
int startingIndexOfTest)
throws java.lang.Exception
- Clusters unlabeledData and labeledData (with labels removed),
using labeledData as seeds
- Specified by:
buildClusterer
in interface SemiSupClusterer
- Parameters:
labeledData
- labeled instances to be used as seeds
unlabeledData
- unlabeled instances
classIndex
- attribute index in labeledData which holds class info
numClusters
- number of clusters
startingIndexOfTest
- from where test data starts in unlabeledData, useful if clustering is transductive
- Throws:
java.lang.Exception
- if something goes wrong.
buildClusterer
public void buildClusterer(Instances data,
int num_clusters)
throws java.lang.Exception
- Cluster given instances to form the specified number of clusters.
- Parameters:
data
- instances to be clustered
num_clusters
- number of clusters to create
- Throws:
java.lang.Exception
- if something goes wrong.
buildClusterer
public void buildClusterer(java.util.ArrayList labeledPair,
Instances unlabeledData,
Instances labeledData,
int numClusters,
int startingIndexOfTest)
throws java.lang.Exception
- Clusters unlabeledData and labeledData (with labels removed),
using labeledData as seeds
- Parameters:
unlabeledData
- unlabeled instances
numClusters
- number of clusters
startingIndexOfTest
- starting index of test set in unlabeled data
- Throws:
java.lang.Exception
- if something goes wrong.
buildClusterer
public void buildClusterer(Instances labeledData,
Instances unlabeledData,
int classIndex,
int numClusters)
throws java.lang.Exception
- Clusters unlabeledData and labeledData (with labels removed),
using labeledData as seeds
- Parameters:
labeledData
- labeled instances to be used as seeds
unlabeledData
- unlabeled instances
classIndex
- attribute index in labeledData which holds class info
numClusters
- number of clusters
- Throws:
java.lang.Exception
- if something goes wrong.
resetClusterer
public void resetClusterer()
throws java.lang.Exception
- Reset all values that have been learned
- Specified by:
resetClusterer
in interface SemiSupClusterer
- Throws:
java.lang.Exception
setIsRandomNeighborhoods
public void setIsRandomNeighborhoods(boolean b)
- Set value of m_IsRandomNeighborhoods
- Parameters:
b
- value
getIsRandomNeighborhoods
public boolean getIsRandomNeighborhoods()
- Get value of m_IsRandomNeighborhoods
- Returns:
- m_IsRandomNeighborhoods
setDefaultPerturb
public void setDefaultPerturb(double p)
- Set default perturbation value
- Parameters:
p
- perturbation fraction
getDefaultPerturb
public double getDefaultPerturb()
- Get default perturbation value
- Returns:
- perturbation fraction
setSeedable
public void setSeedable(boolean seedable)
- Turn seeding on and off
- Parameters:
seedable
- should seeding be done?
setTrainable
public void setTrainable(SelectedTag trainable)
- Turn metric learning on and off
- Parameters:
trainable
- should metric learning be done?
setUseCombinedObjectiveFunction
public void setUseCombinedObjectiveFunction(boolean useCombined)
- Use combined objective function (if true) or Potts Model (if false)
getSeedable
public boolean getSeedable()
- Is seeding performed?
- Returns:
- is seeding being done?
getTrainable
public SelectedTag getTrainable()
- Is metric learning performed?
- Returns:
- is metric learning being done?
getUseCombinedObjectiveFunction
public boolean getUseCombinedObjectiveFunction()
- Is combined objective function being used
- Returns:
- is combined objective function being used
seedable
public boolean seedable()
- We can have clusterers that don't utilize seeding
activePhaseOne
protected int activePhaseOne(int numQueries)
throws java.lang.Exception
- Phase 1 code for active learning
- Throws:
java.lang.Exception
activePhaseTwoRoundRobin
protected void activePhaseTwoRoundRobin(int numQueries)
throws java.lang.Exception
- Phase 2 code for active learning, with round robin
- Throws:
java.lang.Exception
activePhaseTwoRandom
protected void activePhaseTwoRandom(int numQueries)
throws java.lang.Exception
- Phase 2 code for active learning, random
- Throws:
java.lang.Exception
createGlobalCentroids
protected void createGlobalCentroids()
throws java.lang.Exception
- Creates the global cluster centroid
- Throws:
java.lang.Exception
addMLAndCLTransitiveClosure
protected void addMLAndCLTransitiveClosure(int[] indices)
throws java.lang.Exception
- adding other inferred ML and CL links to m_ConstraintsHash, from
m_NeighborSets
- Throws:
java.lang.Exception
DFS
protected void DFS()
throws java.lang.Exception
- Main Depth First Search routine
- Throws:
java.lang.Exception
DFS_VISIT
protected void DFS_VISIT(int u,
int[] vertexColor)
throws java.lang.Exception
- Recursive subroutine for DFS
- Throws:
java.lang.Exception
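The DFS and DFS_VISIT pair reads like a textbook colored depth-first search. As a minimal sketch, assuming the must-link constraints are kept as an undirected adjacency list and that each connected component forms one neighborhood, the traversal could look like the following. The class name, the fromEdges helper, and the list-of-sets representation are illustrative assumptions, not part of MPCKMeans.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: groups points into neighborhoods by must-link connectivity.
class MustLinkComponentsSketch {
    static final int WHITE = 0; // not yet visited
    static final int GRAY = 1;  // visit in progress
    static final int BLACK = 2; // fully explored

    /** Builds an undirected adjacency list for n vertices from {u, v} pairs. */
    static List<Set<Integer>> fromEdges(int n, int[][] edges) {
        List<Set<Integer>> adjacency = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            adjacency.add(new TreeSet<>());
        }
        for (int[] e : edges) {
            adjacency.get(e[0]).add(e[1]);
            adjacency.get(e[1]).add(e[0]);
        }
        return adjacency;
    }

    /** Main routine: starts a visit from every vertex that is still white. */
    static List<Set<Integer>> components(List<Set<Integer>> adjacency) {
        int[] vertexColor = new int[adjacency.size()]; // all WHITE initially
        List<Set<Integer>> result = new ArrayList<>();
        for (int u = 0; u < adjacency.size(); u++) {
            if (vertexColor[u] == WHITE) {
                Set<Integer> component = new TreeSet<>();
                visit(u, adjacency, vertexColor, component);
                result.add(component);
            }
        }
        return result;
    }

    /** Recursive subroutine: collects everything reachable from u. */
    static void visit(int u, List<Set<Integer>> adjacency,
                      int[] vertexColor, Set<Integer> component) {
        vertexColor[u] = GRAY;
        component.add(u);
        for (int v : adjacency.get(u)) {
            if (vertexColor[v] == WHITE) {
                visit(v, adjacency, vertexColor, component);
            }
        }
        vertexColor[u] = BLACK;
    }

    public static void main(String[] args) {
        int[][] mustLinks = {{0, 1}, {1, 2}, {3, 4}};
        System.out.println(components(fromEdges(5, mustLinks)));
    }
}
```

Points that appear in no constraint end up as singleton components in this sketch; whether the real routine keeps or discards them is not stated here.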
nonActivePairwiseInit
protected void nonActivePairwiseInit()
throws java.lang.Exception
- Initialization routine for non-active algorithm
- Throws:
java.lang.Exception
askOracle
protected int askOracle(int X,
int Y)
- Query: oracle replies on link
normalizeByWeight
protected void normalizeByWeight(Instance inst)
- This function divides every attribute value in an instance by
the instance weight -- useful to find the mean of a cluster in
Euclidean space
- Parameters:
inst
- Instance passed in for normalization (destructive update)
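To make the "destructive update" concrete, here is a minimal sketch that works on a plain double array instead of a Weka Instance. The class name and the explicit weight parameter are assumptions for illustration; the real method reads the weight from the instance itself.

```java
import java.util.Arrays;

// Hypothetical sketch: in-place division of every attribute value by a weight.
class NormalizeByWeightSketch {
    /** Overwrites values[i] with values[i] / weight; the input array is modified. */
    static void normalizeByWeight(double[] values, double weight) {
        if (weight == 0.0) {
            throw new IllegalArgumentException("weight must be non-zero");
        }
        for (int i = 0; i < values.length; i++) {
            values[i] /= weight;
        }
    }

    public static void main(String[] args) {
        // A running sum of two points, carrying weight 2, becomes their mean.
        double[] sum = {2.0, 4.0, 6.0};
        normalizeByWeight(sum, 2.0);
        System.out.println(Arrays.toString(sum));
    }
}
```

If the weight of a summed instance equals the number of points added into it, this division yields the cluster mean, which is the use the description hints at.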
sumInstances
protected Instance sumInstances(Instance inst1,
Instance inst2)
throws java.lang.Exception
- Finds sum of 2 instances (handles sparse and non-sparse)
- Throws:
java.lang.Exception
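The difference between the sparse and non-sparse cases can be sketched without Weka types. Below, a dense vector is a double array and a sparse vector is a map from attribute index to value; both representations and all names are assumptions made for this example.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: summing two vectors in dense and sparse form.
class SumVectorsSketch {
    /** Element-wise sum of two dense vectors of equal length. */
    static double[] sumDense(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        double[] out = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i] + b[i];
        }
        return out;
    }

    /** Sum of two sparse vectors; indices missing from a map count as zero. */
    static Map<Integer, Double> sumSparse(Map<Integer, Double> a, Map<Integer, Double> b) {
        Map<Integer, Double> out = new TreeMap<>(a);
        for (Map.Entry<Integer, Double> e : b.entrySet()) {
            out.merge(e.getKey(), e.getValue(), Double::sum);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, Double> a = new TreeMap<>();
        a.put(0, 1.0);
        a.put(3, 2.0);
        Map<Integer, Double> b = new TreeMap<>();
        b.put(3, 0.5);
        b.put(7, 4.0);
        System.out.println(sumSparse(a, b));
    }
}
```

The sparse version only touches indices that are present in either operand, which is why a separate code path pays off for high-dimensional data.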
updateClusterAssignments
protected void updateClusterAssignments()
throws java.lang.Exception
- Updates the clusterAssignments for all points after clustering.
Maps assignments to consecutive indices starting at 0, e.g. from [0 2 2 0 6 6 2]
-> [0 1 1 0 2 2 1]
- Throws:
java.lang.Exception
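The renumbering can be sketched as follows, assuming each distinct cluster id receives the next free index in order of first appearance, so every occurrence of the same old id maps to the same new id. The class and method names are illustrative only.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: compacts arbitrary cluster ids to 0..k-1.
class ClusterIdRemapSketch {
    /** Returns a new array in which ids are renumbered by first appearance. */
    static int[] remap(int[] assignments) {
        Map<Integer, Integer> newIds = new HashMap<>();
        int[] out = new int[assignments.length];
        for (int i = 0; i < assignments.length; i++) {
            Integer id = newIds.get(assignments[i]);
            if (id == null) {
                id = newIds.size(); // next unused consecutive index
                newIds.put(assignments[i], id);
            }
            out[i] = id;
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [0, 1, 1, 0, 2, 2, 1]
        System.out.println(Arrays.toString(remap(new int[] {0, 2, 2, 0, 6, 6, 2})));
    }
}
```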
printIndexClusters
public void printIndexClusters()
throws java.lang.Exception
- Outputs the current clustering
- Throws:
java.lang.Exception
- if something goes wrong
findBestAssignments
protected int findBestAssignments()
throws java.lang.Exception
- E-step of the KMeans clustering algorithm -- find best cluster assignments
Returns the number of points moved in this step
- Throws:
java.lang.Exception
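A stripped-down E-step that returns the number of moved points might look like this. It is only a sketch: it uses plain squared Euclidean distance and ignores constraints and learned metric weights, both of which the real method takes into account. All names are hypothetical.

```java
// Hypothetical sketch: unconstrained nearest-centroid assignment.
class EStepSketch {
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /** Reassigns every point to its nearest centroid; returns how many points moved. */
    static int findBestAssignments(double[][] points, double[][] centroids, int[] assignments) {
        int moved = 0;
        for (int i = 0; i < points.length; i++) {
            int best = 0;
            double bestDistance = Double.POSITIVE_INFINITY;
            for (int c = 0; c < centroids.length; c++) {
                double d = squaredDistance(points[i], centroids[c]);
                if (d < bestDistance) {
                    bestDistance = d;
                    best = c;
                }
            }
            if (assignments[i] != best) {
                assignments[i] = best;
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {1, 0}, {10, 0}};
        double[][] centroids = {{0, 0}, {10, 0}};
        int[] assignments = {0, 1, 0};
        System.out.println(findBestAssignments(points, centroids, assignments));
    }
}
```

A return value of zero corresponds to a "blank" iteration in the sense of m_numBlankIterations: no point changed cluster.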
assignInstanceToClusterWithConstraints
public int assignInstanceToClusterWithConstraints(int instIdx)
throws java.lang.Exception
- Classifies the instance using the current clustering, considering constraints
- Returns:
- the number of the assigned cluster as an integer if the
class is enumerated, otherwise the predicted value
- Throws:
java.lang.Exception
- if instance could not be classified
successfully
calculateConstraintPenalties
protected void calculateConstraintPenalties()
throws java.lang.Exception
- Go through all the constraints and accumulate the penalties from violated constraints
- Throws:
java.lang.Exception
calculateConstraintPenaltiesEuclidean
protected void calculateConstraintPenaltiesEuclidean()
throws java.lang.Exception
- Go through all the constraints and accumulate the penalties from violated constraints
using distance-squared penalties
- Throws:
java.lang.Exception
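One way to read "distance-squared penalties" is sketched below. It assumes a violated must-link costs the squared distance between the two points, and a violated cannot-link costs the gap between the largest cannot-link squared distance and the pair's own squared distance, each scaled by its constraint weight. This formulation, the constraint encoding, and all names are assumptions for illustration and may differ from what MPCKMeans actually computes.

```java
// Hypothetical sketch: accumulating penalties over violated pairwise constraints.
class ConstraintPenaltySketch {
    static final int MUST_LINK = 1;
    static final int CANNOT_LINK = -1;

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /**
     * Each constraint row is {firstIndex, secondIndex, type}.
     * maxCannotLinkSqDist plays the role of the current maximum cannot-link distance.
     */
    static double totalPenalty(double[][] points, int[] assignments, int[][] constraints,
                               double mustLinkWeight, double cannotLinkWeight,
                               double maxCannotLinkSqDist) {
        double total = 0.0;
        for (int[] c : constraints) {
            boolean sameCluster = assignments[c[0]] == assignments[c[1]];
            double d2 = squaredDistance(points[c[0]], points[c[1]]);
            if (c[2] == MUST_LINK && !sameCluster) {
                // Must-linked points were separated: far-apart pairs cost more.
                total += mustLinkWeight * d2;
            } else if (c[2] == CANNOT_LINK && sameCluster) {
                // Cannot-linked points share a cluster: close pairs cost more.
                total += cannotLinkWeight * (maxCannotLinkSqDist - d2);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {3, 4}, {1, 0}};
        int[] assignments = {0, 1, 0};
        int[][] constraints = {{0, 1, MUST_LINK}, {0, 2, CANNOT_LINK}, {0, 2, MUST_LINK}};
        System.out.println(totalPenalty(points, assignments, constraints, 1.0, 2.0, 25.0));
    }
}
```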
calculateConstraintPenaltiesKL
protected void calculateConstraintPenaltiesKL()
throws java.lang.Exception
- Go through all the constraints and accumulate the penalties from violated constraints
using distance penalties
- Throws:
java.lang.Exception
calculateConstraintPenaltiesDotP
protected void calculateConstraintPenaltiesDotP()
throws java.lang.Exception
- Go through all the constraints and accumulate the penalties from violated constraints
using distance-squared penalties
- Throws:
java.lang.Exception
similarityInPottsModel
public double similarityInPottsModel(int instIdx,
int centroidIdx)
throws java.lang.Exception
- finds similarity between instance and centroid in Potts Model
- Throws:
java.lang.Exception
distancePenaltyInPottsModel
public double distancePenaltyInPottsModel(int instIdx,
int centroidIdx)
throws java.lang.Exception
- Delegate the distance calculation to the method appropriate for the current metric
- Throws:
java.lang.Exception
similarityInCombinedModel
public double similarityInCombinedModel(int instIdx,
int centroidIdx)
throws java.lang.Exception
- finds similarity between instance and centroid in Combined Objective Model
- Throws:
java.lang.Exception
distancePenaltyInCombinedModel
public double distancePenaltyInCombinedModel(int instIdx,
int centroidIdx)
throws java.lang.Exception
- Delegate the distance calculation to the method appropriate for the current metric
- Throws:
java.lang.Exception
updateClusterCentroids
protected void updateClusterCentroids()
throws java.lang.Exception
- M-step of the KMeans clustering algorithm -- updates cluster centroids
- Throws:
java.lang.Exception
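For the Euclidean case, the centroid update amounts to averaging the points of each cluster. The sketch below assumes dense double arrays and leaves the centroid of an empty cluster at the zero vector; how MPCKMeans treats empty clusters is not stated here, so that choice is purely illustrative, as are the names.

```java
import java.util.Arrays;

// Hypothetical sketch: centroid update as the per-cluster mean.
class MStepCentroidSketch {
    /** Returns one centroid per cluster, each the mean of the points assigned to it. */
    static double[][] updateCentroids(double[][] points, int[] assignments, int numClusters) {
        int dim = points.length == 0 ? 0 : points[0].length;
        double[][] sums = new double[numClusters][dim];
        int[] counts = new int[numClusters];
        for (int i = 0; i < points.length; i++) {
            int c = assignments[i];
            counts[c]++;
            for (int d = 0; d < dim; d++) {
                sums[c][d] += points[i][d];
            }
        }
        for (int c = 0; c < numClusters; c++) {
            if (counts[c] > 0) { // empty clusters keep the zero vector in this sketch
                for (int d = 0; d < dim; d++) {
                    sums[c][d] /= counts[c];
                }
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {2, 2}, {10, 0}};
        int[] assignments = {0, 0, 1};
        System.out.println(Arrays.deepToString(updateCentroids(points, assignments, 2)));
    }
}
```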
updateMetricWeights
protected void updateMetricWeights()
throws java.lang.Exception
- M-step of the KMeans clustering algorithm -- updates metric
weights. Invoked only when m_UseCombinedObjectiveFunction is
true and metric is trainable
- Throws:
java.lang.Exception
updateMetricWeightsEuclidean
protected void updateMetricWeightsEuclidean()
throws java.lang.Exception
- M-step of the KMeans clustering algorithm -- updates metric
weights. Invoked only when m_UseCombinedObjectiveFunction is
true and metric is trainable
- Throws:
java.lang.Exception
updateMetricWeightsKL
protected void updateMetricWeightsKL()
throws java.lang.Exception
- M-step of the KMeans clustering algorithm -- updates metric
weights. Invoked only when m_UseCombinedObjectiveFunction is
true and metric is trainable
- Throws:
java.lang.Exception
updateMetricWeightsKLGD
protected void updateMetricWeightsKLGD()
throws java.lang.Exception
- M-step of the KMeans clustering algorithm -- updates metric
weights. This method uses gradient descent to update weights for
a KL metric
- Throws:
java.lang.Exception
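The eta and eta decay rate parameters suggest a gradient descent loop with a shrinking step size. The following is a generic sketch of that idea, assuming the step size is multiplied by the decay rate after every iteration. The gradient is passed in as a function because the actual KL-based gradient is not described here; the class name, the multiplicative decay, and the toy objective are all assumptions.

```java
import java.util.Arrays;
import java.util.function.UnaryOperator;

// Hypothetical sketch: gradient descent with a multiplicatively decaying step size.
class DecayingGradientDescentSketch {
    /** Runs a fixed number of steps w <- w - eta * gradient(w), shrinking eta each time. */
    static double[] minimize(double[] start, UnaryOperator<double[]> gradient,
                             double eta, double etaDecayRate, int iterations) {
        double[] w = start.clone();
        double currEta = eta;
        for (int it = 0; it < iterations; it++) {
            double[] g = gradient.apply(w);
            for (int i = 0; i < w.length; i++) {
                w[i] -= currEta * g[i];
            }
            currEta *= etaDecayRate; // assumed decay rule
        }
        return w;
    }

    public static void main(String[] args) {
        // Toy objective: f(w) = (w0 - 1)^2 + (w1 - 3)^2, minimized at (1, 3).
        double[] target = {1.0, 3.0};
        double[] w = minimize(new double[] {4.0, 0.0},
                v -> new double[] {2 * (v[0] - target[0]), 2 * (v[1] - target[1])},
                0.1, 0.99, 200);
        System.out.println(Arrays.toString(w));
    }
}
```

A real metric-weight update would also need to keep the weights valid (for example positive), which this sketch does not attempt.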
updateMetricWeightsMahalanobis
protected void updateMetricWeightsMahalanobis()
throws java.lang.Exception
- M-step of the KMeans clustering algorithm -- updates metric
weights. Invoked only when m_UseCombinedObjectiveFunction is
true and metric is an instance of Mahalanobis
- Throws:
java.lang.Exception
updateMetricWeightsDotPGD
protected void updateMetricWeightsDotPGD()
throws java.lang.Exception
- M-step of the KMeans clustering algorithm -- updates metric
weights. This method uses gradient descent to update weights for
a dot product metric
- Throws:
java.lang.Exception
updateMultipleMetricWeights
protected boolean updateMultipleMetricWeights()
throws java.lang.Exception
- M-step of the KMeans clustering algorithm -- updates metric
weights for the individual metrics. Invoked only when m_UseCombinedObjectiveFunction is
true and metric is trainable
- Throws:
java.lang.Exception
updateMultipleMetricWeightsEuclidean
protected boolean updateMultipleMetricWeightsEuclidean()
throws java.lang.Exception
- M-step of the KMeans clustering algorithm -- updates metric
weights for the individual metrics. Invoked only when m_UseCombinedObjectiveFunction is
true and metric is trainable
- Throws:
java.lang.Exception
updateWeightsUsingNewtonRaphson
protected double[] updateWeightsUsingNewtonRaphson(double[] currentAttrWeights,
double[] invUnconstrainedAttrWeights)
throws java.lang.Exception
- calculates weights using Newton Raphson, to satisfy the
positivity constraint of each attribute weight, returns learned
attribute weights. Note: currentAttrWeights is the inverted version
of the current metric weights.
- Throws:
java.lang.Exception
nrWithLineSearchForAlpha
protected double[] nrWithLineSearchForAlpha(double[] currAttrWeights,
double[] invUnconstrainedAttrWeights)
throws java.lang.Exception
- Does one NR step, calculates the alpha (using line search) that
does not violate positivity constraint of each attribute weight,
returns new values of attribute weights
- Throws:
java.lang.Exception
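One simple way to pick a step size that keeps every attribute weight positive is to halve the step until the constraint holds. The sketch below shows that backtracking idea on plain arrays. It is an assumption about the general technique, not the line search used by this class, and the names PositiveStepSketch, findAlpha, and apply are hypothetical.

```java
// Hypothetical sketch: not the MPCKMeans implementation.
final class PositiveStepSketch {
    /**
     * Tries alpha = 1, 1/2, 1/4, ... (at most maxHalvings halvings) and returns
     * the first alpha for which every component of weights + alpha * step stays
     * strictly positive, or 0 if none of the tried values works.
     */
    static double findAlpha(double[] weights, double[] step, int maxHalvings) {
        double alpha = 1.0;
        for (int h = 0; h <= maxHalvings; h++) {
            if (allPositive(weights, step, alpha)) {
                return alpha;
            }
            alpha /= 2.0; // shrink the step and try again
        }
        return 0.0;
    }

    private static boolean allPositive(double[] weights, double[] step, double alpha) {
        for (int i = 0; i < weights.length; i++) {
            if (weights[i] + alpha * step[i] <= 0.0) {
                return false;
            }
        }
        return true;
    }

    /** Applies the step scaled by alpha and returns the new weights. */
    static double[] apply(double[] weights, double[] step, double alpha) {
        double[] next = new double[weights.length];
        for (int i = 0; i < weights.length; i++) {
            next[i] = weights[i] + alpha * step[i];
        }
        return next;
    }
}
```

For weights {1, 1} and a step of {-4, 1}, the full step and the half and quarter steps all drive the first weight to zero or below, so the sketch settles on alpha = 0.125.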
updateMultipleMetricWeightsKLGD
protected boolean updateMultipleMetricWeightsKLGD()
throws java.lang.Exception
- M-step of the KMeans clustering algorithm -- updates metric
weights for the individual metrics. Invoked only when m_UseCombinedObjectiveFunction is
true and metric is trainable
- Throws:
java.lang.Exception
updateMultipleMetricWeightsMahalanobis
protected boolean updateMultipleMetricWeightsMahalanobis()
throws java.lang.Exception
- M-step of the KMeans clustering algorithm -- updates metric
weights. Invoked only when m_UseCombinedObjectiveFunction is
true and metric is an instance of Mahalanobis
- Returns:
- true if the update succeeded; false if it failed and clustering needs to be restarted
- Throws:
java.lang.Exception
updateMultipleMetricWeightsDotPGD
protected boolean updateMultipleMetricWeightsDotPGD()
throws java.lang.Exception
- M-step of the KMeans clustering algorithm -- updates metric
weights for the individual metrics. Invoked only when m_UseCombinedObjectiveFunction is
true and metric is trainable
- Throws:
java.lang.Exception
convergenceCheck
protected boolean convergenceCheck(double oldObjective,
double newObjective,
boolean checkOscillation)
throws java.lang.Exception
- checks for convergence of NR iteration
- Throws:
java.lang.Exception
calculateObjectiveFunction
protected double calculateObjectiveFunction()
throws java.lang.Exception
- calculates objective function
- Throws:
java.lang.Exception
buildClusterer
public void buildClusterer(Instances data)
throws java.lang.Exception
- Generates a clusterer. Instances in data have to be
either all sparse or all non-sparse
- Specified by:
buildClusterer
in interface SemiSupClusterer
- Specified by:
buildClusterer
in class Clusterer
- Parameters:
data
- set of instances serving as training data
- Throws:
java.lang.Exception
- if the clusterer has not been
generated successfully
runKMeans
protected void runKMeans()
throws java.lang.Exception
- Actual KMeans function
- Throws:
java.lang.Exception
resetObjective
public void resetObjective()
- reset the value of the objective function and all of its components
calculateMaxCannotLinkDistanceAndSimilarity
protected void calculateMaxCannotLinkDistanceAndSimilarity()
throws java.lang.Exception
- Go through the cannot-link constraints and find the current maximum distance
- Throws:
java.lang.Exception
calculateMaxCannotLinkDistances
protected double[] calculateMaxCannotLinkDistances()
throws java.lang.Exception
- Go through the cannot-link constraints and find the current maximum distance
- Returns:
- an array of maximum weighted distances. If a single metric is used, maximum distance
is calculated over the entire dataset
- Throws:
java.lang.Exception
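The scan over cannot-link constraints can be pictured with the sketch below, which covers only a single metric and takes the constraints as explicit index pairs. The weighted Euclidean distance, the array-based data layout, and the names MaxCannotLinkSketch, weightedDistance, and maxCannotLinkDistance are assumptions made for illustration.

```java
// Hypothetical sketch: not the MPCKMeans implementation.
final class MaxCannotLinkSketch {
    /** Weighted Euclidean distance: sqrt(sum_i w[i] * (a[i] - b[i])^2). */
    static double weightedDistance(double[] a, double[] b, double[] w) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += w[i] * diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Largest weighted distance over the cannot-link index pairs (0 if there are none). */
    static double maxCannotLinkDistance(double[][] points, int[][] cannotLinks, double[] w) {
        double max = 0.0;
        for (int[] pair : cannotLinks) {
            double d = weightedDistance(points[pair[0]], points[pair[1]], w);
            max = Math.max(max, d);
        }
        return max;
    }
}
```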
bestInstancesForActiveLearning
public int[] bestInstancesForActiveLearning(int numActive)
throws java.lang.Exception
- Dummy: not implemented for MPCKMeans
- Specified by:
bestInstancesForActiveLearning
in interface ActiveLearningClusterer
- Throws:
java.lang.Exception
bestPairsForActiveLearning
public InstancePair[] bestPairsForActiveLearning(int numActive)
throws java.lang.Exception
- Returns the best numActive instance pairs for active learning
- Specified by:
bestPairsForActiveLearning
in interface ActiveLearningClusterer
- Throws:
java.lang.Exception
clusterInstance
public int clusterInstance(Instance instance)
throws java.lang.Exception
- Checks if instance has to be normalized and classifies the
instance using the current clustering
- Specified by:
clusterInstance
in class Clusterer
- Parameters:
instance
- the instance to be assigned to a cluster
- Returns:
- the number of the assigned cluster as an integer
if the class is enumerated, otherwise the predicted value
- Throws:
java.lang.Exception
- if instance could not be classified
successfully
lookupInstanceCluster
protected int lookupInstanceCluster(Instance instance)
- lookup the instance in the checksum hash
- Parameters:
instance
- instance to be looked up
- Returns:
- the index of the cluster to which the instance was assigned, -1 if the instance has not been clustered
assignAllInstancesToClusters
public int assignAllInstancesToClusters()
throws java.lang.Exception
- Classifies the instances using the current clustering, moves
must-linked points together (Xing's approach)
- Returns:
- the number of the assigned cluster as an integer
if the class is enumerated, otherwise the predicted value
- Throws:
java.lang.Exception
- if instance could not be classified
successfully
assignInstanceToCluster
public int assignInstanceToCluster(Instance instance)
throws java.lang.Exception
- Classifies the instance using the current clustering, without considering constraints
- Parameters:
instance
- the instance to be assigned to a cluster
- Returns:
- the number of the assigned cluster as an integer
if the class is enumerated, otherwise the predicted value
- Throws:
java.lang.Exception
- if instance could not be classified
successfully
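Assignment without constraints amounts to picking the closest centroid. The sketch below shows this with squared Euclidean distance on plain arrays. The distance function and the names NearestCentroidSketch and assign are assumptions; the class itself delegates distance computation to its configured metric.

```java
// Hypothetical sketch: not the MPCKMeans implementation.
final class NearestCentroidSketch {
    /** Returns the index of the centroid closest to the point, or -1 if there are no centroids. */
    static int assign(double[] point, double[][] centroids) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double dist = 0.0;
            for (int d = 0; d < point.length; d++) {
                double diff = point[d] - centroids[c][d];
                dist += diff * diff; // squared distance preserves the ordering
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }
}
```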
setCannotLinkWeight
public void setCannotLinkWeight(double w)
- Set the cannot link constraint weight
getCannotLinkWeight
public double getCannotLinkWeight()
- Return the cannot link constraint weight
setMustLinkWeight
public void setMustLinkWeight(double w)
- Set the must link constraint weight
getMustLinkWeight
public double getMustLinkWeight()
- Return the must link constraint weight
getPhaseTwoRandom
public boolean getPhaseTwoRandom()
- Return m_PhaseTwoRandom
setPhaseTwoRandom
public void setPhaseTwoRandom(boolean w)
- Set m_PhaseTwoRandom
getNumClusters
public int getNumClusters()
- Return the number of clusters
- Specified by:
getNumClusters
in interface SemiSupClusterer
numberOfClusters
public int numberOfClusters()
- A duplicate function to conform to Clusterer abstract class.
- Specified by:
numberOfClusters
in class Clusterer
- Returns:
- the number of clusters generated for a training dataset.
setSeedHash
public void setSeedHash(java.util.HashMap seedhash)
- Set the m_SeedHash
setRandomSeed
public void setRandomSeed(int s)
- Set the random number seed
- Parameters:
s
- the seed
getRandomSeed
public int getRandomSeed()
- Return the random number seed
setMaxIterations
public void setMaxIterations(int maxIterations)
- Set the maximum number of iterations
getMaxIterations
public int getMaxIterations()
- Get the maximum number of iterations
setMaxBlankIterations
public void setMaxBlankIterations(int maxBlankIterations)
- Set the maximum number of blank iterations (those where no points are moved)
getMaxBlankIterations
public int getMaxBlankIterations()
- Get the maximum number of blank iterations
setObjFunConvergenceDifference
public void setObjFunConvergenceDifference(double objFunConvergenceDifference)
- Set the minimum value of the objective function difference required for convergence
- Parameters:
objFunConvergenceDifference
- the minimum value of the objective function difference required for convergence
getObjFunConvergenceDifference
public double getObjFunConvergenceDifference()
- Get the minimum value of the objective function difference required for convergence
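The three stopping-related settings (maximum iterations, maximum blank iterations, and the objective function convergence difference) could plausibly combine as in the sketch below. How the class actually combines them is not stated here, so the rule and the names StoppingRuleSketch and shouldStop are assumptions.

```java
// Hypothetical sketch: not the MPCKMeans implementation.
final class StoppingRuleSketch {
    final int maxIterations;
    final int maxBlankIterations;
    final double minObjectiveDifference;

    StoppingRuleSketch(int maxIterations, int maxBlankIterations, double minObjectiveDifference) {
        this.maxIterations = maxIterations;
        this.maxBlankIterations = maxBlankIterations;
        this.minObjectiveDifference = minObjectiveDifference;
    }

    /** Stops on the iteration cap, on too many iterations without moved points, or on a small objective change. */
    boolean shouldStop(int iteration, int blankIterations, double oldObjective, double newObjective) {
        if (iteration >= maxIterations) {
            return true;
        }
        if (blankIterations >= maxBlankIterations) {
            return true;
        }
        return Math.abs(oldObjective - newObjective) < minObjectiveDifference;
    }
}
```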
setInstances
public void setInstances(Instances instances)
- Sets training instances
getInstances
public Instances getInstances()
- Return training instances
- Specified by:
getInstances
in interface SemiSupClusterer
- Returns:
- Instances used for clustering, or null
setNumClusters
public void setNumClusters(int n)
- Set the number of clusters to generate
- Specified by:
setNumClusters
in interface SemiSupClusterer
- Parameters:
n
- the number of clusters to generate
isObjFunDecreasing
public boolean isObjFunDecreasing()
- Is the objective function decreasing or increasing?
setMetric
public void setMetric(Metric m)
- Set the distance metric
- Specified by:
setMetric
in interface SemiSupClusterer
getMetric
public Metric getMetric()
- get the distance metric
getMetrics
public LearnableMetric[] getMetrics()
- get the array of metrics
getAssigner
public MPCKMeansAssigner getAssigner()
- Get the assigner
setAssigner
public void setAssigner(MPCKMeansAssigner assigner)
- Set the assigner
setAlgorithm
public void setAlgorithm(SelectedTag algo)
- Set the KMeans algorithm. Values other than
ALGORITHM_SIMPLE or ALGORITHM_SPHERICAL will be ignored
- Parameters:
algo
- algorithm type
getAlgorithm
public SelectedTag getAlgorithm()
- Get the KMeans algorithm type. Will be one of
ALGORITHM_SIMPLE or ALGORITHM_SPHERICAL
seedClusterer
public void seedClusterer(java.util.HashMap seedHash)
- Read the seeds from a hashtable, where every key is an instance and every value is
the cluster assignment of that instance
- Specified by:
seedClusterer
in interface SemiSupClusterer
- Parameters:
seedHash
- HashMap of seeding parameters
printClusters
public void printClusters()
throws java.lang.Exception
- Prints clusters
- Throws:
java.lang.Exception
getClusters
public java.util.ArrayList getClusters()
throws java.lang.Exception
- Computes the clusters from the cluster assignments, for external access
- Specified by:
getClusters
in interface SemiSupClusterer
- Throws:
java.lang.Exception
- if clusters could not be computed successfully
getIndexClusters
public java.util.HashSet[] getIndexClusters()
throws java.lang.Exception
- Computes the clusters from the cluster assignments, for external access
- Throws:
java.lang.Exception
- if clusters could not be computed successfully
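Computing clusters from assignments can be sketched as grouping instance indices by cluster number. The example below uses a plain int array of assignments and returns a list of index sets instead of a HashSet array. The names IndexClustersSketch and indexClusters are hypothetical, and skipping unassigned (-1) entries is an assumption.

```java
// Hypothetical sketch: not the MPCKMeans implementation.
final class IndexClustersSketch {
    /** Groups instance indices by their assigned cluster; unassigned (-1) indices are skipped. */
    static java.util.List<java.util.Set<Integer>> indexClusters(int[] assignments, int numClusters) {
        java.util.List<java.util.Set<Integer>> clusters =
            new java.util.ArrayList<java.util.Set<Integer>>();
        for (int c = 0; c < numClusters; c++) {
            clusters.add(new java.util.HashSet<Integer>());
        }
        for (int i = 0; i < assignments.length; i++) {
            if (assignments[i] >= 0) {
                clusters.get(assignments[i]).add(i);
            }
        }
        return clusters;
    }
}
```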
listOptions
public java.util.Enumeration listOptions()
- Description copied from interface:
OptionHandler
- Returns an enumeration of all the available options.
- Specified by:
listOptions
in interface OptionHandler
- Returns:
- an enumeration of all available options.
getOptions
public java.lang.String[] getOptions()
- Description copied from interface:
OptionHandler
- Gets the current option settings for the OptionHandler.
- Specified by:
getOptions
in interface OptionHandler
- Returns:
- the list of current option settings as an array of strings
setOptions
public void setOptions(java.lang.String[] options)
throws java.lang.Exception
- Parses a given list of options.
- Specified by:
setOptions
in interface OptionHandler
- Parameters:
options
- the list of options as an array of strings
- Throws:
java.lang.Exception
- if an option is not supported
toString
public java.lang.String toString()
- return a string describing this clusterer
- Returns:
- a description of the clusterer as a string
setActive
public void setActive(boolean active)
- set the active level of the clusterer
- Parameters:
active
-
getActive
public boolean getActive()
- get the active level of clusterer
- Returns:
- active
setVerbose
public void setVerbose(boolean verbose)
- set the verbosity level of the clusterer
- Specified by:
setVerbose
in interface SemiSupClusterer
- Parameters:
verbose
- messages on(true) or off (false)
getVerbose
public boolean getVerbose()
- get the verbosity level of the clusterer
- Returns:
- messages on(true) or off (false)
setUseMultipleMetrics
public void setUseMultipleMetrics(boolean useMultipleMetrics)
- Turn on/off the use of per-cluster metrics
- Parameters:
useMultipleMetrics
- if true, individual metrics will be used for each cluster
getUseMultipleMetrics
public boolean getUseMultipleMetrics()
- See if individual per-cluster metrics are used
- Returns:
- true if individual metrics are used for each cluster
getLogTermWeight
public double getLogTermWeight()
- Get the value of the weight assigned to log term in the objective function
- Returns:
- value of the weight assigned to log term in the objective function
setLogTermWeight
public void setLogTermWeight(double logTermWeight)
- Set the value of the weight assigned to log term in the objective function
- Parameters:
logTermWeight
- weight assigned to log term in the objective function
setEta
public void setEta(double eta)
- Set the initial value of gradient descent eta
getEta
public double getEta()
- Get the initial value of gradient descent eta
setEtaDecayRate
public void setEtaDecayRate(double etaDecayRate)
- Set the initial value of the decay rate of gradient descent eta
getEtaDecayRate
public double getEtaDecayRate()
- Get the initial value of the decay rate of gradient descent eta
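To show how an initial eta and a decay rate typically interact in gradient descent, the sketch below minimizes a one-dimensional quadratic and multiplies eta by the decay rate after every step. The multiplicative schedule, the toy objective, and the names DecayingEtaSketch and minimize are assumptions; the schedule used by this class is not documented here.

```java
// Hypothetical sketch: not the MPCKMeans implementation.
final class DecayingEtaSketch {
    /**
     * Minimizes f(w) = (w - target)^2 by gradient descent, multiplying
     * the step size by etaDecayRate after every iteration.
     */
    static double minimize(double start, double target, double eta,
                           double etaDecayRate, int iterations) {
        double w = start;
        double currentEta = eta;
        for (int t = 0; t < iterations; t++) {
            double gradient = 2.0 * (w - target); // derivative of (w - target)^2
            w -= currentEta * gradient;
            currentEta *= etaDecayRate; // assumed multiplicative decay schedule
        }
        return w;
    }
}
```

A decay rate below 1 shrinks the steps over time, which damps oscillation but can also stop progress before the minimum is reached if eta decays too quickly.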
trainClusterer
public void trainClusterer(Instances instances)
throws java.lang.Exception
- Train the clusterer using specified parameters
- Specified by:
trainClusterer
in interface SemiSupClusterer
- Parameters:
instances
- Instances to be used for training
- Throws:
java.lang.Exception
normalize
public void normalize(Instance inst)
throws java.lang.Exception
- Normalizes Instance or SparseInstance
- Parameters:
inst
- Instance to be normalized
- Throws:
java.lang.Exception
normalizeInstance
public void normalizeInstance(Instance inst)
throws java.lang.Exception
- Normalizes the values of a normal Instance in L2 norm
- Parameters:
inst
- Instance to be normalized
- Throws:
java.lang.Exception
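L2 normalization divides every value by the Euclidean length of the vector, so that the result has unit length. The sketch below does this in place on a plain double array. The names L2NormalizeSketch and normalizeL2 are hypothetical, and leaving an all-zero vector unchanged is an assumption.

```java
// Hypothetical sketch: not the MPCKMeans implementation.
final class L2NormalizeSketch {
    /** Scales values in place to unit L2 norm; an all-zero array is left unchanged. */
    static void normalizeL2(double[] values) {
        double sumSquares = 0.0;
        for (double v : values) {
            sumSquares += v * v;
        }
        double norm = Math.sqrt(sumSquares);
        if (norm == 0.0) {
            return; // nothing to scale
        }
        for (int i = 0; i < values.length; i++) {
            values[i] /= norm;
        }
    }
}
```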
normalizeSparseInstance
public void normalizeSparseInstance(Instance inst)
throws java.lang.Exception
- Normalizes the values of a SparseInstance in L2 norm
- Parameters:
inst
- SparseInstance to be normalized
- Throws:
java.lang.Exception
normalize
public static double[] normalize(double[] weights)
- Normalize an array of doubles
meanOrMode
protected double[] meanOrMode(Instances insts)
- Fast version of meanOrMode, streamlined from Instances.meanOrMode for efficiency.
Does not check for missing attributes; assumes numeric attributes and sparse instances.
getTimeStamp
public static java.lang.Double getTimeStamp()
- Gets a Double representing the current date and time.
e.g. 1:46pm on 20/5/1999 -> 19990520.1346
- Returns:
- a value of type Double
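The yyyyMMdd.HHmm encoding from the example above can be reproduced with simple arithmetic, as in the sketch below. It uses java.time.LocalDateTime, which is an assumption made for illustration, and the names TimeStampSketch and encode are hypothetical.

```java
// Hypothetical sketch: not the MPCKMeans implementation.
final class TimeStampSketch {
    /** Encodes a date and time as yyyyMMdd.HHmm, so 1:46pm on 20/5/1999 maps to about 19990520.1346. */
    static double encode(java.time.LocalDateTime t) {
        // Date goes into the integer part: year, month, and day as decimal digits.
        double datePart = t.getYear() * 10000 + t.getMonthValue() * 100 + t.getDayOfMonth();
        // Time goes into the fraction: 24-hour clock and minutes as four decimal places.
        double timePart = (t.getHour() * 100 + t.getMinute()) / 10000.0;
        return datePart + timePart;
    }
}
```

Because the result is a floating-point value, compare it with a small tolerance instead of testing for exact equality.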
main
public static void main(java.lang.String[] args)
- Main method for testing this class.