java.lang.Object
  weka.deduping.metrics.InstanceMetric
    weka.deduping.metrics.ClassifierInstanceMetric
The ClassifierInstanceMetric class employs a classifier that takes the values returned by various StringMetrics on individual fields as features and outputs a confidence value that corresponds to the similarity between records.
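The idea described above can be illustrated with a minimal, self-contained sketch. It is not the weka.deduping implementation: the `FieldMetric` interface, the two example metrics, and the logistic scoring function are all hypothetical stand-ins for StringMetric and DistributionClassifier, chosen only to show how per-field metric values might become a feature vector that a classifier turns into a confidence.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: none of these names belong to the weka.deduping API.
class RecordSimilaritySketch {

    /** Stand-in for a StringMetric: maps two field values to a score in [0, 1]. */
    interface FieldMetric {
        double similarity(String a, String b);
    }

    /** 1.0 if the two values are equal ignoring case, otherwise 0.0. */
    static final FieldMetric EXACT = (a, b) -> a.equalsIgnoreCase(b) ? 1.0 : 0.0;

    /** Jaccard overlap of lower-cased, whitespace-separated tokens. */
    static final FieldMetric JACCARD = (a, b) -> {
        Set<String> ta = new HashSet<>(Arrays.asList(a.toLowerCase().split("\\s+")));
        Set<String> tb = new HashSet<>(Arrays.asList(b.toLowerCase().split("\\s+")));
        Set<String> inter = new HashSet<>(ta);
        inter.retainAll(tb);
        Set<String> union = new HashSet<>(ta);
        union.addAll(tb);
        return union.isEmpty() ? 0.0 : (double) inter.size() / union.size();
    };

    /** Builds one feature per (field, metric) combination, field-major. */
    static double[] features(String[] r1, String[] r2, FieldMetric[] metrics) {
        double[] f = new double[r1.length * metrics.length];
        int k = 0;
        for (int field = 0; field < r1.length; field++) {
            for (FieldMetric m : metrics) {
                f[k++] = m.similarity(r1[field], r2[field]);
            }
        }
        return f;
    }

    /** Stand-in for a trained classifier: logistic function of a weighted sum. */
    static double confidence(double[] features, double[] weights, double bias) {
        double z = bias;
        for (int i = 0; i < features.length; i++) {
            z += weights[i] * features[i];
        }
        return 1.0 / (1.0 + Math.exp(-z));
    }

    public static void main(String[] args) {
        String[] r1 = {"John Smith", "Austin"};
        String[] r2 = {"john smith", "Dallas"};
        double[] f = features(r1, r2, new FieldMetric[] {EXACT, JACCARD});
        System.out.println(Arrays.toString(f));
        System.out.println(confidence(f, new double[] {1, 1, 1, 1}, -1.0));
    }
}
```

In this sketch the weights are fixed by hand; in the documented class they would presumably be learned by whichever classifier is set through setClassifier.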
Field Summary
protected DistributionClassifier | m_classifier | Classifier that is used for estimating similarity between records
protected Instances | m_diffInstances | A temporary dataset that contains diff-instances for training the classifier
protected StringMetric[][] | m_fieldMetrics | The actual array of metrics
protected int | m_numNegPairs |
protected int | m_numPosPairs | The desired number of training pairs
protected StringMetric[] | m_stringMetrics | StringMetric prototypes that are to be used on each field
Fields inherited from class weka.deduping.metrics.InstanceMetric
m_attrIdxs, m_classIndex, m_metrics, m_numActualNegPairs, m_numActualPosPairs
Constructor Summary
ClassifierInstanceMetric() | A default constructor
Method Summary
void | buildInstanceMetric(int[] attrIdxs) | Generates a new ClassifierInstanceMetric that computes similarity between records using the specified attributes.
static java.lang.String | concatStringArray(java.lang.String[] strings) | A little helper to create a single String from an array of Strings
double | distance(Instance instance1, Instance instance2) | Returns distance between two records
DistributionClassifier | getClassifier() | Get the classifier
int | getNumNegPairs() | Get the number of different-class training pairs
int | getNumPosPairs() | Get the number of same-class training pairs
java.lang.String[] | getOptions() | Gets the current settings of this metric
PairwiseSelector | getSelector() | Get the pairwise selector for this metric
protected java.util.ArrayList | getStringList(Instances trainData, Instances testData, int attrIdx) | An internal method for creating a list of strings for a particular attribute from two sets of instances: training and test data
StringMetric[] | getStringMetrics() | Get the baseline string metrics
protected static java.lang.String | getTimestamp() | Gets a string containing current date and time.
boolean | isDistanceBased() | The computation can be either based on distance, or on similarity
java.util.Enumeration | listOptions() | Returns an enumeration describing the available options
void | setClassifier(DistributionClassifier classifier) | Set the classifier
void | setNumNegPairs(int numNegPairs) | Set the number of different-class training pairs
void | setNumPosPairs(int numPosPairs) | Set the number of same-class training pairs that is desired
void | setOptions(java.lang.String[] options) | Parses a given list of options.
void | setSelector(PairwiseSelector selector) | Set the pairwise selector for this metric
void | setStringMetrics(StringMetric[] metrics) | Set the baseline string metrics
double | similarity(Instance instance1, Instance instance2) | Returns similarity between two records
void | trainInstanceMetric(Instances trainData, Instances testData) | Create a new metric for operating on specified instances
Methods inherited from class weka.deduping.metrics.InstanceMetric
forName, getAttrIdxs, getAttrIdxsWithoutLastClass, getAttrIndxs, getClassIndex, getNumActualNegPairs, getNumActualPosPairs, getNumAttributes, setAttrIdxs, setAttrIdxs, setClassIndex
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail
protected DistributionClassifier m_classifier
protected int m_numPosPairs
protected int m_numNegPairs
protected StringMetric[] m_stringMetrics
protected StringMetric[][] m_fieldMetrics
protected Instances m_diffInstances
Constructor Detail
public ClassifierInstanceMetric()
Method Detail
public void buildInstanceMetric(int[] attrIdxs) throws java.lang.Exception
buildInstanceMetric in class InstanceMetric
Parameters:
attrIdxs - the indices of attributes that the metric will use
Throws:
java.lang.Exception - if the distance metric has not been generated successfully.

public void trainInstanceMetric(Instances trainData, Instances testData) throws java.lang.Exception
trainInstanceMetric in class InstanceMetric
Parameters:
trainData - instances for training the metric
testData - instances that will be used for testing
Throws:
java.lang.Exception
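
The field summary mentions a temporary dataset of diff-instances built from a desired number of same-class and different-class pairs. The sketch below shows one way such a dataset could be assembled; it is an assumption-laden illustration, not the library's code. `LabeledRecord`, `DiffInstance`, and `buildDiffInstances` are hypothetical names, the per-field feature is a plain equality test rather than a StringMetric, and pairs are picked by an exhaustive scan rather than by a PairwiseSelector.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: names and pair-selection strategy are assumptions.
class DiffInstanceSketch {

    /** A record with its field values and the id of the entity it refers to. */
    static final class LabeledRecord {
        final String[] fields;
        final int classId;

        LabeledRecord(int classId, String... fields) {
            this.classId = classId;
            this.fields = fields;
        }
    }

    /** A training example for the pairwise classifier. */
    static final class DiffInstance {
        final double[] features;
        final boolean sameClass;

        DiffInstance(double[] features, boolean sameClass) {
            this.features = features;
            this.sameClass = sameClass;
        }
    }

    /** Scans all pairs in order, keeping at most numPosPairs same-class
     *  and at most numNegPairs different-class pairs. */
    static List<DiffInstance> buildDiffInstances(List<LabeledRecord> records,
                                                 int numPosPairs, int numNegPairs) {
        List<DiffInstance> out = new ArrayList<>();
        int pos = 0;
        int neg = 0;
        for (int i = 0; i < records.size(); i++) {
            for (int j = i + 1; j < records.size(); j++) {
                LabeledRecord a = records.get(i);
                LabeledRecord b = records.get(j);
                boolean same = a.classId == b.classId;
                if (same && pos >= numPosPairs) continue;
                if (!same && neg >= numNegPairs) continue;
                // One feature per field: 1.0 when the values match exactly.
                double[] f = new double[a.fields.length];
                for (int k = 0; k < f.length; k++) {
                    f[k] = a.fields[k].equals(b.fields[k]) ? 1.0 : 0.0;
                }
                out.add(new DiffInstance(f, same));
                if (same) pos++; else neg++;
            }
        }
        return out;
    }
}
```

The caps play the role that setNumPosPairs and setNumNegPairs appear to play in the documented class: they bound how many pairs of each kind reach the classifier.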

protected java.util.ArrayList getStringList(Instances trainData, Instances testData, int attrIdx)
Parameters:
trainData - a dataset of records in the training fold
testData - a dataset of records in the testing fold
attrIdx - the index of the attribute for which strings are to be collected
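
As a rough illustration of what getStringList is described as doing, the sketch below gathers the values of one attribute from a training set and a test set into a single list. It models each dataset as a `List<String[]>` instead of Weka's Instances, and it assumes that duplicates are kept and that training values come first; the documentation above does not confirm either detail.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: datasets are modelled as lists of String arrays.
class StringListSketch {

    /** Collects the value at attrIdx from every training record, then every test record. */
    static ArrayList<String> getStringList(List<String[]> trainData,
                                           List<String[]> testData,
                                           int attrIdx) {
        ArrayList<String> strings = new ArrayList<>(trainData.size() + testData.size());
        for (String[] record : trainData) {
            strings.add(record[attrIdx]);
        }
        for (String[] record : testData) {
            strings.add(record[attrIdx]);
        }
        return strings;
    }
}
```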

public double distance(Instance instance1, Instance instance2) throws java.lang.Exception
distance in class InstanceMetric
Parameters:
instance1 - First record.
instance2 - Second record.
Throws:
java.lang.Exception - if distance could not be calculated.

public double similarity(Instance instance1, Instance instance2) throws java.lang.Exception
similarity in class InstanceMetric
Parameters:
instance1 - First instance.
instance2 - Second instance.
Throws:
java.lang.Exception - if similarity could not be calculated.

public boolean isDistanceBased()
isDistanceBased in class InstanceMetric
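
The documentation does not say how distance and similarity relate to each other in this class. The sketch below shows one common convention, under the assumption that similarity is bounded in [0, 1] and distance is taken as its complement; the actual conversion used by ClassifierInstanceMetric may differ. All names are hypothetical, and the similarity here is a simple fraction of matching fields rather than a classifier confidence.

```java
// Hypothetical sketch: the 1 - similarity conversion is an assumption.
class DistanceSimilaritySketch {

    /** Fraction of fields at which the two records hold equal values. */
    static double similarity(String[] r1, String[] r2) {
        if (r1.length == 0) {
            return 1.0;
        }
        int matches = 0;
        for (int i = 0; i < r1.length; i++) {
            if (r1[i].equals(r2[i])) {
                matches++;
            }
        }
        return (double) matches / r1.length;
    }

    /** Distance derived from similarity (assumed convention). */
    static double distance(String[] r1, String[] r2) {
        return 1.0 - similarity(r1, r2);
    }

    /** This sketch computes similarity natively and derives distance from it. */
    static boolean isDistanceBased() {
        return false;
    }
}
```

A caller that only needs a ranking can use either value; the flag matters when code must know which of the two is computed directly.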

public void setClassifier(DistributionClassifier classifier)
Parameters:
classifier - the classifier

public DistributionClassifier getClassifier()

public void setStringMetrics(StringMetric[] metrics)
Parameters:
metrics - string metrics that will be used on each string attribute

public StringMetric[] getStringMetrics()

public void setSelector(PairwiseSelector selector)
Parameters:
selector - a new pairwise selector

public PairwiseSelector getSelector()

public void setNumPosPairs(int numPosPairs)
Parameters:
numPosPairs - the number of same-class training pairs to be created for training the classifier

public int getNumPosPairs()

public void setNumNegPairs(int numNegPairs)
Parameters:
numNegPairs - the number of different-class training pairs to create for training the classifier

public int getNumNegPairs()

protected static java.lang.String getTimestamp()

public static java.lang.String concatStringArray(java.lang.String[] strings)
Parameters:
strings - an array of strings

public java.util.Enumeration listOptions()
listOptions in interface OptionHandler

public void setOptions(java.lang.String[] options) throws java.lang.Exception
-M metric options
StringMetric used
-C classifier options
Classifier used
setOptions in interface OptionHandler
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported

public java.lang.String[] getOptions()
getOptions in interface OptionHandler
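
To make the -M and -C options above more concrete, here is a small sketch of how flag/value pairs might be read from an options array. The `optionValue` helper and the class names in the usage example are hypothetical placeholders; the real setOptions presumably relies on Weka's own option utilities, and the exact syntax expected after each flag is not given in this page.

```java
// Hypothetical sketch: a minimal flag/value lookup, not Weka's option parser.
class OptionSketch {

    /** Returns the element following the given flag, or "" if the flag is absent. */
    static String optionValue(String flag, String[] options) {
        for (int i = 0; i < options.length - 1; i++) {
            if (options[i].equals(flag)) {
                return options[i + 1];
            }
        }
        return "";
    }

    public static void main(String[] args) {
        // Placeholder class names; substitute real StringMetric and classifier classes.
        String[] options = {"-M", "example.MyStringMetric", "-C", "example.MyClassifier"};
        System.out.println("metric:     " + optionValue("-M", options));
        System.out.println("classifier: " + optionValue("-C", options));
    }
}
```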