java.lang.Object
weka.deduping.metrics.InstanceMetric
weka.deduping.metrics.SumInstanceMetric
The SumInstanceMetric class simply adds the values returned by StringMetrics on individual fields.
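As a rough illustration of the summing idea described above, the sketch below adds per-field string distances for two records. The `FieldMetric` interface and the token-overlap (Jaccard) distance are hypothetical stand-ins chosen for the example; they are not the weka.deduping `StringMetric` API, and the real class may combine values differently.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Illustrative only: sums per-field distances, mirroring the idea behind SumInstanceMetric. */
final class SumOfFieldDistances {

    /** Hypothetical stand-in for a string metric; not the weka.deduping StringMetric type. */
    interface FieldMetric {
        double distance(String a, String b);
    }

    /** Assumed example metric: 1 minus the Jaccard overlap of whitespace-separated tokens. */
    static final FieldMetric JACCARD = (a, b) -> {
        Set<String> ta = new HashSet<>(Arrays.asList(a.trim().toLowerCase().split("\\s+")));
        Set<String> tb = new HashSet<>(Arrays.asList(b.trim().toLowerCase().split("\\s+")));
        Set<String> union = new HashSet<>(ta);
        union.addAll(tb);
        Set<String> common = new HashSet<>(ta);
        common.retainAll(tb);
        return union.isEmpty() ? 0.0 : 1.0 - (double) common.size() / union.size();
    };

    /** Adds the distance computed by metrics[i] on field i of the two records. */
    static double distance(String[] rec1, String[] rec2, FieldMetric[] metrics) {
        if (rec1.length != rec2.length || rec1.length != metrics.length) {
            throw new IllegalArgumentException("records and metrics must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < metrics.length; i++) {
            sum += metrics[i].distance(rec1[i], rec2[i]);
        }
        return sum;
    }

    public static void main(String[] args) {
        String[] r1 = {"John Smith", "12 Main Street"};
        String[] r2 = {"John Smith", "12 Main St"};
        FieldMetric[] metrics = {JACCARD, JACCARD};
        System.out.println(distance(r1, r2, metrics));
    }
}
```

Keeping one metric per field, as the `m_stringMetrics` array suggests, lets each attribute use a metric suited to its content while the overall score stays a plain sum.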
Field Summary | |
protected StringMetric |
m_metric
|
protected int |
m_minCommonTokens
We may require objects to have a minimum number of common tokens for them to be considered for distance computation |
protected int |
m_numNegPairs
|
protected int |
m_numPosPairs
The number of positive pairs desired for training |
StringMetric[] |
m_stringMetrics
An array of StringMetrics that are to be used on each attribute |
Fields inherited from class weka.deduping.metrics.InstanceMetric |
m_attrIdxs, m_classIndex, m_metrics, m_numActualNegPairs, m_numActualPosPairs |
Constructor Summary | |
SumInstanceMetric()
A default constructor |
Method Summary | |
void |
buildInstanceMetric(int[] attrIdxs)
Generates a new SumInstanceMetric based on specified attributes. |
static java.lang.String |
concatStringArray(java.lang.String[] strings)
A little helper to create a single String from an array of Strings |
double |
distance(Instance instance1, Instance instance2)
Returns distance between two instances without using the weights. |
StringMetric |
getMetric()
Get the baseline metric |
int |
getMinCommonTokens()
Get the minimum number of common tokens that is required from objects to be considered for distance computation |
int |
getNumNegPairs()
Get the number of different-class training pairs |
int |
getNumPosPairs()
Get the number of same-class training pairs |
java.lang.String[] |
getOptions()
Gets the current settings of this metric |
PairwiseSelector |
getSelector()
Get the pairwise selector for this metric |
protected java.util.ArrayList |
getStringList(Instances trainData, Instances testData, int attrIdx)
An internal method for creating a list of strings for a particular attribute from two sets of instances: training and test data |
protected static java.lang.String |
getTimestamp()
Gets a string containing current date and time. |
boolean |
isDistanceBased()
The computation of a metric can be either based on distance, or on similarity |
java.util.Enumeration |
listOptions()
Returns an enumeration describing the available options |
static int |
numCommonTokens(java.lang.String s1, java.lang.String s2)
Returns the number of tokens that two strings have in common |
void |
setMetric(StringMetric metric)
Set the baseline metric |
void |
setMinCommonTokens(int minCommonTokens)
Set the minimum number of common tokens that is required from objects to be considered for distance computation |
void |
setNumNegPairs(int numNegPairs)
Set the number of different-class training pairs |
void |
setNumPosPairs(int numPosPairs)
Set the number of same-class training pairs |
void |
setOptions(java.lang.String[] options)
Parses a given list of options. |
void |
setSelector(PairwiseSelector selector)
Set the pairwise selector for this metric |
double |
similarity(Instance instance1, Instance instance2)
Returns similarity between two instances without using the weights. |
void |
trainInstanceMetric(Instances trainData, Instances testData)
Create a new metric for operating on specified instances |
Methods inherited from class weka.deduping.metrics.InstanceMetric |
forName, getAttrIdxs, getAttrIdxsWithoutLastClass, getAttrIndxs, getClassIndex, getNumActualNegPairs, getNumActualPosPairs, getNumAttributes, setAttrIdxs, setAttrIdxs, setClassIndex |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
public StringMetric[] m_stringMetrics
protected StringMetric m_metric
protected int m_numPosPairs
protected int m_numNegPairs
protected int m_minCommonTokens
Constructor Detail |
public SumInstanceMetric()
Method Detail |
public void buildInstanceMetric(int[] attrIdxs) throws java.lang.Exception
buildInstanceMetric in class InstanceMetric
Throws:
java.lang.Exception - if the distance metric has not been generated successfully.
public void trainInstanceMetric(Instances trainData, Instances testData) throws java.lang.Exception
trainInstanceMetric in class InstanceMetric
Parameters:
trainData - instances that the metric will be trained on
testData - instances that the metric will be used on
Throws:
java.lang.Exception
protected java.util.ArrayList getStringList(Instances trainData, Instances testData, int attrIdx)
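To make the purpose of `getStringList` more concrete, here is a minimal sketch that gathers the values of one attribute from a training set and a test set into a single list. Rows are modeled as plain `String[]` arrays rather than Weka `Instances`, and the training-then-test ordering and the absence of de-duplication are assumptions; the actual method may behave differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: collects one attribute's strings from two data sets. */
final class StringListSketch {

    /** Returns the value at column attrIdx from every training row, then every test row. */
    static List<String> getStringList(List<String[]> trainData, List<String[]> testData, int attrIdx) {
        List<String> strings = new ArrayList<>(trainData.size() + testData.size());
        for (String[] row : trainData) {
            strings.add(row[attrIdx]);
        }
        for (String[] row : testData) {
            strings.add(row[attrIdx]);
        }
        return strings;
    }

    public static void main(String[] args) {
        List<String[]> train = new ArrayList<>();
        train.add(new String[] {"John Smith", "Austin"});
        train.add(new String[] {"Jon Smith", "Austin TX"});
        List<String[]> test = new ArrayList<>();
        test.add(new String[] {"J. Smith", "Dallas"});
        System.out.println(getStringList(train, test, 0));
    }
}
```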
public double distance(Instance instance1, Instance instance2) throws java.lang.Exception
distance in class InstanceMetric
Parameters:
instance1 - First instance.
instance2 - Second instance.
Throws:
java.lang.Exception - if similarity could not be estimated.
public double similarity(Instance instance1, Instance instance2) throws java.lang.Exception
similarity in class InstanceMetric
Parameters:
instance1 - First instance.
instance2 - Second instance.
Throws:
java.lang.Exception - if similarity could not be estimated.
public boolean isDistanceBased()
isDistanceBased in class InstanceMetric
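The class exposes both `distance` and `similarity`, with `isDistanceBased()` reporting which of the two the metric is based on. This page does not say how one value is derived from the other. The sketch below assumes one common convention, similarity = 1 / (1 + distance), purely for illustration; the library may use a different conversion.

```java
/** Illustrative only: one assumed way to convert between distance and similarity. */
final class DistanceSimilaritySketch {

    /** Maps a non-negative distance to (0, 1]: 0 becomes 1, large distances approach 0. */
    static double toSimilarity(double distance) {
        if (distance < 0) {
            throw new IllegalArgumentException("distance must be non-negative");
        }
        return 1.0 / (1.0 + distance);
    }

    /** Inverse of toSimilarity for a similarity in (0, 1]. */
    static double toDistance(double similarity) {
        if (similarity <= 0 || similarity > 1) {
            throw new IllegalArgumentException("similarity must be in (0, 1]");
        }
        return 1.0 / similarity - 1.0;
    }

    public static void main(String[] args) {
        System.out.println(toSimilarity(3.0));
        System.out.println(toDistance(0.25));
    }
}
```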
public void setMetric(StringMetric metric)
Parameters:
metric - the string metric to be used as the baseline on each string attribute
public StringMetric getMetric()
public void setSelector(PairwiseSelector selector)
Parameters:
selector - a new pairwise selector
public PairwiseSelector getSelector()
public void setNumPosPairs(int numPosPairs)
Parameters:
numPosPairs - the number of same-class training pairs to create for training
public int getNumPosPairs()
public void setNumNegPairs(int numNegPairs)
Parameters:
numNegPairs - the number of different-class training pairs to create for training
public int getNumNegPairs()
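The positive and negative pair counts above describe how many same-class and different-class training pairs are requested. As a hedged sketch of that idea, the code below enumerates index pairs from a label array and caps the result at the requested count. In the library the selection is delegated to a `PairwiseSelector`, whose strategy is not documented on this page, so the simple in-order enumeration and the `selectPairs` helper here are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: picks same-class or different-class index pairs up to a cap. */
final class TrainingPairsSketch {

    /**
     * Returns pairs {i, j} with i < j. When sameClass is true only pairs with equal
     * labels are kept, otherwise only pairs with different labels. Stops at maxPairs.
     */
    static List<int[]> selectPairs(String[] labels, boolean sameClass, int maxPairs) {
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < labels.length && pairs.size() < maxPairs; i++) {
            for (int j = i + 1; j < labels.length && pairs.size() < maxPairs; j++) {
                if (labels[i].equals(labels[j]) == sameClass) {
                    pairs.add(new int[] {i, j});
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        String[] labels = {"a", "a", "b", "b", "a"};
        System.out.println(selectPairs(labels, true, 3).size());   // requested positives
        System.out.println(selectPairs(labels, false, 10).size()); // fewer negatives exist than requested
    }
}
```

When the data holds fewer pairs than requested, the returned list is shorter than the cap, which is one plausible reason the parent class tracks actual pair counts separately from the desired ones.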
public void setMinCommonTokens(int minCommonTokens)
Parameters:
minCommonTokens - the minimum number of tokens in common that is required from objects to be considered for distance computation
public int getMinCommonTokens()
protected static java.lang.String getTimestamp()
public static java.lang.String concatStringArray(java.lang.String[] strings)
Parameters:
strings - an array of strings
public static int numCommonTokens(java.lang.String s1, java.lang.String s2)
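The two static helpers and the minimum-common-tokens setting fit together naturally, so the sketch below shows one possible reading of them. The single-space separator, whitespace tokenization, counting of distinct tokens, and the `isCandidatePair` helper are all assumptions made for illustration; the page does not specify these details.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative only: possible behavior of the string helpers documented above. */
final class TokenHelpersSketch {

    /** Joins the strings with a single space; the separator is an assumption. */
    static String concatStringArray(String[] strings) {
        return String.join(" ", strings);
    }

    /** Counts distinct whitespace-separated tokens that occur in both strings. */
    static int numCommonTokens(String s1, String s2) {
        Set<String> tokens1 = tokenize(s1);
        tokens1.retainAll(tokenize(s2));
        return tokens1.size();
    }

    /** Splits on whitespace and drops empty tokens. */
    private static Set<String> tokenize(String s) {
        Set<String> tokens = new HashSet<>();
        for (String t : s.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Hypothetical filter: skip pairs that share fewer than minCommonTokens tokens. */
    static boolean isCandidatePair(String s1, String s2, int minCommonTokens) {
        return numCommonTokens(s1, s2) >= minCommonTokens;
    }

    public static void main(String[] args) {
        String a = concatStringArray(new String[] {"the quick", "brown fox"});
        String b = "quick red fox";
        System.out.println(numCommonTokens(a, b));
        System.out.println(isCandidatePair(a, b, 3));
    }
}
```

Filtering on shared tokens before computing a distance is a cheap way to avoid scoring pairs that are very unlikely to be duplicates.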
public java.util.Enumeration listOptions()
listOptions in interface OptionHandler
public void setOptions(java.lang.String[] options) throws java.lang.Exception
-M metric options
StringMetric used
-C classifier options
Classifier used
setOptions in interface OptionHandler
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported
public java.lang.String[] getOptions()
getOptions in interface OptionHandler
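To illustrate the `-M` and `-C` options listed above, here is a self-contained sketch of parsing a flag/value array and emitting it again. The `OptionsSketch` class, its fields, and the placeholder values are hypothetical and do not reproduce the library's parsing code. Real Weka `OptionHandler` implementations typically rely on helpers in `weka.core.Utils` instead of a hand-written loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: round-trips "-M <spec>" and "-C <spec>" options. */
final class OptionsSketch {
    String metricSpec = "";      // value given after -M (assumed to name a StringMetric)
    String classifierSpec = "";  // value given after -C (assumed to name a Classifier)

    /** Reads the value following each flag; unknown flags and missing values are rejected. */
    void setOptions(String[] options) {
        for (int i = 0; i < options.length; i++) {
            String flag = options[i];
            if (!flag.equals("-M") && !flag.equals("-C")) {
                throw new IllegalArgumentException("Unsupported option: " + flag);
            }
            if (i + 1 >= options.length) {
                throw new IllegalArgumentException("Missing value for " + flag);
            }
            String value = options[++i];
            if (flag.equals("-M")) {
                metricSpec = value;
            } else {
                classifierSpec = value;
            }
        }
    }

    /** Returns the settings in the same flag/value form that setOptions accepts. */
    String[] getOptions() {
        List<String> out = new ArrayList<>();
        if (!metricSpec.isEmpty()) {
            out.add("-M");
            out.add(metricSpec);
        }
        if (!classifierSpec.isEmpty()) {
            out.add("-C");
            out.add(classifierSpec);
        }
        return out.toArray(new String[0]);
    }

    public static void main(String[] args) {
        OptionsSketch sketch = new OptionsSketch();
        // Placeholder values; substitute real class names in practice.
        sketch.setOptions(new String[] {"-M", "example.MyStringMetric", "-C", "example.MyClassifier"});
        System.out.println(Arrays.toString(sketch.getOptions()));
    }
}
```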