java.lang.Object
  weka.deduping.metrics.StringMetric
      weka.deduping.metrics.AffineMetric
A measure of distance between two strings based on an affine gap model, an edit distance in which opening a gap and extending it are charged separately. See D. Gusfield, "Algorithms on Strings, Trees and Sequences", Cambridge University Press, 1997.
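Affine gap distance of the kind described in the cited Gusfield text is usually computed by dynamic programming over three tables (Gotoh's algorithm). The sketch below is a standalone illustration, not the AffineMetric source. The class name `AffineGapSketch` is hypothetical, costs are treated as penalties to be minimized, and a gap of length k is assumed to cost gapStart + (k - 1) * gapExtend.

```java
/**
 * Hypothetical standalone sketch of an affine-gap edit distance
 * (Gotoh-style dynamic programming); not the AffineMetric source.
 */
final class AffineGapSketch {

    /**
     * Minimum total cost of a global alignment of s and t.
     * Assumed convention: a gap of length k costs gapStart + (k - 1) * gapExtend.
     */
    static double distance(String s, String t, double matchCost, double subCost,
                           double gapStart, double gapExtend, boolean normalized) {
        int n = s.length(), m = t.length();
        double inf = Double.POSITIVE_INFINITY;
        // diag[i][j]: best cost with s[i-1] aligned to t[j-1]
        // gapT[i][j]: best cost ending with s[i-1] aligned to a gap
        // gapS[i][j]: best cost ending with t[j-1] aligned to a gap
        double[][] diag = new double[n + 1][m + 1];
        double[][] gapT = new double[n + 1][m + 1];
        double[][] gapS = new double[n + 1][m + 1];
        for (int i = 0; i <= n; i++) {
            for (int j = 0; j <= m; j++) {
                diag[i][j] = gapT[i][j] = gapS[i][j] = inf;
                if (i == 0 && j == 0) {
                    diag[0][0] = 0.0; // empty alignment costs nothing
                    continue;
                }
                if (i > 0 && j > 0) {
                    double c = s.charAt(i - 1) == t.charAt(j - 1) ? matchCost : subCost;
                    diag[i][j] = c + min3(diag[i - 1][j - 1], gapT[i - 1][j - 1], gapS[i - 1][j - 1]);
                }
                if (i > 0) {
                    // open a new gap (pay gapStart) or extend the running gap (pay gapExtend)
                    gapT[i][j] = Math.min(Math.min(diag[i - 1][j], gapS[i - 1][j]) + gapStart,
                                          gapT[i - 1][j] + gapExtend);
                }
                if (j > 0) {
                    gapS[i][j] = Math.min(Math.min(diag[i][j - 1], gapT[i][j - 1]) + gapStart,
                                          gapS[i][j - 1] + gapExtend);
                }
            }
        }
        double d = min3(diag[n][m], gapT[n][m], gapS[n][m]);
        // optional normalization by the sum of the string lengths
        return (normalized && n + m > 0) ? d / (n + m) : d;
    }

    private static double min3(double a, double b, double c) {
        return Math.min(a, Math.min(b, c));
    }
}
```

For example, with matchCost 0, subCost 1, gapStart 2 and gapExtend 1, `distance("abcd", "ad", ...)` is 3, which corresponds to a single gap of length 2. When normalized by the 6 characters, the result is 0.5.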
Field Summary
static int CONVERSION_EXPONENTIAL
static int CONVERSION_LAPLACIAN
    The conversion constants define different ways of converting a distance into a similarity.
static int CONVERSION_UNIT
protected int m_conversionType
    The method used to convert distance into similarity; Laplacian by default.
protected double m_gapExtendCost
    The cost of continuing (extending) a gap.
protected double m_gapStartCost
    The cost of opening a gap.
protected double m_matchCost
    The cost of matching two characters.
protected boolean m_normalized
    Whether the distance is normalized by the lengths of the strings.
protected double m_subCost
    The cost of substituting one character for another.
static Tag[] TAGS_CONVERSION
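The conversion constants select how a distance is turned into a similarity. The Weka formulas are not shown on this page. The sketch below therefore uses plausible mappings as an assumption:

- Laplacian: 1 / (1 + d)
- Unit: 1 - d, which is meaningful only when d lies in [0, 1], for example a normalized distance
- Exponential: e^(-d)

The names `Conversion` and `ConversionSketch` are hypothetical.

```java
/** Hypothetical stand-ins for CONVERSION_LAPLACIAN, CONVERSION_UNIT, CONVERSION_EXPONENTIAL. */
enum Conversion { LAPLACIAN, UNIT, EXPONENTIAL }

final class ConversionSketch {

    /** Maps a distance to a similarity; the formulas are assumptions, not the Weka source. */
    static double toSimilarity(double distance, Conversion type) {
        switch (type) {
            case LAPLACIAN:
                return 1.0 / (1.0 + distance); // 1 at d = 0, decays toward 0 as d grows
            case UNIT:
                return 1.0 - distance;         // assumes d is already in [0, 1]
            case EXPONENTIAL:
                return Math.exp(-distance);    // 1 at d = 0, decays exponentially
            default:
                throw new IllegalArgumentException("unknown conversion: " + type);
        }
    }
}
```

Under these assumed formulas, the Laplacian and exponential mappings send distance 0 to similarity 1 and stay positive without requiring a bounded distance. That property suits unnormalized distances, such as the Laplacian default.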
Constructor Summary
AffineMetric()
    A default constructor that assigns the name of this distance.
Method Summary
java.lang.Object clone()
    Create a copy of this metric.
double distance(java.lang.String string1, java.lang.String string2)
    Obtain the distance between two strings.
double getGapExtendCost()
    Get the gap extension cost.
double getGapStartCost()
    Get the gap opening cost.
double getMatchCost()
    Get the match cost.
boolean getNormalized()
    Get whether the distance is normalized by the sum of the strings' lengths.
java.lang.String[] getOptions()
    Gets the current settings of AffineMetric.
double getSubCost()
    Get the substitution cost.
boolean isDataDependent()
    Returns whether this metric is data-dependent.
boolean isDistanceBased()
    Returns whether the metric is computed from a distance (true) or from a similarity (false).
java.util.Enumeration listOptions()
    Returns an enumeration describing the available options.
void setGapExtendCost(double gapExtendCost)
    Set the gap extension cost.
void setGapStartCost(double gapStartCost)
    Set the gap opening cost.
void setMatchCost(double matchCost)
    Set the match cost.
void setNormalized(boolean normalized)
    Set whether the distance is normalized by the sum of the strings' lengths.
void setOptions(java.lang.String[] options)
    Parses a given list of options.
void setSubCost(double subCost)
    Set the substitution cost.
double similarity(java.lang.String string1, java.lang.String string2)
    Returns a similarity estimate between two strings.
Methods inherited from class weka.deduping.metrics.StringMetric
forName
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail
protected double m_matchCost
protected double m_subCost
protected double m_gapStartCost
protected double m_gapExtendCost
protected boolean m_normalized
public static final int CONVERSION_LAPLACIAN
public static final int CONVERSION_UNIT
public static final int CONVERSION_EXPONENTIAL
public static final Tag[] TAGS_CONVERSION
protected int m_conversionType
Constructor Detail
public AffineMetric()
Method Detail
public boolean isDataDependent()
public double distance(java.lang.String string1, java.lang.String string2) throws java.lang.Exception
    distance in class StringMetric
    Parameters:
        string1 - first string
        string2 - second string
    Throws:
        java.lang.Exception
public boolean isDistanceBased()
    isDistanceBased in class StringMetric
public double similarity(java.lang.String string1, java.lang.String string2) throws java.lang.Exception
    similarity in class StringMetric
    Parameters:
        string1 - first string
        string2 - second string
    Throws:
        java.lang.Exception - if similarity could not be estimated
public void setMatchCost(double matchCost)
    Parameters:
        matchCost - the cost of finding a matching pair of characters
public double getMatchCost()
public void setSubCost(double subCost)
    Parameters:
        subCost - the cost of substituting one character for another
public double getSubCost()
public void setGapStartCost(double gapStartCost)
    Parameters:
        gapStartCost - the cost of opening a gap
public double getGapStartCost()
public void setGapExtendCost(double gapExtendCost)
    Parameters:
        gapExtendCost - the cost of extending a gap
public double getGapExtendCost()
public void setNormalized(boolean normalized)
    Parameters:
        normalized - if true, the distance is normalized by the sum of the strings' lengths
public boolean getNormalized()
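The cost and normalization settings above combine as follows. This is a sketch of the usual affine gap model and not taken from the AffineMetric source. In particular, the convention that the opening cost already covers the first gapped character is an assumption. Another common convention charges $g_{\mathrm{start}} + k\,g_{\mathrm{ext}}$ instead.

```latex
% cost of a single gap of length k (assumed convention)
\gamma(k) = g_{\mathrm{start}} + (k-1)\,g_{\mathrm{ext}}, \qquad k \ge 1

% worked example with g_start = 2, g_ext = 1:
%   one gap of length 3:        \gamma(3) = 2 + 2 \cdot 1 = 4
%   three gaps of length 1:  3\,\gamma(1) = 3 \cdot 2     = 6

% distance when normalization is enabled
d_{\mathrm{norm}}(s,t) = \frac{d(s,t)}{|s| + |t|}
```

Because opening a gap costs more than extending one, the model prefers a few long gaps over many scattered short ones.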
public java.lang.Object clone()
    clone in class StringMetric
public java.lang.String[] getOptions()
    getOptions in interface OptionHandler
public void setOptions(java.lang.String[] options) throws java.lang.Exception
    Valid options are:
        -N                  normalize by length
        -m matchCost        the match cost
        -s subCost          the substitution cost
        -g gapStartCost     the gap opening cost
        -e gapExtendCost    the gap extension cost
    setOptions in interface OptionHandler
    Parameters:
        options - the list of options as an array of strings
    Throws:
        java.lang.Exception - if an option is not supported
public java.util.Enumeration listOptions()
    listOptions in interface OptionHandler
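The flags accepted by setOptions (-N, -m, -s, -g, -e) can be illustrated with a standalone parser that avoids any Weka dependency. The class `AffineOptionsSketch`, its field names, and its default values of 0 are hypothetical. The real AffineMetric defaults are not documented on this page. Unsupported flags raise an exception, matching the documented contract.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical mirror of AffineMetric's option handling; not the Weka implementation. */
final class AffineOptionsSketch {
    boolean normalized = false;
    double matchCost, subCost, gapStartCost, gapExtendCost; // assumed defaults of 0.0

    /** Parses flags such as {"-N", "-m", "0", "-s", "1", "-g", "2", "-e", "1"}. */
    void setOptions(String[] options) {
        for (int i = 0; i < options.length; i++) {
            String opt = options[i];
            switch (opt) {
                case "-N": normalized = true; break;
                case "-m": matchCost = value(options, ++i, opt); break;
                case "-s": subCost = value(options, ++i, opt); break;
                case "-g": gapStartCost = value(options, ++i, opt); break;
                case "-e": gapExtendCost = value(options, ++i, opt); break;
                default: throw new IllegalArgumentException("unsupported option: " + opt);
            }
        }
    }

    /** Inverse of setOptions: emits the current settings as a flag array. */
    String[] getOptions() {
        List<String> out = new ArrayList<>();
        if (normalized) out.add("-N");
        out.add("-m"); out.add(Double.toString(matchCost));
        out.add("-s"); out.add(Double.toString(subCost));
        out.add("-g"); out.add(Double.toString(gapStartCost));
        out.add("-e"); out.add(Double.toString(gapExtendCost));
        return out.toArray(new String[0]);
    }

    // Reads the numeric argument that must follow a valued flag.
    private static double value(String[] options, int idx, String flag) {
        if (idx >= options.length) throw new IllegalArgumentException("missing value for " + flag);
        return Double.parseDouble(options[idx]);
    }
}
```

In this sketch, getOptions returns flags that setOptions can parse back, so a configuration survives a round trip as a string array.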