ir.webutils
Class ScoredAnchoredLinkExtractor
java.lang.Object
javax.swing.text.html.HTMLEditorKit.ParserCallback
ir.webutils.LinkExtractor
ir.webutils.AnchoredLinkExtractor
ir.webutils.ScoredAnchoredLinkExtractor
public class ScoredAnchoredLinkExtractor
- extends AnchoredLinkExtractor
An AnchoredLinkExtractor that extracts ScoredAnchoredLink objects, which
can be scored and used in heuristic web search.
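The sketch below illustrates one way an anchored link could be scored for heuristic search: count how many distinct query terms appear in the link's anchor text. The class `AnchoredLinkSketch` and its `score` method are hypothetical. This page does not document how ScoredAnchoredLink actually computes its score, so treat the heuristic as an assumption.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical link/anchor-text pair; NOT the ir.webutils ScoredAnchoredLink API. */
final class AnchoredLinkSketch {
    final String url;
    final String anchorText;

    AnchoredLinkSketch(String url, String anchorText) {
        this.url = url;
        this.anchorText = anchorText;
    }

    /** Assumed heuristic: number of distinct query terms found in the anchor text. */
    int score(String query) {
        Set<String> anchorTerms = tokenize(anchorText);
        int hits = 0;
        for (String term : tokenize(query)) {
            if (anchorTerms.contains(term)) {
                hits++;
            }
        }
        return hits;
    }

    // Lowercases the text and splits on runs of characters that are not letters or digits.
    private static Set<String> tokenize(String text) {
        Set<String> terms = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }
}
```

A crawler could, for example, keep its frontier in a priority queue ordered by such a score so that links whose anchor text matches the query are fetched first.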
Fields inherited from class javax.swing.text.html.HTMLEditorKit.ParserCallback:
IMPLIED
Method Summary
protected void addLink(javax.swing.text.MutableAttributeSet attributes, javax.swing.text.html.HTML.Attribute attr)
    Retrieves a link from an attribute set and completes it against the base URL.
void handleEndTag(javax.swing.text.html.HTML.Tag tag, int position)
    Executed when a closing HTML tag is found in the document.
void handleStartTag(javax.swing.text.html.HTML.Tag tag, javax.swing.text.MutableAttributeSet attributes, int position)
    Executed when an opening HTML tag is found in the document.
Methods inherited from class javax.swing.text.html.HTMLEditorKit.ParserCallback:
flush, handleComment, handleEndOfLineString, handleError
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
ScoredAnchoredLinkExtractor
public ScoredAnchoredLinkExtractor(HTMLPage page)
- Creates a ScoredAnchoredLinkExtractor for the given page.
handleStartTag
public void handleStartTag(javax.swing.text.html.HTML.Tag tag,
javax.swing.text.MutableAttributeSet attributes,
int position)
- Executed when an opening HTML tag is found in the document.
Note that this method only handles tags that also have a
closing tag. If the tag is an "a" tag, a new anchorText buffer is started.
If already inside an "a" tag, the tag information is stored in the anchorText.
- Overrides:
handleStartTag in class AnchoredLinkExtractor
- Parameters:
tag - The tag that caused this function to be executed.
attributes - The attributes of tag.
position - The start of the tag in the document. If the tag is
implied (filled in by the parser but not actually present in the
document), then position corresponds to that of the next encountered tag.
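Implied tags can be told apart from real ones inside handleStartTag. The Swing HTML parser marks tags it filled in itself by putting the inherited `HTMLEditorKit.ParserCallback.IMPLIED` key into the attribute set. The sketch below records each start tag and whether it was implied. The class name `ImpliedTagRecorder` is hypothetical, and it is not part of ir.webutils.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Hypothetical callback: maps each start tag name to whether the parser only implied it. */
final class ImpliedTagRecorder extends HTMLEditorKit.ParserCallback {
    final Map<String, Boolean> implied = new LinkedHashMap<>();

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attributes, int position) {
        // Tags that the parser inserted on its own carry the IMPLIED attribute.
        boolean isImplied = attributes.isDefined(HTMLEditorKit.ParserCallback.IMPLIED);
        implied.put(String.valueOf(tag), isImplied);
    }

    static Map<String, Boolean> record(String html) {
        ImpliedTagRecorder recorder = new ImpliedTagRecorder();
        try {
            new ParserDelegator().parse(new StringReader(html), recorder, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return recorder.implied;
    }
}
```

For input such as `<p>hello`, the `html` element is never written, so the parser reports it as implied. The explicit `p` tag is reported with real attributes.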
handleEndTag
public void handleEndTag(javax.swing.text.html.HTML.Tag tag,
int position)
- Executed when a closing HTML tag is found in the document.
Note that the parser may add "implied" closing tags; for
example, the default parser adds closing <p> tags.
At the end of an "a" tag, the accumulated anchorText is added to
the current link (the last one added to links).
If already inside an "a" tag, the tag information is stored in the anchorText.
- Overrides:
handleEndTag in class AnchoredLinkExtractor
- Parameters:
tag - The tag found.
position - The position of the tag in the document.
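The start/end-tag behaviour described above follows a common pattern. An opening `a` tag starts a fresh buffer, text events append to it while the buffer is open, and the closing `a` tag attaches the buffered text to the most recently added link. The sketch below shows that pattern with the JDK's `ParserDelegator`. The names `AnchorTextCollector` and `Link` are hypothetical, and unlike the documented class it collects only text, not nested tag information.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Hypothetical sketch: pairs each <a href> with the text it encloses. */
final class AnchorTextCollector extends HTMLEditorKit.ParserCallback {

    /** A raw href plus the anchor text found between <a> and </a>. */
    static final class Link {
        final String href;
        String anchorText = "";

        Link(String href) {
            this.href = href;
        }
    }

    final List<Link> links = new ArrayList<>();
    private StringBuilder anchorText; // non-null only while inside an <a href> element

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attributes, int position) {
        if (tag == HTML.Tag.A) {
            Object href = attributes.getAttribute(HTML.Attribute.HREF);
            if (href != null) {
                links.add(new Link(href.toString()));
                anchorText = new StringBuilder(); // start a fresh buffer
            }
        }
    }

    @Override
    public void handleText(char[] data, int position) {
        if (anchorText != null) {
            anchorText.append(data).append(' ');
        }
    }

    @Override
    public void handleEndTag(HTML.Tag tag, int position) {
        if (tag == HTML.Tag.A && anchorText != null) {
            // Attach the buffered text (whitespace collapsed) to the last link added.
            links.get(links.size() - 1).anchorText =
                    anchorText.toString().replaceAll("\\s+", " ").trim();
            anchorText = null;
        }
    }

    static List<Link> collect(String html) {
        AnchorTextCollector collector = new AnchorTextCollector();
        try {
            new ParserDelegator().parse(new StringReader(html), collector, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return collector.links;
    }
}
```

Because text inside nested tags such as `<b>` still arrives through handleText while the buffer is open, `<a href="docs/a.html">the <b>docs</b></a>` yields the anchor text "the docs".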
addLink
protected void addLink(javax.swing.text.MutableAttributeSet attributes,
javax.swing.text.html.HTML.Attribute attr)
- Retrieves a link from an attribute set and completes it against
the base URL. This version creates ScoredAnchoredLink objects.
- Overrides:
addLink in class AnchoredLinkExtractor
- Parameters:
attributes - The attribute set.
attr - The attribute that should be treated as a URL. For example,
attr should be HTML.Attribute.HREF if attributes is from an anchor tag.
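"Completing a link against the base URL" means resolving a possibly relative reference such as `docs/a.html` into an absolute URL. The sketch below does this with `java.net.URI.resolve`, taking the same kind of arguments as addLink. The helper name `LinkResolver.resolve` and its choice to return null for missing or malformed references are assumptions. How addLink itself builds the URL is not documented here.

```java
import java.net.URI;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;

/** Hypothetical helper illustrating the idea behind addLink(attributes, attr). */
final class LinkResolver {

    /**
     * Reads attr (e.g. HTML.Attribute.HREF) from the attribute set and resolves it
     * against base. Returns null if the attribute is missing or is not a valid URI reference.
     */
    static String resolve(MutableAttributeSet attributes, HTML.Attribute attr, URI base) {
        Object value = attributes.getAttribute(attr);
        if (value == null) {
            return null;
        }
        try {
            // RFC 2396 resolution; the resulting path is normalized, so "." and ".." segments are removed.
            return base.resolve(value.toString().trim()).toString();
        } catch (IllegalArgumentException e) {
            return null; // malformed reference, e.g. one containing spaces
        }
    }
}
```

With base `http://example.com/dir/page.html`, the reference `docs/a.html` resolves to `http://example.com/dir/docs/a.html`, and `../up.html` resolves to `http://example.com/up.html`.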