ir.webutils
Class AnchoredLinkExtractor
java.lang.Object
  |
  +--javax.swing.text.html.HTMLEditorKit.ParserCallback
        |
        +--ir.webutils.LinkExtractor
              |
              +--ir.webutils.AnchoredLinkExtractor
- public class AnchoredLinkExtractor
- extends LinkExtractor
Extractor for AnchoredLink objects. Modifies the HTML parser
callback routines to also extract and store the anchor text of
each link.
Field Summary
protected java.lang.StringBuffer anchorText
    Buffer to store anchor text encountered between an "a" start tag and end tag.
protected AnchoredLink currentLink
    The current link being processed.
Fields inherited from class javax.swing.text.html.HTMLEditorKit.ParserCallback
IMPLIED
Method Summary
protected void addLink(javax.swing.text.MutableAttributeSet attributes, javax.swing.text.html.HTML.Attribute attr)
    Retrieves a link from an attribute set and completes it against the base URL.
void handleEndTag(javax.swing.text.html.HTML.Tag tag, int position)
    Executed when a closing HTML tag is found in the document.
void handleSimpleTag(javax.swing.text.html.HTML.Tag tag, javax.swing.text.MutableAttributeSet attributes, int position)
    Executed when an HTML tag that has no closing tag is found in the document.
void handleStartTag(javax.swing.text.html.HTML.Tag tag, javax.swing.text.MutableAttributeSet attributes, int position)
    Executed when an opening HTML tag is found in the document.
void handleText(char[] text, int position)
    Executed when a block of text is encountered.
static void main(java.lang.String[] args)
Methods inherited from class javax.swing.text.html.HTMLEditorKit.ParserCallback
flush, handleComment, handleEndOfLineString, handleError
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
anchorText
protected java.lang.StringBuffer anchorText
- Buffer to store anchor text encountered between
an "a" start tag and end tag.
currentLink
protected AnchoredLink currentLink
- The current link being processed
AnchoredLinkExtractor
public AnchoredLinkExtractor(HTMLPage page)
- Create an anchored link extractor for the given page
handleText
public void handleText(char[] text,
int position)
- Executed when a block of text is encountered.
If inside an anchor tag, stores the text in anchorText.
- Overrides:
handleText in class LinkExtractor
- Parameters:
text - A char array representation of the text.
position - The position of the text in the document.
handleStartTag
public void handleStartTag(javax.swing.text.html.HTML.Tag tag,
javax.swing.text.MutableAttributeSet attributes,
int position)
- Executed when an opening HTML tag is found in the document.
Note that this method only handles tags that also have a
closing tag. If the tag is an "a" tag, starts a new anchorText buffer.
If already inside an "a" tag, stores the tag info in anchorText.
- Overrides:
handleStartTag in class LinkExtractor
- Parameters:
tag - The tag that caused this function to be executed.
attributes - The attributes of tag.
position - The start of the tag in the document. If the
tag is implied (filled in by the parser but not actually
present in the document), then position will
correspond to that of the next encountered tag.
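The implied-tag case mentioned for position can be observed directly. The following is a minimal sketch, not part of ir.webutils: the class name ImpliedTagSketch and its parse helper are hypothetical. It relies only on the standard Swing parser and on the inherited IMPLIED field, which the parser sets as an attribute on tags it fills in itself.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical illustration class; not part of ir.webutils.
class ImpliedTagSketch extends HTMLEditorKit.ParserCallback {
    final List<String> implied = new ArrayList<>();   // tags supplied by the parser
    final List<String> explicit = new ArrayList<>();  // tags present in the document

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attributes, int position) {
        // The parser marks tags it filled in itself with the IMPLIED attribute.
        if (attributes.isDefined(IMPLIED)) {
            implied.add(tag.toString());
        } else {
            explicit.add(tag.toString());
        }
    }

    // Parses the given HTML string and returns the callback holding both lists.
    static ImpliedTagSketch parse(String html) throws IOException {
        ImpliedTagSketch callback = new ImpliedTagSketch();
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return callback;
    }

    public static void main(String[] args) throws IOException {
        // The fragment has no <html> or <body>, so the parser has to supply them.
        ImpliedTagSketch result = parse("<p>hello</p>");
        System.out.println("implied:  " + result.implied);
        System.out.println("explicit: " + result.explicit);
    }
}
```

A subclass that should ignore parser-supplied tags could apply the same isDefined(IMPLIED) check before acting on a tag.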
handleEndTag
public void handleEndTag(javax.swing.text.html.HTML.Tag tag,
int position)
- Executed when a closing HTML tag is found in the document.
Note that the parser may add "implied" closing tags. For
example, the default parser adds closing <p> tags.
If the tag ends an "a" element, adds the accumulated anchorText to
the current link (the last one added to links).
If already inside an "a" tag, stores the tag info in anchorText.
- Overrides:
handleEndTag in class LinkExtractor
- Parameters:
tag - The tag found.
position - The position of the tag in the document.
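The interplay of handleStartTag, handleText and handleEndTag described above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the source of AnchoredLinkExtractor: the class AnchorTextSketch, its extract helper, and the String[] {href, text} pairs are hypothetical stand-ins for the real AnchoredLink and links members, and tags nested inside an anchor are ignored rather than recorded.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical illustration class; not part of ir.webutils.
class AnchorTextSketch extends HTMLEditorKit.ParserCallback {
    private StringBuilder anchorText;   // null while outside an "a" element
    private String currentHref;         // stand-in for currentLink
    private final List<String[]> links = new ArrayList<>();  // {href, anchor text}

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attributes, int position) {
        if (tag == HTML.Tag.A) {
            Object href = attributes.getAttribute(HTML.Attribute.HREF);
            if (href != null) {
                // An "a" start tag opens a fresh buffer.
                currentHref = href.toString();
                anchorText = new StringBuilder();
            }
        }
    }

    @Override
    public void handleText(char[] text, int position) {
        // Text is kept only while an anchor is open.
        if (anchorText != null) {
            anchorText.append(text);
        }
    }

    @Override
    public void handleEndTag(HTML.Tag tag, int position) {
        if (tag == HTML.Tag.A && anchorText != null) {
            // The closing "a" tag attaches the accumulated text to the link.
            links.add(new String[] {currentHref, anchorText.toString()});
            anchorText = null;
        }
    }

    // Parses the HTML string and returns the collected {href, anchor text} pairs.
    static List<String[]> extract(String html) throws IOException {
        AnchorTextSketch callback = new AnchorTextSketch();
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return callback.links;
    }

    public static void main(String[] args) throws IOException {
        for (String[] link : extract("<p>before <a href=\"x.html\">inside</a> after</p>")) {
            System.out.println(link[0] + " -> " + link[1]);
        }
    }
}
```

Using a null buffer as the "inside an anchor" flag keeps the three callbacks consistent without a separate boolean.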
handleSimpleTag
public void handleSimpleTag(javax.swing.text.html.HTML.Tag tag,
javax.swing.text.MutableAttributeSet attributes,
int position)
- Executed when an HTML tag that has no closing tag is found in
the document.
If already inside an "a" tag, stores the tag info in anchorText.
- Overrides:
handleSimpleTag in class LinkExtractor
- Parameters:
tag - The tag that caused this function to be executed.
attributes - The attributes of tag.
position - The start of the tag in the document. If the
tag is implied (filled in by the parser but not actually
present in the document), then position will
correspond to that of the next encountered tag.
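Tags without a closing tag, such as "img", reach the callback through handleSimpleTag. The documentation does not say which "tag info" is stored, so the sketch below makes an explicit assumption: for an image inside an anchor it appends the value of the alt attribute. The class SimpleTagInAnchorSketch and its anchorTextsOf helper are hypothetical and not part of ir.webutils.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical illustration class; not part of ir.webutils.
class SimpleTagInAnchorSketch extends HTMLEditorKit.ParserCallback {
    private StringBuilder anchorText;   // null while outside an "a" element
    private final List<String> anchorTexts = new ArrayList<>();

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attributes, int position) {
        if (tag == HTML.Tag.A) {
            anchorText = new StringBuilder();
        }
    }

    @Override
    public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attributes, int position) {
        // Assumption: the "tag info" worth keeping for an image is its alt text.
        if (anchorText != null && tag == HTML.Tag.IMG) {
            Object alt = attributes.getAttribute(HTML.Attribute.ALT);
            if (alt != null) {
                anchorText.append(alt);
            }
        }
    }

    @Override
    public void handleText(char[] text, int position) {
        if (anchorText != null) {
            anchorText.append(text);
        }
    }

    @Override
    public void handleEndTag(HTML.Tag tag, int position) {
        if (tag == HTML.Tag.A && anchorText != null) {
            anchorTexts.add(anchorText.toString());
            anchorText = null;
        }
    }

    // Parses the HTML string and returns the text collected for each anchor.
    static List<String> anchorTextsOf(String html) throws IOException {
        SimpleTagInAnchorSketch callback = new SimpleTagInAnchorSketch();
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return callback.anchorTexts;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(anchorTextsOf("<a href=\"x.html\"><img src=\"logo.png\" alt=\"Logo\"></a>"));
    }
}
```

Recording the alt text gives image-only links a usable anchor text; the actual class may store different information.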
addLink
protected void addLink(javax.swing.text.MutableAttributeSet attributes,
javax.swing.text.html.HTML.Attribute attr)
- Retrieves a link from an attribute set and completes it against
the base URL. This version creates AnchoredLink objects.
- Overrides:
addLink in class LinkExtractor
- Parameters:
attributes - The attribute set.
attr - The attribute that should be treated as a URL. For
example, attr should be HTML.Attribute.HREF if attributes is
from an anchor tag.
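"Completing a link against the base URL" means resolving a possibly relative href against the address of the page it came from. The sketch below shows that step in isolation using java.net.URI; whether addLink itself uses URI, java.net.URL or something else is not stated here, and the class LinkCompletionSketch and its complete method are hypothetical.

```java
import java.net.URI;

// Hypothetical illustration class; not part of ir.webutils.
class LinkCompletionSketch {

    // Resolves href against baseUrl. Absolute hrefs are returned unchanged.
    // URI.create throws IllegalArgumentException for malformed input, such as
    // an href containing unescaped spaces.
    static String complete(String baseUrl, String href) {
        return URI.create(baseUrl).resolve(href.trim()).toString();
    }

    public static void main(String[] args) {
        String base = "http://example.com/docs/index.html";
        System.out.println(complete(base, "page2.html"));         // http://example.com/docs/page2.html
        System.out.println(complete(base, "/top.html"));          // http://example.com/top.html
        System.out.println(complete(base, "http://other.org/x")); // http://other.org/x
    }
}
```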
main
public static void main(java.lang.String[] args)
throws java.lang.Exception