ir.webutils
Class RobotsMetaTagParser
java.lang.Object
javax.swing.text.html.HTMLEditorKit.ParserCallback
ir.webutils.RobotsMetaTagParser
public final class RobotsMetaTagParser
- extends javax.swing.text.html.HTMLEditorKit.ParserCallback
Parser callback that extracts robots META tag information.
Fields inherited from class javax.swing.text.html.HTMLEditorKit.ParserCallback
IMPLIED
Method Summary
void handleSimpleTag(javax.swing.text.html.HTML.Tag tag, javax.swing.text.MutableAttributeSet attributes, int position)
    Checks for robots META tags.
boolean index()
    Indicates whether the page can be indexed.
java.util.List<Link> parseMetaTags()
    Parses the document and returns a list of links that should not be followed.
void setPage(java.lang.String page)
void setUrl(java.net.URL url)
Methods inherited from class javax.swing.text.html.HTMLEditorKit.ParserCallback
flush, handleComment, handleEndOfLineString, handleEndTag, handleError, handleStartTag, handleText
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
RobotsMetaTagParser
public RobotsMetaTagParser()
RobotsMetaTagParser
public RobotsMetaTagParser(java.net.URL url)
RobotsMetaTagParser
public RobotsMetaTagParser(java.net.URL url,
java.lang.String page)
setPage
public void setPage(java.lang.String page)
setUrl
public void setUrl(java.net.URL url)
handleSimpleTag
public void handleSimpleTag(javax.swing.text.html.HTML.Tag tag,
javax.swing.text.MutableAttributeSet attributes,
int position)
- Checks for robots META tags. If a robots META tag is found,
then the content (if any) is extracted and stored. Note that
only the last robots META tag will be considered.
- Overrides:
handleSimpleTag
in class javax.swing.text.html.HTMLEditorKit.ParserCallback
- Parameters:
tag - Indicates the type of tag that caused this method to be called. Only META tags are handled; any other kind of tag causes this method to do nothing.
attributes - The attributes of this tag. If the tag defines the "name" attribute with value "robots" (case-insensitive), the "content" attribute is checked and stored if it exists.
position - The position of the tag in the document. Not used.
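The check described above can be sketched with the standard Swing HTML parser. This is a minimal illustration, not the source of RobotsMetaTagParser: the class RobotsMetaSketch, the nested RobotsCallback, and the helper robotsContent are hypothetical names introduced here. Only HTMLEditorKit.ParserCallback, ParserDelegator, HTML.Tag, and HTML.Attribute come from the JDK.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class RobotsMetaSketch {

    /** Hypothetical callback that mimics the documented behaviour. */
    public static class RobotsCallback extends HTMLEditorKit.ParserCallback {
        /** Content of the last robots META tag seen, or null if none. */
        public String content;

        @Override
        public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attributes, int position) {
            // Only META tags are of interest; everything else is ignored.
            if (tag != HTML.Tag.META) {
                return;
            }
            // The "name" attribute must be "robots", compared without regard to case.
            Object name = attributes.getAttribute(HTML.Attribute.NAME);
            if (name == null || !"robots".equalsIgnoreCase(name.toString())) {
                return;
            }
            // Store the "content" attribute if present; a later tag overwrites an earlier one.
            Object value = attributes.getAttribute(HTML.Attribute.CONTENT);
            if (value != null) {
                content = value.toString();
            }
        }
    }

    /** Parses the given HTML text and returns the stored robots content, or null. */
    public static String robotsContent(String html) {
        RobotsCallback callback = new RobotsCallback();
        try {
            // ignoreCharSet = true avoids charset-change exceptions when reading from a String.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return callback.content;
    }

    public static void main(String[] args) {
        String html = "<html><head>"
                + "<meta name=\"ROBOTS\" content=\"noindex, nofollow\">"
                + "</head><body></body></html>";
        System.out.println(robotsContent(html));
    }
}
```

Because the field is overwritten on every match, a page with several robots META tags ends up with the content of the last one, which matches the note above.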
parseMetaTags
public java.util.List<Link> parseMetaTags()
- Parses the document and returns a list of links that should not be followed. This method also sets a flag that indicates whether this page can be indexed. Clients can then call index() to check the value of this flag.
- Returns:
- A List of Links that should not be followed from this page.
index
public boolean index()
- Indicates whether the page can be indexed. Call this method only after parseMetaTags() has been called.
- Returns:
true if and only if the page can be indexed.
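This page does not say how the stored content is turned into the index flag. Under the common robots META convention, the content is a comma-separated list of directives such as INDEX, NOINDEX, FOLLOW, NOFOLLOW, ALL, and NONE, and a page with no robots tag may be both indexed and followed. The sketch below assumes that convention; the class RobotsDirectivesSketch and its members are hypothetical and are not part of ir.webutils.

```java
import java.util.Locale;

public class RobotsDirectivesSketch {

    /** Hypothetical holder for the two flags derived from a robots content value. */
    public static final class Directives {
        public final boolean index;
        public final boolean follow;

        public Directives(boolean index, boolean follow) {
            this.index = index;
            this.follow = follow;
        }
    }

    /** Interprets a robots content value; null means no robots META tag was found. */
    public static Directives parse(String content) {
        // Assumed defaults: indexing and following are both allowed.
        boolean index = true;
        boolean follow = true;
        if (content != null) {
            for (String token : content.split(",")) {
                switch (token.trim().toLowerCase(Locale.ROOT)) {
                    case "noindex":  index = false; break;
                    case "nofollow": follow = false; break;
                    case "none":     index = false; follow = false; break;
                    case "index":    index = true; break;
                    case "follow":   follow = true; break;
                    case "all":      index = true; follow = true; break;
                    default:         break; // unknown directives are ignored
                }
            }
        }
        return new Directives(index, follow);
    }

    public static void main(String[] args) {
        Directives d = parse("NOINDEX, follow");
        System.out.println("index=" + d.index + ", follow=" + d.follow);
        // prints "index=false, follow=true"
    }
}
```

In this sketch the index field plays the role of the value returned by index(), and a false follow flag corresponds to the case where the page's links would be reported as not to be followed.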