java.lang.Object
  ir.webutils.HTMLPageRetriever
    ir.webutils.SafeHTMLPageRetriever
public final class SafeHTMLPageRetriever extends HTMLPageRetriever
Keeps track of Robots Exclusion information. Clients can use this class to ensure that they do not access pages prohibited by either the Robots Exclusion Protocol (robots.txt files) or Robots META tags.
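To illustrate the kind of check the Robots Exclusion Protocol implies, here is a minimal sketch of prefix matching against Disallow rules. It is not the code of this class: RobotRulesSketch and its methods are hypothetical names, and the sketch assumes only the basic rule that a path is prohibited when it starts with a recorded Disallow value. User-agent sections, Allow lines, wildcards, and Robots META tags are left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of ir.webutils: Disallow path prefixes for one host. */
class RobotRulesSketch {
    private final List<String> disallowed = new ArrayList<>();

    /** Records the value of one "Disallow:" line; an empty value prohibits nothing. */
    void addDisallow(String pathPrefix) {
        if (!pathPrefix.isEmpty()) {
            disallowed.add(pathPrefix);
        }
    }

    /** A path is allowed unless it starts with some recorded Disallow prefix. */
    boolean isAllowed(String path) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        RobotRulesSketch rules = new RobotRulesSketch();
        rules.addDisallow("/private/");
        System.out.println(rules.isAllowed("/index.html"));     // prints "true"
        System.out.println(rules.isAllowed("/private/a.html")); // prints "false"
    }
}
```

A retriever that keeps such rules per host could consult them before each download, which is presumably what lets clients avoid prohibited pages without parsing robots.txt themselves.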
Constructor Summary | |
---|---|
SafeHTMLPageRetriever() | |
Method Summary | |
---|---|
HTMLPage | getHTMLPage(Link link) - Tries to download the given web page. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public SafeHTMLPageRetriever()
Method Detail |
---|
public HTMLPage getHTMLPage(Link link) throws PathDisallowedException
Tries to download the given web page. Throws PathDisallowedException if access to the page is prohibited. Also updates Robots Exclusion information based on the new page.
Overrides: getHTMLPage in class HTMLPageRetriever
Parameters: link - The Link to follow and download.
Throws: PathDisallowedException - If the URL of link is disallowed by a robots.txt file or Robots META tag.
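Because getHTMLPage declares a checked PathDisallowedException, callers have to decide what to do with a prohibited link. The sketch below shows one possible pattern: skip the link and carry on. To keep it self-contained, every type in it is a stand-in that only mirrors the signature documented above. The Link(String) constructor, the getUrl accessor, SafeRetrieverStub, and the "/private/" rule are all assumptions made for illustration, not the ir.webutils implementation.

```java
/** Sketch of a crawl step that skips links the retriever refuses to fetch. */
class CrawlStepSketch {
    /** Returns the downloaded page, or null when the link is disallowed. */
    static HTMLPage fetchOrSkip(SafeRetrieverStub retriever, Link link) {
        try {
            return retriever.getHTMLPage(link);
        } catch (PathDisallowedException e) {
            // Prohibited by the (stand-in) exclusion rules: skip this link.
            return null;
        }
    }

    public static void main(String[] args) {
        SafeRetrieverStub retriever = new SafeRetrieverStub();
        Link ok = new Link("http://example.com/index.html");
        Link blocked = new Link("http://example.com/private/x.html");
        System.out.println(fetchOrSkip(retriever, ok) != null);      // prints "true"
        System.out.println(fetchOrSkip(retriever, blocked) != null); // prints "false"
    }
}

// Stand-in types below: NOT the real ir.webutils classes.
class PathDisallowedException extends Exception {
    PathDisallowedException(String message) { super(message); }
}

class Link {
    private final String url;
    Link(String url) { this.url = url; }  // assumed constructor
    String getUrl() { return url; }       // hypothetical accessor
}

class HTMLPage {
    private final Link link;
    HTMLPage(Link link) { this.link = link; }
    Link getLink() { return link; }
}

class SafeRetrieverStub {
    /** Same shape as the documented method; the "/private/" rule is made up. */
    HTMLPage getHTMLPage(Link link) throws PathDisallowedException {
        if (link.getUrl().contains("/private/")) {
            throw new PathDisallowedException(link.getUrl());
        }
        return new HTMLPage(link);
    }
}
```

Returning null keeps the sketch short. A real crawler might instead log the skipped URL or count it, depending on how it reports coverage.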