public final class SafeHTMLPageRetriever extends HTMLPageRetriever
Constructor and Description |
---|
SafeHTMLPageRetriever() |
Modifier and Type | Method and Description |
---|---|
HTMLPage | getHTMLPage(Link link) Tries to download the given web page. |
public HTMLPage getHTMLPage(Link link) throws PathDisallowedException
Throws PathDisallowedException if access to the page is prohibited. Also updates Robots Exclusion information based on the new page.
Overrides: getHTMLPage in class HTMLPageRetriever
Parameters:
link - The Link to follow and download.
Throws:
PathDisallowedException - If the URL is disallowed by a robots.txt file or Robots META tag.
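
The sketch below illustrates one way a caller might handle the exception declared by getHTMLPage. It is not the library's implementation. Only the class names and the getHTMLPage signature come from the documentation above. Everything else is an assumption made so the example compiles on its own: the stand-in bodies of Link, HTMLPage, HTMLPageRetriever and SafeHTMLPageRetriever, the getUrl and getText accessors, the fixed disallowed prefix, the example.com URLs, the RetrieverSketch helper, and the choice to treat PathDisallowedException as a checked exception that the base method also declares.

```java
import java.util.Optional;
import java.util.Set;

// Stand-in types so the sketch compiles without the real library.
// Their bodies are hypothetical; the real classes will differ.
class Link {
    private final String url;
    Link(String url) { this.url = url; }
    String getUrl() { return url; } // hypothetical accessor
}

class HTMLPage {
    private final Link link;
    private final String text;
    HTMLPage(Link link, String text) { this.link = link; this.text = text; }
    Link getLink() { return link; }
    String getText() { return text; } // hypothetical accessor
}

// Assumption: a checked exception.
class PathDisallowedException extends Exception {
    PathDisallowedException(String message) { super(message); }
}

class HTMLPageRetriever {
    // Assumption: the base method declares the same checked exception.
    public HTMLPage getHTMLPage(Link link) throws PathDisallowedException {
        // Placeholder page; no network access in this sketch.
        return new HTMLPage(link, "<html></html>");
    }
}

final class SafeHTMLPageRetriever extends HTMLPageRetriever {
    // Hypothetical fixed rule standing in for parsed robots.txt / META data.
    private final Set<String> disallowedPrefixes =
            Set.of("http://example.com/private/");

    @Override
    public HTMLPage getHTMLPage(Link link) throws PathDisallowedException {
        for (String prefix : disallowedPrefixes) {
            if (link.getUrl().startsWith(prefix)) {
                throw new PathDisallowedException("Disallowed: " + link.getUrl());
            }
        }
        return super.getHTMLPage(link);
    }
}

class RetrieverSketch {
    // Returns the page, or an empty Optional when the retriever
    // reports that the path is disallowed.
    static Optional<HTMLPage> fetchIfAllowed(HTMLPageRetriever retriever, Link link) {
        try {
            return Optional.of(retriever.getHTMLPage(link));
        } catch (PathDisallowedException e) {
            return Optional.empty();
        }
    }
}
```

A crawler loop could call RetrieverSketch.fetchIfAllowed for each queued link and skip the empty results. Returning an Optional keeps the robots check out of the caller's control flow. Callers that need the reason for the refusal would catch PathDisallowedException directly instead.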