Class Summary

| Class | Description |
|---|---|
| AnchoredDirectorySpider | Anchored spider that limits itself to the directory it started in. |
| AnchoredLink | Link that includes its anchor text. |
| AnchoredLinkExtractor | Extractor for AnchoredLink objects. |
| AnchoredSiteSpider | An anchored spider that limits itself to a given site. |
| AnchoredSpider | |
| DirectorySpider | Spider that limits itself to the directory it started in. |
| HTMLPage | HTMLPage is a representation of information about a web page. |
| HTMLPageRetriever | HTMLPageRetriever allows clients to download web pages from URLs. |
| HTMLParserMaker | HTMLParserMaker allows clients to retrieve an HTMLEditorKit.Parser instance. |
| Link | Link is a class that contains a URL. |
| LinkExtractor | LinkExtractor defines a callback that extracts the links from an HTML document and provides functionality to parse a document. |
| RobotExclusionSet | RobotExclusionSet provides support for the Robots Exclusion Protocol. |
| RobotsMetaTagParser | Parser callback that extracts robots META tag information. |
| SafeHTMLPage | SafeHTMLPage is an immutable representation of information about a web page, including whether the page may be indexed. |
| SafeHTMLPageRetriever | Keeps track of Robots Exclusion information. |
| SiteSpider | A spider that limits itself to a given site. |
| Spider | Spider defines a framework for writing a web crawler. |
| URLChecker | URLChecker tries to clean up some URLs that do not conform to the standard and cause confusion. |
| WebPage | WebPage is a static utility class that provides operations for downloading web pages. |
| WebPageViewer | WebPageViewer contains utilities to download and display HTML pages. |
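DirectorySpider and SiteSpider (and their anchored variants) are described only by the scope they keep to. The following is a minimal sketch of those two scope tests using `java.net.URI`. It assumes that "same site" means same scheme, host and effective port, and that "same directory" means the normalized candidate path begins with the directory part of the starting URL, so subdirectories count as inside. `CrawlScope` and its methods are hypothetical names, not the package's API.

```java
import java.net.URI;

// Hypothetical helpers (not the package's API) illustrating the two scope rules.
final class CrawlScope {
    private CrawlScope() {}

    // Assumed rule: same scheme, same host (ignoring case), same effective port.
    static boolean sameSite(URI start, URI candidate) {
        return start.getScheme() != null
                && start.getScheme().equalsIgnoreCase(candidate.getScheme())
                && start.getHost() != null
                && start.getHost().equalsIgnoreCase(candidate.getHost())
                && effectivePort(start) == effectivePort(candidate);
    }

    // Assumed rule: same site, and the candidate path lies under the start page's
    // directory (subdirectories included). Normalizing first stops "../" escapes.
    static boolean sameDirectory(URI start, URI candidate) {
        if (!sameSite(start, candidate)) {
            return false;
        }
        String path = candidate.normalize().getPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        return path.startsWith(directoryOf(start.normalize().getPath()));
    }

    // "/a/b/page.html" -> "/a/b/", "/a/b/" -> "/a/b/", "" -> "/"
    static String directoryOf(String path) {
        if (path == null || path.isEmpty()) {
            return "/";
        }
        return path.substring(0, path.lastIndexOf('/') + 1);
    }

    // Fills in the default port when the URI does not name one.
    private static int effectivePort(URI u) {
        if (u.getPort() != -1) return u.getPort();
        if ("http".equalsIgnoreCase(u.getScheme())) return 80;
        if ("https".equalsIgnoreCase(u.getScheme())) return 443;
        return -1;
    }
}
```

Whether the real DirectorySpider admits subdirectories is not stated in the summary; tightening the rule to "exactly this directory" would only require comparing `directoryOf` of both paths for equality.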
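HTMLPageRetriever and WebPage both download pages from URLs. The core step can be sketched with `java.net.URL.openStream()`. `PageFetchSketch` is a hypothetical name, and the sketch assumes UTF-8 content instead of reading the charset from the Content-Type header.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: downloads a URL's body into a String.
// Assumption: the body is UTF-8; a fuller retriever would use the declared charset.
final class PageFetchSketch {
    private PageFetchSketch() {}

    static String download(URL url) {
        try (InputStream in = url.openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            // Rethrow unchecked so callers in a crawl loop can decide how to handle failures.
            throw new UncheckedIOException(e);
        }
    }
}
```

`URL.openStream()` sets no timeouts, so a crawler would normally open a `URLConnection` and call `setConnectTimeout` and `setReadTimeout` before reading.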
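HTMLParserMaker is described as handing out an `HTMLEditorKit.Parser`. One well-known way to obtain such a parser from the JDK is to subclass `HTMLEditorKit` and widen its protected `getParser()` method to public. Whether HTMLParserMaker works this way is an assumption, and `ParserMakerSketch` is a hypothetical name.

```java
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical class: HTMLEditorKit.getParser() is protected, so a subclass
// overrides it with public access to hand the parser to other code.
final class ParserMakerSketch extends HTMLEditorKit {
    @Override
    public HTMLEditorKit.Parser getParser() {
        return super.getParser();
    }

    static HTMLEditorKit.Parser newParser() {
        return new ParserMakerSketch().getParser();
    }
}
```

In the JDK the default parser returned this way is a `javax.swing.text.html.parser.ParserDelegator`, which is public and can also be constructed directly.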
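LinkExtractor is described as a parser callback that pulls links out of an HTML document, and AnchoredLinkExtractor as the variant whose links keep their anchor text. The following is a minimal sketch of that pattern using the JDK's `HTMLEditorKit.ParserCallback` and `ParserDelegator`. `AnchoredLinkSketch` and `AnchoredLinkCollector` are hypothetical stand-ins for AnchoredLink and AnchoredLinkExtractor. Resolving relative `href` values against the page URL is an assumed design choice.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical stand-in for AnchoredLink: a resolved URL plus its anchor text.
final class AnchoredLinkSketch {
    final URI url;
    final String anchorText;

    AnchoredLinkSketch(URI url, String anchorText) {
        this.url = url;
        this.anchorText = anchorText;
    }
}

// Hypothetical callback: records each <a href="..."> and the text inside it.
final class AnchoredLinkCollector extends HTMLEditorKit.ParserCallback {
    private final URI base;
    private final List<AnchoredLinkSketch> links = new ArrayList<>();
    private URI currentHref;                        // non-null while inside <a href=...>
    private final StringBuilder text = new StringBuilder();

    AnchoredLinkCollector(URI base) {
        this.base = base;
    }

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        if (tag != HTML.Tag.A) return;
        Object href = attrs.getAttribute(HTML.Attribute.HREF);
        if (href == null) return;
        try {
            currentHref = base.resolve(href.toString().trim());  // make relative links absolute
            text.setLength(0);
        } catch (IllegalArgumentException e) {
            currentHref = null;                                   // skip malformed hrefs
        }
    }

    @Override
    public void handleText(char[] data, int pos) {
        if (currentHref != null) text.append(data).append(' ');
    }

    @Override
    public void handleEndTag(HTML.Tag tag, int pos) {
        if (tag == HTML.Tag.A && currentHref != null) {
            String anchor = text.toString().trim().replaceAll("\\s+", " ");
            links.add(new AnchoredLinkSketch(currentHref, anchor));
            currentHref = null;
        }
    }

    static List<AnchoredLinkSketch> extract(String html, URI base) {
        AnchoredLinkCollector collector = new AnchoredLinkCollector(base);
        try {
            // true: ignore any charset declared in a META tag
            new ParserDelegator().parse(new StringReader(html), collector, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return collector.links;
    }
}
```

A plain LinkExtractor-style collector is the same callback without the text buffer.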
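RobotExclusionSet supports the Robots Exclusion Protocol. The core rule of that protocol is to find the robots.txt record that applies to the crawler's name, falling back to `User-agent: *`, and to refuse any path that starts with one of that record's `Disallow` values. The sketch implements only that rule. `RobotsTxtSketch`, its constructor arguments, and the case-insensitive substring match on the robot name are assumptions. Later extensions such as `Allow` lines and wildcards are ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical class: minimal robots.txt reader. A record is one or more
// User-agent lines followed by rule lines; Disallow values are path prefixes.
final class RobotsTxtSketch {
    private final List<String> disallowed = new ArrayList<>();

    RobotsTxtSketch(String robotsTxt, String robotName) {
        String agent = robotName.toLowerCase(Locale.ROOT);
        List<String> starRules = new ArrayList<>();   // rules from "User-agent: *"
        List<String> ownRules = new ArrayList<>();    // rules from records naming this robot
        boolean sawOwnRecord = false;
        boolean lastWasAgent = false;
        boolean appliesStar = false;
        boolean appliesOwn = false;
        for (String rawLine : robotsTxt.split("\r?\n")) {
            int hash = rawLine.indexOf('#');
            String line = hash >= 0 ? rawLine.substring(0, hash) : rawLine;  // drop comments
            int colon = line.indexOf(':');
            if (colon < 0) continue;                                         // blank or malformed
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                if (!lastWasAgent) {            // first User-agent line of a new record
                    appliesStar = false;
                    appliesOwn = false;
                }
                String name = value.toLowerCase(Locale.ROOT);
                if (name.equals("*")) {
                    appliesStar = true;
                } else if (!name.isEmpty() && agent.contains(name)) {
                    appliesOwn = true;
                    sawOwnRecord = true;
                }
                lastWasAgent = true;
            } else {
                lastWasAgent = false;
                // An empty Disallow value means nothing is disallowed.
                if (field.equals("disallow") && !value.isEmpty()) {
                    if (appliesOwn) ownRules.add(value);
                    else if (appliesStar) starRules.add(value);
                }
            }
        }
        // A record naming this robot takes precedence over the "*" record.
        disallowed.addAll(sawOwnRecord ? ownRules : starRules);
    }

    boolean isAllowed(String path) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) return false;
        }
        return true;
    }
}
```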
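RobotsMetaTagParser is described as a parser callback for the robots META tag, for example `<meta name="robots" content="noindex, nofollow">`. The following is a hedged sketch using `HTMLEditorKit.ParserCallback`. The Swing parser reports empty elements such as META through `handleSimpleTag`, and the sketch maps the directives `noindex`, `nofollow` and `none` onto two flags. `RobotsMetaSketch` and its `index`/`follow` fields are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical callback: records whether a page may be indexed and whether
// its links may be followed, based on <meta name="robots" content="...">.
final class RobotsMetaSketch extends HTMLEditorKit.ParserCallback {
    boolean index = true;
    boolean follow = true;

    @Override
    public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        if (tag != HTML.Tag.META) return;
        Object name = attrs.getAttribute(HTML.Attribute.NAME);
        Object content = attrs.getAttribute(HTML.Attribute.CONTENT);
        if (name == null || content == null || !name.toString().equalsIgnoreCase("robots")) return;
        applyDirectives(content.toString());
    }

    // Directives are comma-separated and case-insensitive, e.g. "noindex, nofollow".
    void applyDirectives(String content) {
        for (String token : content.split(",")) {
            switch (token.trim().toLowerCase(Locale.ROOT)) {
                case "noindex": index = false; break;
                case "nofollow": follow = false; break;
                case "none": index = false; follow = false; break;
                default: break;   // "index", "follow", "all" keep the defaults
            }
        }
    }

    static RobotsMetaSketch scan(String html) {
        RobotsMetaSketch cb = new RobotsMetaSketch();
        try {
            // true: ignore charset declarations so META tags reach the callback
            new ParserDelegator().parse(new StringReader(html), cb, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return cb;
    }
}
```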
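Spider is described as a framework for writing a web crawler. One plausible shape for such a framework is a template method. The base class owns the breadth-first loop, the queue and the visited set. Subclasses decide how to fetch a page's links and which links stay in scope, which is where a SiteSpider- or DirectorySpider-style restriction would plug in. `SpiderSketch`, `MapSpider` and their methods are hypothetical. `MapSpider` reads links from an in-memory map so the loop can run without a network.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical framework class: the crawl loop is fixed, the hooks are not.
abstract class SpiderSketch {
    // Returns the outgoing links of a page (a real spider would download and parse it).
    protected abstract List<String> fetchLinks(String url);

    // Scope hook: site- or directory-limited spiders would override this.
    protected boolean shouldVisit(String url) {
        return true;
    }

    // Visits at most maxPages pages breadth-first and returns them in visit order.
    public final List<String> crawl(String start, int maxPages) {
        List<String> visited = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);
        seen.add(start);
        while (!queue.isEmpty() && visited.size() < maxPages) {
            String url = queue.poll();
            visited.add(url);
            for (String link : fetchLinks(url)) {
                // seen.add returns false for links already queued or visited
                if (shouldVisit(link) && seen.add(link)) {
                    queue.add(link);
                }
            }
        }
        return visited;
    }
}

// Hypothetical subclass: a fixed link graph and a crude "stay on this site" prefix rule.
final class MapSpider extends SpiderSketch {
    private final Map<String, List<String>> graph;
    private final String prefix;

    MapSpider(Map<String, List<String>> graph, String prefix) {
        this.graph = graph;
        this.prefix = prefix;
    }

    @Override
    protected List<String> fetchLinks(String url) {
        return graph.getOrDefault(url, Collections.emptyList());
    }

    @Override
    protected boolean shouldVisit(String url) {
        return url.startsWith(prefix);
    }
}
```

Making `crawl` final keeps the traversal order and bookkeeping in one place, while each restricted spider changes only the hooks.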
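URLChecker is said to clean up nonconforming URLs, but the summary does not say which problems it fixes. The sketch applies a few common clean-ups, each one an assumption:

* trimming whitespace
* turning backslashes into slashes
* escaping spaces
* resolving `.` and `..` segments with `URI.normalize()`
* lowercasing scheme and host
* dropping default ports and fragments

It returns null for anything that is still not an absolute URL. `UrlCleanupSketch` is a hypothetical name.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical helper; every clean-up rule below is an illustrative assumption.
final class UrlCleanupSketch {
    private UrlCleanupSketch() {}

    static String clean(String raw) {
        String s = raw.trim()
                .replace('\\', '/')       // Windows-style separators
                .replace(" ", "%20");     // unescaped spaces
        URI u;
        try {
            u = new URI(s).normalize();    // resolves "." and ".." segments
        } catch (URISyntaxException e) {
            return null;                   // still not a legal URI: reject it
        }
        if (u.getScheme() == null || u.getHost() == null) return null;
        String scheme = u.getScheme().toLowerCase(Locale.ROOT);
        String host = u.getHost().toLowerCase(Locale.ROOT);
        int port = u.getPort();
        if ((scheme.equals("http") && port == 80) || (scheme.equals("https") && port == 443)) {
            port = -1;                     // drop the default port
        }
        String path = (u.getRawPath() == null || u.getRawPath().isEmpty()) ? "/" : u.getRawPath();
        StringBuilder out = new StringBuilder(scheme).append("://").append(host);
        if (port != -1) out.append(':').append(port);
        out.append(path);
        if (u.getRawQuery() != null) out.append('?').append(u.getRawQuery());
        // The fragment (#...) is dropped: it names a position within the same page.
        return out.toString();
    }
}
```

Canonicalizing this way lets a crawler's visited set treat `http://Example.com:80/a/./b.html` and `http://example.com/a/b.html` as the same page.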