Package ir.webutils

Class Summary
AnchoredDirectorySpider Anchored spider that limits itself to the directory it started in.
AnchoredLink Link with included anchor text.
AnchoredLinkExtractor Extractor for AnchoredLinks.
AnchoredSiteSpider An anchored spider that limits itself to a given site.
AnchoredSpider  
DirectorySpider Spider that limits itself to the directory it started in.
HTMLPage HTMLPage is a representation of information about a web page.
HTMLPageRetriever HTMLPageRetriever allows clients to download web pages from URLs.
HTMLParserMaker HTMLParserMaker allows clients to retrieve an HTMLEditorKit.Parser instance.
Link Link is a class that contains a URL.
LinkExtractor LinkExtractor defines a callback that extracts the links from an HTML document and provides functionality to parse a document.
RobotExclusionSet RobotExclusionSet provides support for the Robots Exclusion Protocol.
RobotsMetaTagParser Parser callback that extracts robots META tag information.
SafeHTMLPage SafeHTMLPage is an immutable representation of information about a web page that includes information about whether or not this page can be indexed.
SafeHTMLPageRetriever Keeps track of Robot Exclusion information.
SiteSpider A spider that limits itself to a given site.
Spider Spider defines a framework for writing a web crawler.
URLChecker URLChecker tries to clean up some URLs that do not conform to the standard and cause confusion.
WebPage WebPage is a static utility class that provides operations for downloading web pages.
WebPageViewer WebPageViewer contains utilities to download and display HTML pages.
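
The LinkExtractor entry above describes a parser callback that pulls links out of an HTML document. As a rough illustration of that idea only, the following sketch uses the JDK's own HTMLEditorKit.ParserCallback and ParserDelegator. The class name LinkCallbackSketch and its extract method are hypothetical and are not part of ir.webutils; the package's actual classes may be structured differently.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Hypothetical sketch: collects the href value of every A tag it sees. */
public class LinkCallbackSketch extends HTMLEditorKit.ParserCallback {
    private final List<String> hrefs = new ArrayList<>();

    /** Called by the parser for each opening tag; only anchors are kept. */
    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        if (tag == HTML.Tag.A) {
            Object href = attrs.getAttribute(HTML.Attribute.HREF);
            if (href != null) {
                hrefs.add(href.toString());
            }
        }
    }

    /** Parses the given HTML text and returns the hrefs in document order. */
    public static List<String> extract(String html) throws IOException {
        LinkCallbackSketch callback = new LinkCallbackSketch();
        try (Reader reader = new StringReader(html)) {
            // The third argument tells the parser to ignore charset directives.
            new ParserDelegator().parse(reader, callback, true);
        }
        return callback.hrefs;
    }
}
```

The returned strings are the raw attribute values; resolving relative links against the page URL is left out of this sketch.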
 

Exception Summary
PathDisallowedException Thrown to indicate that a client program tried to access a path that was disallowed by either a robots.txt file or a robots META tag.
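
To illustrate the kind of check that leads to such an exception, here is a minimal sketch of prefix matching against the Disallow rules of a robots.txt file. Everything in it is an assumption made for illustration: RobotsSketch and DisallowedException are hypothetical names, not the package's RobotExclusionSet or PathDisallowedException, and only rules for the wildcard user agent are honored.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a Robots Exclusion Protocol check (wildcard agent only). */
public class RobotsSketch {

    /** Hypothetical stand-in for an exception signalling a disallowed path. */
    public static class DisallowedException extends Exception {
        public DisallowedException(String path) {
            super("Disallowed path: " + path);
        }
    }

    private final List<String> disallowed = new ArrayList<>();

    /** Collects the Disallow prefixes from groups that name "User-agent: *". */
    public RobotsSketch(String robotsTxt) {
        boolean applies = false;     // current group applies to every agent
        boolean inAgentRun = false;  // previous field was also a User-agent line
        for (String raw : robotsTxt.split("\\r?\\n")) {
            int hash = raw.indexOf('#');
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase();
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                // Consecutive User-agent lines belong to the same group.
                applies = (inAgentRun && applies) || value.equals("*");
                inAgentRun = true;
            } else {
                inAgentRun = false;
                if (field.equals("disallow") && applies && !value.isEmpty()) {
                    disallowed.add(value);
                }
            }
        }
    }

    /** Returns true when no collected Disallow prefix matches the path. */
    public boolean isAllowed(String path) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }

    /** Throws when the path is disallowed, mirroring the exception-based style. */
    public void check(String path) throws DisallowedException {
        if (!isAllowed(path)) {
            throw new DisallowedException(path);
        }
    }
}
```

A crawler built this way would call check before each fetch and skip the URL when the exception is thrown. Robots META tags, Allow lines, and agent-specific groups are not handled in this sketch.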