ir.webutils
Class AnchoredSiteSpider

java.lang.Object
  |
  +--ir.webutils.Spider
        |
        +--ir.webutils.AnchoredSpider
              |
              +--ir.webutils.AnchoredSiteSpider

public class AnchoredSiteSpider
extends AnchoredSpider

An anchored spider that limits itself to a given site.


Fields inherited from class ir.webutils.AnchoredSpider
urlMap
 
Fields inherited from class ir.webutils.Spider
count, linksToVisit, maxCount, saveDir, slow, visited, webpr
 
Constructor Summary
AnchoredSiteSpider()
           
 
Method Summary
 java.util.List getNewLinks(HTMLPage page)
          Gets links from the given page that are on the same host as the page.
static void main(java.lang.String[] args)
          Spiders the web according to the command-line options, staying within the given site (same URL host) and including the anchor text of links to each page.
 
Methods inherited from class ir.webutils.AnchoredSpider
addAnchorText, doCrawl, handleUCommandLineOption, processPage
 
Methods inherited from class ir.webutils.Spider
go, handleCCommandLineOption, handleDCommandLineOption, handleSafeCommandLineOption, handleSlowCommandLineOption, linkToHTMLPage, processArgs
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

AnchoredSiteSpider

public AnchoredSiteSpider()

Method Detail

getNewLinks

public java.util.List getNewLinks(HTMLPage page)
Gets links from the given page that are on the same host as the page.
Overrides:
getNewLinks in class Spider
Returns:
A list of links on page that have the same host as the page's URL.
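
The same-host restriction described above can be illustrated with a minimal, standalone sketch. It is not the source of getNewLinks: the class SameHostFilter and the method sameHostLinks are hypothetical names, and the sketch assumes links are available as java.net.URI values rather than as the library's own HTMLPage and link types, whose APIs are not shown on this page.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class SameHostFilter {

    /**
     * Returns the links whose host matches the host of pageUri.
     * Hypothetical helper for illustration; not part of ir.webutils.
     */
    public static List<URI> sameHostLinks(URI pageUri, List<URI> links) {
        List<URI> kept = new ArrayList<>();
        String pageHost = pageUri.getHost();
        for (URI link : links) {
            String linkHost = link.getHost();
            // Host names are case-insensitive, so compare ignoring case.
            // Links without a host (e.g. mailto:) are dropped.
            if (pageHost != null && linkHost != null
                    && linkHost.equalsIgnoreCase(pageHost)) {
                kept.add(link);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        URI page = URI.create("http://www.example.com/index.html");
        List<URI> links = new ArrayList<>();
        links.add(URI.create("http://www.example.com/about.html"));
        links.add(URI.create("http://other.example.org/page.html"));
        links.add(URI.create("mailto:someone@example.com"));
        // Only the first link shares the page's host.
        System.out.println(sameHostLinks(page, links));
    }
}
```

Comparing only the host component means that links differing in path, port, or scheme still count as on-site in this sketch; whether AnchoredSiteSpider treats those cases the same way is not stated here.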

main

public static void main(java.lang.String[] args)
Spiders the web according to the command-line options, staying within the given site (same URL host) and including the anchor text of links to each page.