ir.webutils
Class SiteSpider

java.lang.Object
  |
  +--ir.webutils.Spider
        |
        +--ir.webutils.SiteSpider

public class SiteSpider
extends Spider

A spider that limits itself to a given site.


Fields inherited from class ir.webutils.Spider
count, linksToVisit, maxCount, saveDir, slow, visited, webpr
 
Constructor Summary
SiteSpider()
           
 
Method Summary
 java.util.List getNewLinks(HTMLPage page)
          Gets links from the given page that are on the same host as the page.
static void main(java.lang.String[] args)
          Spider the web according to the following command options, but stay within the given site (same URL host).
 
Methods inherited from class ir.webutils.Spider
doCrawl, go, handleCCommandLineOption, handleDCommandLineOption, handleSafeCommandLineOption, handleSlowCommandLineOption, handleUCommandLineOption, linkToHTMLPage, processArgs, processPage
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SiteSpider

public SiteSpider()

Method Detail

getNewLinks

public java.util.List getNewLinks(HTMLPage page)
Gets links from the given page that are on the same host as the page.
Overrides:
getNewLinks in class Spider
Returns:
A list of the links on page whose host matches the host of the page's own URL.
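
The same-host restriction described above can be illustrated with a minimal, standalone sketch. It is not the SiteSpider source: the class SameHostFilter and the method sameHostLinks are hypothetical names, and plain java.net.URI values stand in for HTMLPage and the spider's link objects, whose APIs are not shown on this page. The sketch assumes that "same host" means a case-insensitive match on the host component of the URL.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of ir.webutils: keeps only the links
// whose host matches the host of the page they were found on.
class SameHostFilter {

    static List<URI> sameHostLinks(URI pageUrl, List<URI> links) {
        List<URI> kept = new ArrayList<>();
        String pageHost = pageUrl.getHost();
        if (pageHost == null) {
            return kept; // page URL has no host component, so nothing can match
        }
        for (URI link : links) {
            String linkHost = link.getHost();
            // Assumption: host names are compared case-insensitively.
            if (linkHost != null && pageHost.equalsIgnoreCase(linkHost)) {
                kept.add(link);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        URI page = URI.create("http://www.example.com/index.html");
        List<URI> links = Arrays.asList(
                URI.create("http://www.example.com/a.html"),
                URI.create("http://other.example.org/b.html"),
                URI.create("http://WWW.EXAMPLE.COM/c.html"));
        // The link on other.example.org is dropped; the other two are kept.
        System.out.println(sameHostLinks(page, links));
    }
}
```

Filtering at link-extraction time, as getNewLinks is described as doing, means off-site URLs are never queued, so the rest of the crawl logic inherited from Spider needs no knowledge of the site restriction.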

main

public static void main(java.lang.String[] args)
Spider the web according to the following command options, but stay within the given site (same URL host).