java.lang.Object
  |
  +--ir.webutils.Spider
        |
        +--ir.webutils.AnchoredSpider
              |
              +--ir.webutils.AnchoredSiteSpider
An anchored spider that limits itself to a given site.
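The site restriction described above amounts to comparing URL hosts. The following is a minimal sketch of that idea, not the class's actual implementation: SameHostFilter and sameHostLinks are hypothetical names, and links are represented as plain strings rather than the ir.webutils types.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of ir.webutils.
class SameHostFilter {

    // Returns the links whose host matches the host of pageUrl (case-insensitive).
    static List<String> sameHostLinks(String pageUrl, List<String> links) {
        String pageHost = URI.create(pageUrl).getHost();
        List<String> kept = new ArrayList<>();
        for (String link : links) {
            String host;
            try {
                host = URI.create(link).getHost();
            } catch (IllegalArgumentException e) {
                continue; // skip links that are not valid URIs
            }
            // Links without a host (e.g. mailto:) yield null and are dropped.
            if (pageHost != null && pageHost.equalsIgnoreCase(host)) {
                kept.add(link);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> links = Arrays.asList(
                "http://www.example.com/a.html",
                "http://other.example.org/b.html",
                "mailto:someone@example.com");
        System.out.println(sameHostLinks("http://www.example.com/index.html", links));
    }
}
```

Comparing hosts rather than URL prefixes keeps every page on the same server in scope, whatever its directory.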
Fields inherited from class ir.webutils.AnchoredSpider |
urlMap |
Fields inherited from class ir.webutils.Spider |
count, linksToVisit, maxCount, saveDir, slow, visited, webpr |
Constructor Summary | |
AnchoredSiteSpider()
|
Method Summary | |
java.util.List |
getNewLinks(HTMLPage page)
Gets links from the given page that are on the same host as the page. |
static void |
main(java.lang.String[] args)
Spiders the web according to the command-line options, staying within the given site (same URL host) and including the anchor text of links to each page. |
Methods inherited from class ir.webutils.AnchoredSpider |
addAnchorText, doCrawl, handleUCommandLineOption, processPage |
Methods inherited from class ir.webutils.Spider |
go, handleCCommandLineOption, handleDCommandLineOption, handleSafeCommandLineOption, handleSlowCommandLineOption, linkToHTMLPage, processArgs |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
public AnchoredSiteSpider()
Method Detail |
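public java.util.List getNewLinks(HTMLPage page)
Overrides: getNewLinks in class Spider
Parameters: page - the page from which links are taken
Returns: the links from page that have the same host as the page's URL

public static void main(java.lang.String[] args)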
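The summary for main mentions including the anchor text of links to a page. One way to picture this is a map from each target URL to the text of the links that point at it. The sketch below is an assumption about the general technique, not the AnchoredSpider implementation: AnchorTextIndex, recordAnchorText and anchorTextFor are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical structure for illustration only; not part of ir.webutils.
class AnchorTextIndex {

    // Maps a target URL to the concatenated anchor text of links seen pointing at it.
    private final Map<String, StringBuilder> anchorText = new LinkedHashMap<>();

    // Appends the trimmed anchor text for targetUrl; blank text is ignored.
    void recordAnchorText(String targetUrl, String text) {
        if (text == null || text.trim().isEmpty()) {
            return;
        }
        StringBuilder sb = anchorText.computeIfAbsent(targetUrl, k -> new StringBuilder());
        if (sb.length() > 0) {
            sb.append(' ');
        }
        sb.append(text.trim());
    }

    // Returns the accumulated anchor text, or an empty string if none was recorded.
    String anchorTextFor(String targetUrl) {
        StringBuilder sb = anchorText.get(targetUrl);
        return sb == null ? "" : sb.toString();
    }

    public static void main(String[] args) {
        AnchorTextIndex index = new AnchorTextIndex();
        index.recordAnchorText("http://www.example.com/a.html", "Course notes");
        index.recordAnchorText("http://www.example.com/a.html", "lecture slides");
        System.out.println(index.anchorTextFor("http://www.example.com/a.html"));
    }
}
```

When the target page is later processed, the accumulated text could be stored alongside the page's own content so that it is indexed together with it.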