java.lang.Object
  ir.webutils.Spider
    ir.webutils.BeamSearchSpider
      ir.webutils.BeamSearchSiteSpider
public class BeamSearchSiteSpider extends BeamSearchSpider
A BeamSearchSpider that limits itself to a given site (web host).
Field Summary |
---|
Fields inherited from class ir.webutils.BeamSearchSpider |
---|
beamSize, goal, goalPage, heuristic |
Fields inherited from class ir.webutils.Spider |
---|
count, linksToVisit, maxCount, retriever, saveDir, slow, visited |
Constructor Summary | |
---|---|
BeamSearchSiteSpider()
|
Method Summary | |
---|---|
java.util.List<Link> | getNewLinks(HTMLPage page): Gets links from the given page that are on the same host as the page. |
static void | main(java.lang.String[] args): Searches the web using beam search according to the given command-line options, staying within the initial host site. |
Methods inherited from class ir.webutils.BeamSearchSpider |
---|
constructLinkHeuristic, doCrawl, go, handleBCommandLineOption, handleHCommandLineOption, handleUCommandLineOption, handleWCommandLineOption, processArgs, scoreLinks |
Methods inherited from class ir.webutils.Spider |
---|
handleCCommandLineOption, handleDCommandLineOption, handleSafeCommandLineOption, handleSlowCommandLineOption, indexPage, linkToHTMLPage |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public BeamSearchSiteSpider()
Method Detail |
---|
getNewLinks
public java.util.List<Link> getNewLinks(HTMLPage page)
Overrides: getNewLinks in class BeamSearchSpider
Parameters: page - The current page.
Returns: The links on page that have the same host as url.

main
public static void main(java.lang.String[] args)
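
The same-host restriction described for getNewLinks can be illustrated with a small standalone sketch. This is not the ir.webutils implementation: the class name SameHostFilter, the method sameHostLinks, and the use of java.net.URI in place of the library's Link and HTMLPage types are all assumptions made for illustration, and the example URLs are placeholders.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ir.webutils.
public class SameHostFilter {

    /** Returns the candidate links whose host equals the host of pageUrl, ignoring case. */
    public static List<URI> sameHostLinks(URI pageUrl, List<URI> candidates) {
        List<URI> kept = new ArrayList<>();
        String host = pageUrl.getHost();
        if (host == null) {
            return kept; // the page URL has no host, so nothing can match
        }
        for (URI link : candidates) {
            String linkHost = link.getHost();
            // Host names are case-insensitive, so compare without regard to case.
            if (linkHost != null && host.equalsIgnoreCase(linkHost)) {
                kept.add(link);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        URI page = URI.create("http://www.example.edu/index.html");
        List<URI> links = List.of(
            URI.create("http://www.example.edu/courses/"),
            URI.create("http://other.example.org/"),
            URI.create("http://WWW.EXAMPLE.EDU/people.html"));
        // Only the two links hosted on www.example.edu are kept.
        System.out.println(sameHostLinks(page, links));
    }
}
```

Comparing full host names keeps the crawl on exactly one web host. Treating subdomains of the same domain as on-site would need a looser comparison than the one sketched here.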