ir.webutils
Class HTMLPage

java.lang.Object
  |
  +--ir.webutils.HTMLPage
Direct Known Subclasses:
SafeHTMLPage

public class HTMLPage
extends java.lang.Object

HTMLPage represents a web page: the link used to access it, its full text, a copy of that text with absolute links, and its out links.


Field Summary
protected  java.lang.String absoluteText
          Copy of the text with relative links replaced by absolute ones
protected  Link link
          The original link to this page
protected  java.util.List outLinks
          The links on this page
protected  java.lang.String text
          The text of the page
 
Constructor Summary
HTMLPage(Link link, java.lang.String text)
          Constructs an HTMLPage with the given link and text.
 
Method Summary
 boolean empty()
          Returns true if the page is empty or a 404 error.
 java.lang.String getAbsoluteText()
          Gets the copy of this page's text in which relative links are replaced by absolute ones.
 Link getLink()
          Returns the Link object that was used to access this page.
 java.util.List getOutLinks()
          Get the list of out links from this page.
 java.lang.String getText()
          Returns the full text of this page.
 boolean indexAllowed()
          Clients should always call this method before indexing an HTML page if they want to obey the "NOINDEX" directive in the Robots META tag.
 void setAbsoluteText(java.lang.String text)
          Sets the copy of this page's text in which relative links are replaced by absolute ones.
 void setOutLinks(java.util.List links)
          Sets the outLinks for this page to the given list.
 void writeAbsolute(java.io.File dir, java.lang.String name)
          Writes web page to a file with absolute links and a comment with the original URL.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

link

protected final Link link
The original link to this page

text

protected final java.lang.String text
The text of the page

outLinks

protected java.util.List outLinks
The links on this page

absoluteText

protected java.lang.String absoluteText
Copy of the text with relative links replaced by absolute ones

Constructor Detail

HTMLPage

public HTMLPage(Link link,
                java.lang.String text)
Constructs an HTMLPage with the given link and text.
Parameters:
link - Link object to the given page.
text - The text of the page.

Method Detail

getText

public java.lang.String getText()
Returns the full text of this page. None of the HTML is stripped out.
Returns:
The text of this page.

getLink

public Link getLink()
Returns the Link object that was used to access this page.
Returns:
The Link object that was used to access this page.

setOutLinks

public void setOutLinks(java.util.List links)
Sets the outLinks for this page to the given list.

getOutLinks

public java.util.List getOutLinks()
Get the list of out links from this page.

setAbsoluteText

public void setAbsoluteText(java.lang.String text)
Sets the copy of this page's text in which relative links are replaced by absolute ones.

getAbsoluteText

public java.lang.String getAbsoluteText()
Gets the copy of this page's text in which relative links are replaced by absolute ones.
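
How the absolute text is produced is not documented here. As an illustration only, the sketch below shows one way relative links could be resolved against a page's URL. The class AbsoluteLinkSketch and its absolutize method are hypothetical and not part of ir.webutils; the sketch handles only double-quoted href and src attributes.

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, NOT part of ir.webutils: illustrates one way to
// build an "absolute link" copy of a page's text.
class AbsoluteLinkSketch {
    // Assumption: only double-quoted href/src attribute values are rewritten.
    private static final Pattern ATTR =
        Pattern.compile("(?i)\\b(href|src)\\s*=\\s*\"([^\"]*)\"");

    static String absolutize(String html, String baseUrl) {
        URI base = URI.create(baseUrl);
        Matcher m = ATTR.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String target = m.group(2);
            String resolved;
            try {
                // URI.resolve leaves already-absolute references unchanged.
                resolved = base.resolve(target).toString();
            } catch (IllegalArgumentException e) {
                resolved = target; // leave malformed references untouched
            }
            m.appendReplacement(out,
                Matcher.quoteReplacement(m.group(1) + "=\"" + resolved + "\""));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

A caller might pass the result to setAbsoluteText so that getAbsoluteText later returns it.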

indexAllowed

public boolean indexAllowed()
Clients should always call this method before indexing an HTML page if they want to obey the "NOINDEX" directive in the Robots META tag. The default implementation always returns true.
Returns:
true if and only if the page may be indexed.
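
Because the default implementation always returns true, obeying NOINDEX requires a subclass that overrides this method. The sketch below shows one possible override. PageSketch is a stand-in for the documented base class, and RobotsAwarePageSketch with its regular expressions is hypothetical; neither is claimed to match any existing subclass in ir.webutils.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-in for the documented base class: only the parts this sketch needs.
class PageSketch {
    protected final String text;

    PageSketch(String text) { this.text = text; }

    // Mirrors the documented default behaviour: always true.
    public boolean indexAllowed() { return true; }
}

// Hypothetical subclass that honours <meta name="robots" content="...noindex...">.
class RobotsAwarePageSketch extends PageSketch {
    private static final Pattern META_TAG =
        Pattern.compile("(?is)<meta\\b[^>]*>");
    private static final Pattern ROBOTS_NAME =
        Pattern.compile("(?i)name\\s*=\\s*[\"']?robots\\b");
    private static final Pattern NOINDEX =
        Pattern.compile("(?i)content\\s*=\\s*[\"'][^\"']*\\bnoindex\\b");

    RobotsAwarePageSketch(String text) { super(text); }

    @Override
    public boolean indexAllowed() {
        Matcher m = META_TAG.matcher(text);
        while (m.find()) {
            String tag = m.group();
            // Disallow indexing only when a robots META tag lists noindex.
            if (ROBOTS_NAME.matcher(tag).find() && NOINDEX.matcher(tag).find()) {
                return false;
            }
        }
        return true;
    }
}
```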

empty

public boolean empty()
Returns true if the page is empty or a 404 error.
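
A client would typically combine this check with indexAllowed() before indexing. The sketch below is a minimal illustration of that filtering step. IndexablePage, FixedPage, and IndexFilterSketch are hypothetical stand-ins introduced for the example and are not ir.webutils types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical view of the two documented checks; not an ir.webutils type.
interface IndexablePage {
    boolean empty();
    boolean indexAllowed();
}

// Trivial implementation with fixed answers, used only to exercise the filter.
class FixedPage implements IndexablePage {
    private final boolean empty;
    private final boolean indexAllowed;

    FixedPage(boolean empty, boolean indexAllowed) {
        this.empty = empty;
        this.indexAllowed = indexAllowed;
    }

    public boolean empty() { return empty; }
    public boolean indexAllowed() { return indexAllowed; }
}

class IndexFilterSketch {
    // Keeps only pages that are non-empty and may be indexed.
    static <P extends IndexablePage> List<P> indexable(List<P> pages) {
        List<P> kept = new ArrayList<>();
        for (P page : pages) {
            if (!page.empty() && page.indexAllowed()) {
                kept.add(page);
            }
        }
        return kept;
    }
}
```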

writeAbsolute

public void writeAbsolute(java.io.File dir,
                          java.lang.String name)
Writes web page to a file with absolute links and a comment with the original URL.
Parameters:
dir - The directory to store the file in.
name - The name of the file.
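
The exact file layout is not specified above. The sketch below assumes the original URL is written as a leading HTML comment followed by the absolute text, encoded as UTF-8. WriteAbsoluteSketch and its write method are hypothetical; unlike the documented method, the sketch takes the URL and text as parameters and returns the written file so that it stays self-contained.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical helper, NOT the ir.webutils implementation.
class WriteAbsoluteSketch {
    static File write(File dir, String name, String originalUrl, String absoluteText) {
        File target = new File(dir, name);
        // Assumption: the original URL goes in an HTML comment on the first line.
        String content = "<!-- Source URL: " + originalUrl + " -->\n" + absoluteText;
        try {
            Files.write(target.toPath(), content.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            // Rethrown unchecked so callers need no throws clause.
            throw new UncheckedIOException(e);
        }
        return target;
    }
}
```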