ir.webutils
Class RobotExclusionSet
java.lang.Object
java.util.AbstractCollection<E>
java.util.AbstractSet<java.lang.String>
ir.webutils.RobotExclusionSet
- All Implemented Interfaces:
- java.lang.Iterable<java.lang.String>, java.util.Collection<java.lang.String>, java.util.Set<java.lang.String>
public class RobotExclusionSet
- extends java.util.AbstractSet<java.lang.String>
RobotExclusionSet provides support for the Robots Exclusion
Protocol. This class provides the ability to parse a robots.txt
file and to check files to make sure that access to them has not
been disallowed by the robots.txt file. This class can also be
used to exclude files linked to on a page that specifies NOFOLLOW
in its Robots META tag.
Constructor Summary
RobotExclusionSet()
Constructs an empty set.
RobotExclusionSet(java.lang.String site)
Constructs a set containing the paths in the robots.txt file
for this site.
Method Summary
boolean add(java.lang.String o)
boolean contains(java.lang.String path)
Checks to see if a path is prohibited by this set.
java.util.Iterator<java.lang.String> iterator()
static void main(java.lang.String[] args)
For testing only.
int size()
Methods inherited from class java.util.AbstractSet
equals, hashCode, removeAll
Methods inherited from class java.util.AbstractCollection
addAll, clear, contains, containsAll, isEmpty, remove, retainAll, toArray, toArray, toString
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.util.Set
addAll, clear, contains, containsAll, isEmpty, remove, retainAll, toArray, toArray
RobotExclusionSet
public RobotExclusionSet()
- Constructs an empty set.
RobotExclusionSet
public RobotExclusionSet(java.lang.String site)
- Constructs a set containing the paths in the robots.txt file
for this site. The robots.txt
file should conform to the Robots Exclusion Protocol
specification, available at
http://www.robotstxt.org/wc/norobots.html.
- Parameters:
site
- The name of the site
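The parsing step can be illustrated with a minimal sketch. It is not the ir.webutils source: the class RobotsTxtSketch and the method parseDisallowed are hypothetical names, the input is assumed to be robots.txt text that has already been fetched, and only records for "User-agent: *" are honored. Consecutive User-agent lines in one record are not merged, which a full parser would need to handle.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration only; not part of ir.webutils.
class RobotsTxtSketch {

    // Collects the Disallow paths that apply to all robots ("User-agent: *").
    static Set<String> parseDisallowed(String robotsTxt) {
        Set<String> disallowed = new LinkedHashSet<>();
        boolean appliesToAll = false;
        for (String rawLine : robotsTxt.split("\\r?\\n")) {
            // Drop comments, which start with '#'.
            int hash = rawLine.indexOf('#');
            String line = (hash >= 0 ? rawLine.substring(0, hash) : rawLine).trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank line or not a "field: value" line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                // Simplification: the most recent User-agent line decides.
                appliesToAll = value.equals("*");
            } else if (field.equals("disallow") && appliesToAll && !value.isEmpty()) {
                disallowed.add(value);
            }
        }
        return disallowed;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
            "# sample file",
            "User-agent: *",
            "Disallow: /cgi-bin/",
            "Disallow: /tmp/  # scratch space",
            "",
            "User-agent: SomeOtherBot",
            "Disallow: /");
        System.out.println(parseDisallowed(sample)); // prints [/cgi-bin/, /tmp/]
    }
}
```

An empty Disallow value means "nothing is disallowed" under the protocol, so the sketch skips it rather than storing an empty prefix that would match every path.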
size
public int size()
- Specified by:
size
in interface java.util.Collection<java.lang.String>
- Specified by:
size
in interface java.util.Set<java.lang.String>
- Specified by:
size
in class java.util.AbstractCollection<java.lang.String>
add
public boolean add(java.lang.String o)
- Specified by:
add
in interface java.util.Collection<java.lang.String>
- Specified by:
add
in interface java.util.Set<java.lang.String>
- Overrides:
add
in class java.util.AbstractCollection<java.lang.String>
iterator
public java.util.Iterator<java.lang.String> iterator()
- Specified by:
iterator
in interface java.lang.Iterable<java.lang.String>
- Specified by:
iterator
in interface java.util.Collection<java.lang.String>
- Specified by:
iterator
in interface java.util.Set<java.lang.String>
- Specified by:
iterator
in class java.util.AbstractCollection<java.lang.String>
contains
public boolean contains(java.lang.String path)
- Checks to see if a path is prohibited by this set. A path is
prohibited if it starts with an entry in this set.
- Parameters:
path
- String
object representing the path.
- Returns:
true
iff path is not
null
and path.startsWith(e)
for at least one element e in this set.
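The prefix rule stated in the description ("a path is prohibited if it starts with an entry in this set") can be sketched as follows. This is a stand-in under stated assumptions, not the actual implementation: PrefixExclusionSketch and isProhibited are hypothetical names, and the entries are assumed to be plain path prefixes with no wildcard handling.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for illustration only; not the ir.webutils source.
class PrefixExclusionSketch {
    private final Set<String> prefixes = new LinkedHashSet<>();

    PrefixExclusionSketch(String... disallowedPrefixes) {
        prefixes.addAll(Arrays.asList(disallowedPrefixes));
    }

    // True when path is non-null and begins with at least one stored prefix.
    boolean isProhibited(String path) {
        if (path == null) {
            return false;
        }
        for (String prefix : prefixes) {
            if (path.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        PrefixExclusionSketch set = new PrefixExclusionSketch("/cgi-bin/", "/tmp/");
        System.out.println(set.isProhibited("/cgi-bin/search.pl")); // prints true
        System.out.println(set.isProhibited("/index.html"));        // prints false
        System.out.println(set.isProhibited("/tmp"));               // prints false
    }
}
```

Note that this is prefix matching rather than set membership: "/tmp" is not prohibited by the entry "/tmp/", because the path must start with the whole entry, trailing slash included.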
main
public static void main(java.lang.String[] args)
- For testing only. Parses robots.txt file for a particular site