- add(HashMapVector) - Method in class ir.vsr.HashMapVector
-
Destructively add the given vector to the current vector
- add(String) - Method in class ir.webutils.RobotExclusionSet
-
- addBad(DocumentReference) - Method in class ir.vsr.Feedback
-
Add a document to the list of those deemed irrelevant
- addEdge(String, String) - Method in class ir.webutils.Graph
-
Adds an edge from xName to yName.
- addEdge(Node) - Method in class ir.webutils.Node
-
Adds an outgoing edge
- addEndSlash(URL) - Static method in class ir.webutils.HTMLPage
-
If URL looks like a directory rather than a file, then
add a "/" at the end so that it acts as a proper base URL
for completing URLs in this page
- addGood(DocumentReference) - Method in class ir.vsr.Feedback
-
Add a document to the list of those deemed relevant
- addLink(MutableAttributeSet, HTML.Attribute) - Method in class ir.webutils.LinkExtractor
-
Retrieves a link from an attribute set and completes it against
the base URL.
- addLink(MutableAttributeSet, HTML.Attribute) - Method in class ir.webutils.YahooCategoryLinkExtractor
-
Retrieves a link from an attribute set and completes it against
the base URL.
- addLink(MutableAttributeSet, HTML.Attribute) - Method in class ir.webutils.YahooSiteLinkExtractor
-
Retrieves a link from an attribute set and completes it against
the base URL.
- addNode(String) - Method in class ir.webutils.Graph
-
Adds a node if it is not already present.
- addResult(int, double) - Method in class ir.classifiers.PointResults
-
Set the nth result
- addScaled(HashMapVector, double) - Method in class ir.vsr.HashMapVector
-
Destructively add a scaled version of the given vector to the current vector
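As a rough illustration of what a destructive scaled add means for a sparse term vector, here is a minimal sketch that uses a plain `Map<String, Double>` as a stand-in for `HashMapVector`. The class name `ScaledAddSketch` and the map representation are assumptions made for this example; they are not the library's actual implementation.

```java
import java.util.Map;

// Hypothetical stand-in: a sparse vector is a token -> weight map.
class ScaledAddSketch {
    /**
     * Adds (scale * other) into target in place.
     * Tokens missing from target are treated as having weight 0.
     */
    static void addScaled(Map<String, Double> target,
                          Map<String, Double> other,
                          double scale) {
        for (Map.Entry<String, Double> e : other.entrySet()) {
            // merge() inserts the scaled weight or sums it with the existing one
            target.merge(e.getKey(), scale * e.getValue(), Double::sum);
        }
    }
}
```

Only `target` is modified; `other` is left untouched, which is what "destructively" refers to in this sketch.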
- addVectors(double[], double[]) - Static method in class ir.utilities.MoreMath
-
Add two vectors and return the vector sum
- allLetters(String) - Method in class ir.vsr.Document
-
Check if this token consists of all Unicode letters to eliminate
other bizarre tokens
- ALPHA - Static variable in class ir.vsr.Feedback
-
A Rocchio/Ide algorithm parameter
- argMax(double[]) - Method in class ir.classifiers.Classifier
-
Returns the array index with the maximum value
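A minimal sketch of an arg-max over an array of scores, for readers unfamiliar with the term. The tie-breaking rule (first maximum wins) and the exception on empty input are assumptions of this example; the index does not say how `Classifier.argMax` handles either case.

```java
// Hypothetical helper class, not part of the ir.classifiers package.
class ArgMaxSketch {
    /** Returns the index of the largest value; the first one wins on ties. */
    static int argMax(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("empty array has no maximum");
        }
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] > values[best]) {
                best = i;
            }
        }
        return best;
    }
}
```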
- averageVectors(ArrayList<double[]>) - Static method in class ir.utilities.MoreMath
-
Average all of the vectors in a list of vectors and return
the average vector.
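The element-wise average of a list of dense vectors can be sketched as follows. The class name `AverageVectorsSketch` is hypothetical, and the checks for empty input and mismatched lengths are assumptions of this example rather than documented behavior of `MoreMath.averageVectors`.

```java
import java.util.List;

// Hypothetical helper class, not part of ir.utilities.
class AverageVectorsSketch {
    /** Returns a new array holding the element-wise mean of the given vectors. */
    static double[] average(List<double[]> vectors) {
        if (vectors.isEmpty()) {
            throw new IllegalArgumentException("need at least one vector");
        }
        int n = vectors.get(0).length;
        double[] mean = new double[n];
        for (double[] v : vectors) {
            if (v.length != n) {
                throw new IllegalArgumentException("vectors differ in length");
            }
            for (int i = 0; i < n; i++) {
                mean[i] += v[i];          // accumulate the sum first
            }
        }
        for (int i = 0; i < n; i++) {
            mean[i] /= vectors.size();    // then divide by the vector count
        }
        return mean;
    }
}
```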
- calculatePriors(List<Example>) - Method in class ir.classifiers.NaiveBayes
-
Calculates the class priors
- calculateProbs(Example) - Method in class ir.classifiers.NaiveBayes
-
Calculates the probability of the test example being generated by each category
- categories - Variable in class ir.classifiers.Classifier
-
Array of categories (classes) in the data
- categories - Variable in class ir.classifiers.DirectoryExamplesConstructor
-
Array of categories (classes) in the data
- category - Variable in class ir.classifiers.Example
-
Category index of the example
- categoryLinks - Variable in class ir.webutils.YahooSpider
-
List of category links found for the current directory page
- categoryLinksMap - Variable in class ir.webutils.YahooSpider
-
The HashMap for storing categoryLinks for already downloaded Links
- Classifier - Class in ir.classifiers
-
Abstract class specifying the functionality of a classifier.
- Classifier() - Constructor for class ir.classifiers.Classifier
-
- classifier - Variable in class ir.classifiers.CVLearningCurve
-
The classifier for which the K-fold CV learning curve is to be generated
- classPriors - Variable in class ir.classifiers.BayesResult
-
Stores the prior probabilities of each class
- cleanURL(URL) - Static method in class ir.webutils.Link
-
Standardize URL by removing trailing slashes, URL decoding it,
replacing the UTCS-specific "/users/user" link with "/~user", and
removing a set of common index pages.
- clear() - Method in class ir.vsr.HashMapVector
-
Clears the vector back to all zeros
- clear() - Method in class ir.vsr.InvertedIndex
-
Clear all documents from the inverted index
- compareTo(Object) - Method in class ir.vsr.Retrieval
-
Compares this Retrieval to another for sorting from best to worst.
- computeIDFandDocumentLengths() - Method in class ir.vsr.InvertedIndex
-
Compute the IDF factor for every token in the index and the length
of the document vector for every document referenced in the index.
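To make the two quantities concrete, the sketch below computes an IDF per token and a TF-IDF vector length per document over toy in-memory data. The formula `idf = log2(N / df)`, the use of raw counts as term frequency, and every name in the class are assumptions chosen for illustration; the index does not state which weighting `InvertedIndex` actually applies.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: each document is a token -> raw count map.
class IdfSketch {
    /** Returns token -> idf, assuming idf = log2(N / df). */
    static Map<String, Double> idf(List<Map<String, Integer>> docs) {
        Map<String, Integer> docFreq = new HashMap<>();
        for (Map<String, Integer> doc : docs) {
            for (String token : doc.keySet()) {
                docFreq.merge(token, 1, Integer::sum);  // one count per document
            }
        }
        Map<String, Double> idf = new HashMap<>();
        int n = docs.size();
        for (Map.Entry<String, Integer> e : docFreq.entrySet()) {
            idf.put(e.getKey(), Math.log((double) n / e.getValue()) / Math.log(2));
        }
        return idf;
    }

    /** Euclidean length of the document's (count * idf) weighted vector. */
    static double length(Map<String, Integer> doc, Map<String, Double> idf) {
        double sumOfSquares = 0.0;
        for (Map.Entry<String, Integer> e : doc.entrySet()) {
            double w = e.getValue() * idf.getOrDefault(e.getKey(), 0.0);
            sumOfSquares += w * w;
        }
        return Math.sqrt(sumOfSquares);
    }
}
```

Under this assumed weighting, a token that appears in every document gets an IDF of 0 and so contributes nothing to any document's length.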
- conditionalProbs(List<Example>) - Method in class ir.classifiers.NaiveBayes
-
Calculates the conditional probabilities of each feature in the different categories
- contains(String) - Method in class ir.webutils.RobotExclusionSet
-
Checks to see if a path is prohibited by this set.
- copy() - Method in class ir.vsr.HashMapVector
-
Produce a copy of this HashMapVector with a new HashMap and new
Weights
- corpusDir - Variable in class ir.eval.Experiment
-
The directory from which the indexed documents come.
- cosineTo(HashMapVector) - Method in class ir.vsr.HashMapVector
-
Computes cosine of angle to otherVector.
- cosineTo(HashMapVector, double) - Method in class ir.vsr.HashMapVector
-
Computes cosine of angle to otherVector when also given otherVector's Euclidean length
(allows saving computation if the length is already known).
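The two `cosineTo` overloads differ only in whether the other vector's length is supplied by the caller. A minimal sketch of that idea over sparse maps follows; the class `CosineSketch`, the `Map<String, Double>` representation, and the choice to return 0 when either vector has zero length are assumptions of this example.

```java
import java.util.Map;

// Hypothetical stand-in: a sparse vector is a token -> weight map.
class CosineSketch {
    /** Euclidean length: square root of the sum of squared weights. */
    static double length(Map<String, Double> v) {
        double sumOfSquares = 0.0;
        for (double w : v.values()) {
            sumOfSquares += w * w;
        }
        return Math.sqrt(sumOfSquares);
    }

    /** Cosine when the caller already knows b's length, so it is not recomputed. */
    static double cosine(Map<String, Double> a, Map<String, Double> b, double bLength) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double w = b.get(e.getKey());
            if (w != null) {
                dot += e.getValue() * w;  // only shared tokens contribute
            }
        }
        double aLength = length(a);
        if (aLength == 0.0 || bLength == 0.0) {
            return 0.0;                   // assumption: treat empty vectors as dissimilar
        }
        return dot / (aLength * bLength);
    }

    /** Convenience overload that computes b's length itself. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        return cosine(a, b, length(b));
    }
}
```

Passing a precomputed length pays off when one query is compared against many documents whose lengths were stored at indexing time.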
- count - Variable in class ir.utilities.Counter
-
The integer count
- count - Variable in class ir.vsr.TokenOccurrence
-
The number of times it occurs in the Document
- count - Variable in class ir.webutils.Spider
-
The number of pages indexed.
- count - Variable in class ir.webutils.YahooSpider
-
The number of pages indexed.
- Counter - Class in ir.utilities
-
A simple wrapper data structure for storing an integer count
as an Object that can be put into lists, maps, etc.
- Counter() - Constructor for class ir.utilities.Counter
-
- covariance(double[], double[]) - Static method in class ir.utilities.Stats
-
Return the covariance between the vectors x and y.
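For reference, a covariance over two equal-length arrays can be sketched as below. This example computes the population covariance (dividing by n); the index does not say whether `Stats.covariance` divides by n or by n - 1, so that choice, along with the class name `CovarianceSketch`, is an assumption.

```java
// Hypothetical helper class, not part of ir.utilities.
class CovarianceSketch {
    /** Population covariance: the mean of (x[i] - meanX) * (y[i] - meanY). */
    static double covariance(double[] x, double[] y) {
        if (x.length != y.length || x.length == 0) {
            throw new IllegalArgumentException("need two non-empty arrays of equal length");
        }
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < x.length; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= x.length;
        meanY /= y.length;

        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += (x[i] - meanX) * (y[i] - meanY);
        }
        return sum / x.length;  // assumption: divide by n, not n - 1
    }
}
```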
- CVLearningCurve - Class in ir.classifiers
-
Gives learning curves with K-fold cross validation for a classifier.
- CVLearningCurve(int, Classifier, List<Example>, double[], long, boolean) - Constructor for class ir.classifiers.CVLearningCurve
-
Creates a CVLearningCurve object
- CVLearningCurve(Classifier, List<Example>) - Constructor for class ir.classifiers.CVLearningCurve
-
Creates a CVLearningCurve object with 10 folds and default points
- debug - Variable in class ir.classifiers.CVLearningCurve
-
Flag for debug display
- decrement() - Method in class ir.utilities.Counter
-
Decrement and return the new count
- decrement(int) - Method in class ir.utilities.Counter
-
Decrement by n and return the new count
- decrement() - Method in class ir.utilities.Weight
-
Decrement and return the new count
- decrement(int) - Method in class ir.utilities.Weight
-
Decrement by n and return the new count
- decrement(double) - Method in class ir.utilities.Weight
-
Decrement by n and return the new count
- DEFAULT_POINTS - Static variable in class ir.classifiers.CVLearningCurve
-
Default points
- DirectoryExamplesConstructor - Class in ir.classifiers
-
Creates a list of examples from a directory where file names contain the
category name as a substring.
- DirectoryExamplesConstructor(String, String[], short, boolean) - Constructor for class ir.classifiers.DirectoryExamplesConstructor
-
Construct an ExamplesConstructor for the given directory and category labels
- DirectoryExamplesConstructor(String, String[]) - Constructor for class ir.classifiers.DirectoryExamplesConstructor
-
Construct an ExamplesConstructor for the given directory and category labels
- DirectorySpider - Class in ir.webutils
-
Spider that limits itself to the directory it started in.
- DirectorySpider() - Constructor for class ir.webutils.DirectorySpider
-
- dirFile - Variable in class ir.vsr.InvertedIndex
-
The directory from which the indexed documents come.
- dirName - Variable in class ir.classifiers.DirectoryExamplesConstructor
-
Name of the directory where the example files are stored.
- display(String) - Static method in class ir.utilities.Browser
-
Make browser display a given URL
- display(File) - Static method in class ir.utilities.Browser
-
Make browser display a given file
- displayProbs(double[], Hashtable<String, double[]>) - Method in class ir.classifiers.NaiveBayes
-
Displays the probabilities for each feature in the different categories
- displayURL(URL) - Method in class ir.webutils.WebPageViewer
-
- doCrawl() - Method in class ir.webutils.Spider
-
Performs the crawl.
- doCrawl() - Method in class ir.webutils.YahooSpider
-
Performs the crawl.
- docRef - Variable in class ir.vsr.Retrieval
-
A reference to the Document being retrieved
- docRef - Variable in class ir.vsr.TokenOccurrence
-
A reference to the Document where it occurs
- docRefs - Variable in class ir.vsr.InvertedIndex
-
A list of all indexed documents.
- docType - Variable in class ir.classifiers.DirectoryExamplesConstructor
-
Type of document (text or HTML)
- docType - Variable in class ir.vsr.DocumentIterator
-
The type of documents to be created
- docType - Variable in class ir.vsr.InvertedIndex
-
The type of Documents (text, HTML).
- document - Variable in class ir.classifiers.Example
-
FileDocument object for the example
- Document - Class in ir.vsr
-
Document is an abstract class that provides for tokenization
of a document with stop-word removal and an iterator-like interface
similar to StringTokenizer.
- Document(boolean) - Constructor for class ir.vsr.Document
-
Creates a new Document making sure that the stopwords
are loaded, indexed, and ready for use.
- DocumentIterator - Class in ir.vsr
-
An object for iterating over a set of documents in a directory.
- DocumentIterator(File, short, boolean, FilenameFilter) - Constructor for class ir.vsr.DocumentIterator
-
Create an iterator with these attributes
- DocumentIterator(File, short, boolean) - Constructor for class ir.vsr.DocumentIterator
-
Create an iterator with these attributes
- DocumentIterator(File) - Constructor for class ir.vsr.DocumentIterator
-
Create an iterator for TextFileDocuments
- DocumentReference - Class in ir.vsr
-
A simple data structure for storing a reference to a document file
that includes information on the length of its document vector.
- DocumentReference(File, double) - Constructor for class ir.vsr.DocumentReference
-
- DocumentReference(FileDocument) - Constructor for class ir.vsr.DocumentReference
-
Create a reference to this document, initializing its length to 0
- DoubleValue - Class in ir.utilities
-
A simple wrapper data structure for storing a double real value
as an Object whose value can be reset.
- DoubleValue(double) - Constructor for class ir.utilities.DoubleValue
-
- empty() - Method in class ir.webutils.HTMLPage
-
Returns true if the page is empty or a 404 error.
- entrySet() - Method in class ir.vsr.HashMapVector
-
Returns the Set of MapEntries in the hashMap
- equals(Object) - Method in class ir.webutils.Link
-
- Example - Class in ir.classifiers
-
An object to hold training or test examples for categorization.
- Example(HashMapVector, int, String, FileDocument) - Constructor for class ir.classifiers.Example
-
- ExamplesConstructor - Class in ir.classifiers
-
Creates a list of Examples from data files.
Specializations handle various ways of storing
examples.
- ExamplesConstructor() - Constructor for class ir.classifiers.ExamplesConstructor
-
- Experiment - Class in ir.eval
-
Contains methods for running evaluation experiments for information
retrieval, specifically the generation of recall-precision curves
for a given test corpus of query/relevant-documents pairs.
- Experiment(File, File, File, short, boolean) - Constructor for class ir.eval.Experiment
-
Create an Experiment object for generating Recall/Precision curves
- Experiment(InvertedIndex, File, File) - Constructor for class ir.eval.Experiment
-
Create an Experiment object for generating Recall/Precision curves
using a provided InvertedIndex
- ExperimentRated - Class in ir.eval
-
Version of Experiment for queries that have continuously rated
gold-standard document relevance judgements and includes evaluation
with NDCG.
- ExperimentRated(File, File, File, short, boolean) - Constructor for class ir.eval.ExperimentRated
-
Constructor that just calls the Experiment constructor
- extractLinks() - Method in class ir.webutils.LinkExtractor
-
Extracts links from the given page.
- extractLinks() - Method in class ir.webutils.YahooCategoryLinkExtractor
-
Extracts category links from the given Yahoo page.
- extractLinks() - Method in class ir.webutils.YahooSiteLinkExtractor
-
Extracts site links from the given Yahoo page.
- featureTable - Variable in class ir.classifiers.BayesResult
-
Stores the counts for each feature: an entry in the hashTable stores
the array of class counts for a feature
- Feedback - Class in ir.vsr
-
Gets and stores information about relevance feedback from the user and computes
an updated query based on original query and retrieved documents that are
rated relevant and irrelevant.
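The query update can be pictured with the "Ide regular" form of relevance feedback: the revised query is alpha times the original query, plus beta times each relevant document vector, minus gamma times each irrelevant one. The sketch below uses plain maps as sparse vectors. The class `IdeRegularSketch`, the `beta` parameter, and the map representation are assumptions of this example (only `ALPHA` and `GAMMA` appear in this part of the index), and it is not the `Feedback` class's actual code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of an Ide-regular style query revision.
class IdeRegularSketch {
    /** Returns alpha*query + beta*sum(good) - gamma*sum(bad) as a new vector. */
    static Map<String, Double> revise(Map<String, Double> query,
                                      List<Map<String, Double>> good,
                                      List<Map<String, Double>> bad,
                                      double alpha, double beta, double gamma) {
        Map<String, Double> revised = new HashMap<>();
        addScaled(revised, query, alpha);
        for (Map<String, Double> doc : good) {
            addScaled(revised, doc, beta);     // pull toward relevant documents
        }
        for (Map<String, Double> doc : bad) {
            addScaled(revised, doc, -gamma);   // push away from irrelevant ones
        }
        return revised;
    }

    /** Adds (scale * other) into target in place. */
    private static void addScaled(Map<String, Double> target,
                                  Map<String, Double> other, double scale) {
        for (Map.Entry<String, Double> e : other.entrySet()) {
            target.merge(e.getKey(), scale * e.getValue(), Double::sum);
        }
    }
}
```

This sketch leaves negative weights in the result; whether a real implementation clips them to zero is a separate design decision not covered here.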
- Feedback(HashMapVector, Retrieval[], InvertedIndex) - Constructor for class ir.vsr.Feedback
-
Create a feedback object for this query with initial retrievals to be rated
- feedback - Variable in class ir.vsr.InvertedIndex
-
Whether relevance feedback using the Ide_regular algorithm is used
- file - Variable in class ir.vsr.DocumentReference
-
The file where the referenced document is stored.
- file - Variable in class ir.vsr.FileDocument
-
The name of the file
- FileDocument - Class in ir.vsr
-
A Document stored as a file.
- FileDocument(File, boolean) - Constructor for class ir.vsr.FileDocument
-
Creates a FileDocument and initializes its name and reader.
- fileExtension(String) - Static method in class ir.utilities.MoreString
-
- filePrefix - Variable in class ir.webutils.YahooSpider
-
Prefix to add to the name of all saved files for the current category
- FilePrefixer - Class in ir.utilities
-
Prefix all files in a directory with a particular prefix.
- FilePrefixer(File, FilenameFilter) - Constructor for class ir.utilities.FilePrefixer
-
- FilePrefixer(File) - Constructor for class ir.utilities.FilePrefixer
-
- files - Variable in class ir.vsr.DocumentIterator
-
An array of files in the directory
- fileToString(String) - Static method in class ir.utilities.MoreString
-
Load the stopwords from file to the hashtable where they are indexed.
- findClassID(String) - Method in class ir.classifiers.DirectoryExamplesConstructor
-
Finds the class ID from the name of the document file.
- foldBins - Variable in class ir.classifiers.CVLearningCurve
-
foldBins[i][j] stores the examples for class i in fold j.
- GAMMA - Static variable in class ir.vsr.Feedback
-
A Rocchio/Ide algorithm parameter
- getCategories() - Method in class ir.classifiers.Classifier
-
Returns the categories (classes) in the data
- getCategory() - Method in class ir.classifiers.Example
-
Returns the category of the example
- getClassifier() - Method in class ir.classifiers.CVLearningCurve
-
Return classifier
- getClassPriors() - Method in class ir.classifiers.BayesResult
-
Returns the class priors
- getDocument() - Method in class ir.classifiers.Example
-
Returns the document of the example
- getDocument(short, boolean) - Method in class ir.vsr.DocumentReference
-
Get the full Document for this Document reference by recreating it
with the given docType and stemming
- getEdgesIn() - Method in class ir.webutils.Node
-
Gives the list of incoming edges
- getEdgesOut() - Method in class ir.webutils.Node
-
Gives the list of outgoing edges
- getEpsilon() - Method in class ir.classifiers.NaiveBayes
-
Returns value of EPSILON
- getExamples() - Method in class ir.classifiers.DirectoryExamplesConstructor
-
Get the examples from the directory, process them into HashMapVectors, and
label them with the correct category label
- getExamples() - Method in class ir.classifiers.ExamplesConstructor
-
Return the list of examples for this dataset
- getExistingNode(String) - Method in class ir.webutils.Graph
-
Returns the node with that name
- getFeatureTable() - Method in class ir.classifiers.BayesResult
-
Returns the feature hash
- getFeedback(int) - Method in class ir.vsr.Feedback
-
Prompt the user for feedback on this numbered retrieval
- getFoldBins() - Method in class ir.classifiers.CVLearningCurve
-
Return the fold Bins
- getHashMapVector() - Method in class ir.classifiers.Example
-
Returns the hashVector of the example
- getHTMLPage(Link) - Method in class ir.webutils.HTMLPageRetriever
-
Downloads a web page from a given URL.
- getHTMLPage(Link) - Method in class ir.webutils.SafeHTMLPageRetriever
-
Tries to download the given web page.
- getIsLaplace() - Method in class ir.classifiers.NaiveBayes
-
Returns value of isLaplace
- getLink() - Method in class ir.webutils.HTMLPage
-
Returns the Link object that was used to access this page.
- getName() - Method in class ir.classifiers.Classifier
-
The name of a classifier
- getName() - Method in class ir.classifiers.Example
-
Returns the name of the example
- getName() - Method in class ir.classifiers.NaiveBayes
-
Returns the name
- getNewLinks(HTMLPage) - Method in class ir.webutils.DirectorySpider
-
Gets links from the page that are in or below the starting
directory.
- getNewLinks(HTMLPage) - Method in class ir.webutils.SiteSpider
-
Gets links from the given page that are on the same host as the
page.
- getNewLinks(HTMLPage) - Method in class ir.webutils.Spider
-
Returns a list of links to follow from a given page.
- getNextCandidateToken() - Method in class ir.vsr.Document
-
Return the next possible token in the document.
- getNextCandidateToken() - Method in class ir.vsr.HTMLFileDocument
-
Return the next purely alpha-character token in the document, or null if none left.
- getNextCandidateToken() - Method in class ir.vsr.TextFileDocument
-
Return the next purely alpha-character token in the document, or null if none left.
- getNextCandidateToken() - Method in class ir.vsr.TextStringDocument
-
Get the next token from this string
- getNode(String) - Method in class ir.webutils.Graph
-
Returns the node with that name, creates one if not
already present.
- getOutLinks() - Method in class ir.webutils.HTMLPage
-
Get the list of out links from this page.
- getParser() - Method in class ir.webutils.HTMLParserMaker
-
Returns a parser.
- getPoint() - Method in class ir.classifiers.PointResults
-
- getRandomLink(List<Link>) - Method in class ir.webutils.YahooSpider
-
Pick a random link from a list of links
- getResults() - Method in class ir.classifiers.PointResults
-
- getRetrieval(double, DocumentReference, double) - Method in class ir.vsr.InvertedIndex
-
Calculate the final score for a retrieval and return a Retrieval object representing
the retrieval with its final score.
- getTestCV(int) - Method in class ir.classifiers.CVLearningCurve
-
Creates the testing set for one fold of a cross-validation
on the dataset.
- getText() - Method in class ir.webutils.HTMLPage
-
Returns the full text of this page.
- getTotalExamples() - Method in class ir.classifiers.CVLearningCurve
-
Return all the examples
- getTrainCV(int, double) - Method in class ir.classifiers.CVLearningCurve
-
Creates the training set for one fold of a cross-validation
on the dataset.
- getTrainResult() - Method in class ir.classifiers.NaiveBayes
-
Returns training result
- getURL() - Method in class ir.webutils.Link
-
Returns the URL of this link.
- getURL(String) - Static method in class ir.webutils.URLChecker
-
Returns a URL for the given string after correcting simple errors.
- getValue() - Method in class ir.utilities.Counter
-
Get the current count
- getValue() - Method in class ir.utilities.Weight
-
Get the current count
- getWebPage(String) - Static method in class ir.webutils.WebPage
-
Downloads the web page specified by the URL represented by a
given string.
- getWebPage(URL) - Static method in class ir.webutils.WebPage
-
Downloads the web page specified by the given URL object.
- getWeight(String) - Method in class ir.vsr.HashMapVector
-
Return the weight of the given token in the vector
- go(String[]) - Method in class ir.webutils.Spider
-
Checks command line arguments and performs the crawl.
- go(String[]) - Method in class ir.webutils.YahooSpider
-
Checks command line arguments and performs the crawl.
- goodDocRefs - Variable in class ir.vsr.Feedback
-
The list of DocumentReferences that were rated relevant
- Graph - Class in ir.webutils
-
Graph data structure.
- Graph() - Constructor for class ir.webutils.Graph
-
Basic constructor.
- handleCCommandLineOption(String) - Method in class ir.webutils.Spider
-
Called when "-c" is passed in on the command line.
- handleCCommandLineOption(String) - Method in class ir.webutils.YahooSpider
-
Called when "-c" is passed in on the command line.
- handleDCommandLineOption(String) - Method in class ir.webutils.Spider
-
Called when "-d" is passed in on the command line.
- handleDCommandLineOption(String) - Method in class ir.webutils.YahooSpider
-
Called when "-d" is passed in on the command line.
- handleEndTag(HTML.Tag, int) - Method in class ir.webutils.LinkExtractor
-
Executed when a closing HTML tag is found in the document.
- handleEndTag(HTML.Tag, int) - Method in class ir.webutils.YahooCategoryLinkExtractor
-
Executed when a closing HTML tag is found in the document.
- handleEndTag(HTML.Tag, int) - Method in class ir.webutils.YahooSiteLinkExtractor
-
Executed when a closing HTML tag is found in the document.
- handlePCommandLineOption(String) - Method in class ir.webutils.YahooSpider
-
Called when "-p" is passed on the command line.
- handleSafeCommandLineOption() - Method in class ir.webutils.Spider
-
Called when "-safe" is passed in on the command line.
- handleSimpleTag(HTML.Tag, MutableAttributeSet, int) - Method in class ir.webutils.LinkExtractor
-
Executed when an HTML tag that has no closing tag is found in
the document.
- handleSimpleTag(HTML.Tag, MutableAttributeSet, int) - Method in class ir.webutils.RobotsMetaTagParser
-
Checks for robots META tags.
- handleSimpleTag(HTML.Tag, MutableAttributeSet, int) - Method in class ir.webutils.YahooCategoryLinkExtractor
-
Executed when an HTML tag that has no closing tag is found in
the document.
- handleSimpleTag(HTML.Tag, MutableAttributeSet, int) - Method in class ir.webutils.YahooSiteLinkExtractor
-
Executed when an HTML tag that has no closing tag is found in
the document.
- handleSlowCommandLineOption() - Method in class ir.webutils.Spider
-
Called when "-slow" is passed in on the command line.
- handleSlowCommandLineOption() - Method in class ir.webutils.YahooSpider
-
Called when "-slow" is passed in on the command line.
- handleStartTag(HTML.Tag, MutableAttributeSet, int) - Method in class ir.webutils.LinkExtractor
-
Executed when an opening HTML tag is found in the document.
- handleStartTag(HTML.Tag, MutableAttributeSet, int) - Method in class ir.webutils.YahooCategoryLinkExtractor
-
Executed when an opening HTML tag is found in the document.
- handleStartTag(HTML.Tag, MutableAttributeSet, int) - Method in class ir.webutils.YahooSiteLinkExtractor
-
Executed when an opening HTML tag is found in the document.
- handleText(char[], int) - Method in class ir.webutils.LinkExtractor
-
Executed when a block of text is encountered.
- handleText(char[], int) - Method in class ir.webutils.YahooCategoryLinkExtractor
-
Executed when a block of text is encountered.
- handleText(char[], int) - Method in class ir.webutils.YahooSiteLinkExtractor
-
Executed when a block of text is encountered.
- handleUCommandLineOption(String) - Method in class ir.webutils.DirectorySpider
-
Sets the initial URL from the "-u" argument, then calls the
corresponding superclass method.
- handleUCommandLineOption(String) - Method in class ir.webutils.Spider
-
Called when "-u" is passed in on the command line.
- handleUCommandLineOption(String) - Method in class ir.webutils.YahooSpider
-
Called when "-u" is passed in on the command line.
- hashCode() - Method in class ir.webutils.Link
-
- hashMap - Variable in class ir.vsr.HashMapVector
-
The HashMap that stores the mapping of tokens to Weights
- hashMapVector() - Method in class ir.vsr.Document
-
Returns a hashmap version of the term-vector (bag of words) for this
document, where each token is a key whose value is the number of times
it occurs in the document as stored in a Weight.
- HashMapVector - Class in ir.vsr
-
A data structure for a term vector for a document stored
as a HashMap that maps tokens to Weights that store the
weight of that token in the document.
- HashMapVector() - Constructor for class ir.vsr.HashMapVector
-
- hashVector - Variable in class ir.classifiers.Example
-
Representation of the example as a vector of (feature -> weight) mappings
- hasMoreDocuments() - Method in class ir.vsr.DocumentIterator
-
Returns true iff there are more documents in this directory
- hasMoreTokens() - Method in class ir.vsr.Document
-
Returns true iff the document contains more tokens
- haveFeedback(int) - Method in class ir.vsr.Feedback
-
Has the user already provided feedback on this numbered retrieval?
- HTMLFileDocument - Class in ir.vsr
-
An HTML file document where HTML commands are removed
from the token stream.
- HTMLFileDocument(File, boolean) - Constructor for class ir.vsr.HTMLFileDocument
-
Create a new text document for the given file.
- HTMLFileDocument(String, boolean) - Constructor for class ir.vsr.HTMLFileDocument
-
Create a new text document for the given file name.
- HTMLPage - Class in ir.webutils
-
HTMLPage is a representation of information about a web
page.
- HTMLPage(Link, String) - Constructor for class ir.webutils.HTMLPage
-
Constructs an HTMLPage with the given link and text.
- HTMLPageRetriever - Class in ir.webutils
-
HTMLPageRetriever allows clients to download web pages from URLs.
- HTMLPageRetriever() - Constructor for class ir.webutils.HTMLPageRetriever
-
Constructs a HTMLPageRetriever object.
- HTMLParserMaker - Class in ir.webutils
-
HTMLParserMaker allows clients to retrieve an
HTMLEditorKit.Parser instance.
- HTMLParserMaker() - Constructor for class ir.webutils.HTMLParserMaker
-
- idf - Variable in class ir.vsr.TokenInfo
-
The IDF (inverse document frequency) factor for this token
which indicates how much to weight an occurrence.
- inCategorySection - Variable in class ir.webutils.YahooCategoryLinkExtractor
-
Flag that is true during parsing while HTML parser is
in the section of the webpage that lists subcategory links
- incorporateToken(String, double, Map<DocumentReference, DoubleValue>) - Method in class ir.vsr.InvertedIndex
-
Retrieve the documents indexed by this token in the inverted index,
add it to the retrievalHash if needed, and update its running total score.
- increment() - Method in class ir.utilities.Counter
-
Increment and return the new count
- increment(int) - Method in class ir.utilities.Counter
-
Increment by n and return the new count
- increment() - Method in class ir.utilities.Weight
-
Increment and return the new count
- increment(int) - Method in class ir.utilities.Weight
-
Increment by n and return the new count
- increment(double) - Method in class ir.utilities.Weight
-
Increment by n and return the new count
- increment(String, double) - Method in class ir.vsr.HashMapVector
-
Increment the weight for the given token in the vector by the given amount.
- increment(String) - Method in class ir.vsr.HashMapVector
-
Increment the weight for the given token in the vector by 1.
- increment(String, int) - Method in class ir.vsr.HashMapVector
-
Increment the weight for the given token in the vector by the given int
- index() - Method in class ir.webutils.RobotsMetaTagParser
-
Indicates whether the page can be indexed.
- indexAllowed() - Method in class ir.webutils.HTMLPage
-
Clients should always call this method before indexing an HTML
page if they want to obey the "NOINDEX" directive in the Robots
META tag.
- indexAllowed() - Method in class ir.webutils.SafeHTMLPage
-
Indicates whether or not indexing has been disallowed by a
Robots META tag.
- indexDocument(FileDocument, HashMapVector) - Method in class ir.vsr.InvertedIndex
-
Index the given document using its corresponding vector
- indexDocuments() - Method in class ir.vsr.InvertedIndex
-
Index the documents in dirFile.
- indexDocuments(List<Example>) - Method in class ir.vsr.InvertedIndex
-
Index the documents in the List of Examples for text categorization.
- indexOfIgnoreCase(String, String, int) - Static method in class ir.utilities.MoreString
-
- indexOfIgnoreCase(String, String) - Static method in class ir.utilities.MoreString
-
- indexPage(HTMLPage) - Method in class ir.webutils.Spider
-
"Indexes" an HTMLPage.
- indexPage(HTMLPage) - Method in class ir.webutils.YahooSpider
-
"Indexes" an HTMLPage.
- indexToken(String, int, DocumentReference) - Method in class ir.vsr.InvertedIndex
-
Add a token occurrence to the index.
- inSiteSection - Variable in class ir.webutils.YahooSiteLinkExtractor
-
Flag that is true during parsing while HTML parser is
in the section of the webpage that lists site links
- invertedIndex - Variable in class ir.vsr.Feedback
-
The current InvertedIndex
- InvertedIndex - Class in ir.vsr
-
An inverted index for vector-space information retrieval.
- InvertedIndex(File, short, boolean, boolean) - Constructor for class ir.vsr.InvertedIndex
-
Create an inverted index of the documents in a directory.
- InvertedIndex(List<Example>) - Constructor for class ir.vsr.InvertedIndex
-
Create an inverted index of the documents in a List of Example objects of documents
for text categorization.
- ir.classifiers - package ir.classifiers
-
Provides methods for classifying text documents using machine learning.
- ir.eval - package ir.eval
-
Provides methods for running experiments for evaluating information retrieval.
- ir.utilities - package ir.utilities
-
Provides utility methods for manipulating various types of data for the overall
IR package
- ir.vsr - package ir.vsr
-
Provides a basic vector-space information retrieval system.
- ir.webutils - package ir.webutils
-
Provides web utilities for downloading web pages and spidering the web.
- isEmpty() - Method in class ir.vsr.Feedback
-
Has the user rated any documents yet?
- iterator() - Method in class ir.webutils.RobotExclusionSet
-
- length - Variable in class ir.vsr.DocumentReference
-
The length of the corresponding Document vector.
- length() - Method in class ir.vsr.HashMapVector
-
Compute Euclidean length (sqrt of sum of squares) of vector
- link - Variable in class ir.webutils.HTMLPage
-
The original link to this page
- Link - Class in ir.webutils
-
Link is a class that contains a URL.
- Link() - Constructor for class ir.webutils.Link
-
May be subclassed.
- Link(URL) - Constructor for class ir.webutils.Link
-
Constructs a link with specified URL.
- Link(String) - Constructor for class ir.webutils.Link
-
Construct a link with specified URL string
- LinkExtractor - Class in ir.webutils
-
LinkExtractor defines a callback that extracts the links from an
HTML document and provides functionality to parse a document.
- LinkExtractor(HTMLPage) - Constructor for class ir.webutils.LinkExtractor
-
Create a link extractor for the given page
- links - Variable in class ir.webutils.LinkExtractor
-
The current list of extracted links
- links - Variable in class ir.webutils.YahooCategoryLinkExtractor
-
The current list of extracted category links
- links - Variable in class ir.webutils.YahooSiteLinkExtractor
-
The current list of extracted site links
- linksToVisit - Variable in class ir.webutils.Spider
-
The queue of links maintained by the spider
- linkToHTMLPage(Link) - Method in class ir.webutils.Spider
-
Check if this is a link to an HTML page.
- linkToHTMLPage(Link) - Method in class ir.webutils.YahooSpider
-
Check if this is a link to an HTML page.
- loadStopWords() - Static method in class ir.vsr.Document
-
Load the stopwords from file to the hashtable where they are indexed.
- log(double, double) - Static method in class ir.utilities.MoreMath
-
Return logarithm of a given base
- log(int, int) - Static method in class ir.utilities.MoreMath
-
- log(double, int) - Static method in class ir.utilities.MoreMath
-
- log(int, double) - Static method in class ir.utilities.MoreMath
-
- main(String[]) - Static method in class ir.classifiers.DirectoryExamplesConstructor
-
Test loading a sample directory of examples
- main(String[]) - Static method in class ir.classifiers.TestNaiveBayes
-
A driver method for testing the NaiveBayes classifier using
10-fold cross validation.
- main(String[]) - Static method in class ir.classifiers.TestNaiveBayes2
-
A driver method for testing the NaiveBayes classifier using
10-fold cross validation.
- main(String[]) - Static method in class ir.eval.Experiment
-
Evaluate retrieval performance on a given query test corpus and
generate a recall/precision graph.
- main(String[]) - Static method in class ir.eval.ExperimentRated
-
Evaluate retrieval performance on a given query test corpus and
generate a recall/precision graph and table of NDCG results.
- main(String[]) - Static method in class ir.utilities.Browser
-
Test interface
- main(String[]) - Static method in class ir.utilities.FilePrefixer
-
- main(String[]) - Static method in class ir.utilities.MoreString
-
- main(String[]) - Static method in class ir.utilities.Porter
-
For testing, print the stemmed version of a word
- main(String[]) - Static method in class ir.vsr.DocumentIterator
-
Test by printing the bag-of-words for each file in the given directory
- main(String[]) - Static method in class ir.vsr.HTMLFileDocument
-
For testing, print the bag-of-words vector for a given HTML file
- main(String[]) - Static method in class ir.vsr.InvertedIndex
-
Index a directory of files and then interactively accept retrieval queries.
- main(String[]) - Static method in class ir.vsr.TextFileDocument
-
For testing, print the bag-of-words vector for a given file
- main(String[]) - Static method in class ir.vsr.TextStringDocument
-
For testing, print the bag-of-words vector for the given string
- main(String[]) - Static method in class ir.webutils.DirectorySpider
-
Spider the web according to the following command options,
but only below the start URL directory.
- main(String[]) - Static method in class ir.webutils.Graph
-
- main(String[]) - Static method in class ir.webutils.Link
-
- main(String[]) - Static method in class ir.webutils.RobotExclusionSet
-
For testing only.
- main(String[]) - Static method in class ir.webutils.SiteSpider
-
Spider the web according to the following command options,
but stay within the given site (same URL host).
- main(String[]) - Static method in class ir.webutils.Spider
-
Spider the web according to the following command options:
-safe : Check for and obey robots.txt and robots META tag
directives.
-d <directory> : Store indexed files in <directory>.
-c <maxCount> : Store at most <maxCount> files (default is 10,000).
-u <url> : Start at <url>.
-slow : Pause briefly before getting a page.
- main(String[]) - Static method in class ir.webutils.WebPage
-
Retrieve the page on the URL given and output its contents to STDOUT.
- main(String[]) - Static method in class ir.webutils.WebPageViewer
-
- main(String[]) - Static method in class ir.webutils.YahooCategoryLinkExtractor
-
Given Yahoo directory URL as a single arg, test extraction of
category links from this page.
- main(String[]) - Static method in class ir.webutils.YahooSiteLinkExtractor
-
Given Yahoo directory URL as a single arg, test extraction of
site links from this page.
- main(String[]) - Static method in class ir.webutils.YahooSpider
-
Spider Yahoo category to randomly collect pages according to the following command options:
-d <directory> : Store indexed files in <directory>.
-c <maxCount> : Find <maxCount> files (default is 10,000).
-u <url> : Start at Yahoo directory page given by <url>.
-p <prefix> : Prefix saved file names with <prefix>.
-slow : Pause briefly before getting a page.
- makeRpCurve() - Method in class ir.eval.Experiment
-
Process and evaluate all queries and generate recall-precision curve
- MAX_RETRIEVALS - Static variable in class ir.vsr.InvertedIndex
-
The maximum number of retrieved documents for a query to present to the user
at a time
- maxCount - Variable in class ir.webutils.Spider
-
The maximum number of pages to be indexed.
- maxCount - Variable in class ir.webutils.YahooSpider
-
The number of pages to be found and indexed.
- maxWeight() - Method in class ir.vsr.HashMapVector
-
Returns the maximum weight of any token in the vector.
- mean(double[]) - Static method in class ir.utilities.Stats
-
Return the arithmetic mean of the argument values.
- MoreMath - Class in ir.utilities
-
A place to put some additional math functions
- MoreMath() - Constructor for class ir.utilities.MoreMath
-
- MoreString - Class in ir.utilities
-
A place to put some additional string functions
- MoreString() - Constructor for class ir.utilities.MoreString
-
- moreURL - Variable in class ir.webutils.YahooSiteLinkExtractor
-
Flag that is true during parsing while the HTML parser
is inside the anchor link text for a Yahoo link that
refers to more sites not listed on the current page.
Stores the URL for this link while in its anchor text.
- multiply(double) - Method in class ir.vsr.HashMapVector
-
Destructively multiply the vector by a constant
- NaiveBayes - Class in ir.classifiers
-
Implements the NaiveBayes Classifier with Laplace smoothing.
- NaiveBayes(String[], boolean) - Constructor for class ir.classifiers.NaiveBayes
-
Create a naive Bayes classifier with these attributes
- name - Variable in class ir.classifiers.Example
-
Name of the example
- name - Static variable in class ir.classifiers.NaiveBayes
-
Name of classifier
- NDCGlimit - Static variable in class ir.eval.ExperimentRated
-
The maximum N for computing NDCG @ N
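For readers unfamiliar with NDCG @ N, the sketch below shows one common formulation: the discounted cumulative gain of the top N ranked ratings, divided by the gain of the ideal ordering of the gold-standard ratings. The class name NdcgSketch, the log2(rank + 1) discount, and the method signatures are assumptions chosen for illustration; ir.eval.ExperimentRated may use a different variant.

```java
import java.util.Arrays;

// Hypothetical illustration of NDCG @ N; not the ir.eval.ExperimentRated source.
class NdcgSketch {

    /** DCG over the first n gains, discounting rank r (1-based) by log2(r + 1). */
    static double dcg(double[] gains, int n) {
        double sum = 0.0;
        int limit = Math.min(n, gains.length);
        for (int i = 0; i < limit; i++) {
            double discount = Math.log(i + 2) / Math.log(2); // log2(rank + 1)
            sum += gains[i] / discount;
        }
        return sum;
    }

    /** NDCG @ n: DCG of the ranked ratings over DCG of the ideal (descending) gold ratings. */
    static double ndcg(double[] rankedGains, double[] goldGains, int n) {
        double[] ideal = goldGains.clone();
        Arrays.sort(ideal); // ascending
        for (int i = 0, j = ideal.length - 1; i < j; i++, j--) { // reverse to descending
            double tmp = ideal[i];
            ideal[i] = ideal[j];
            ideal[j] = tmp;
        }
        double idealDcg = dcg(ideal, n);
        return idealDcg == 0.0 ? 0.0 : dcg(rankedGains, n) / idealDcg;
    }

    public static void main(String[] args) {
        double[] gold = {3, 2, 1};
        System.out.println(ndcg(new double[] {3, 2, 1}, gold, 3)); // perfect ranking
        System.out.println(ndcg(new double[] {1, 2, 3}, gold, 3)); // reversed ranking scores lower
    }
}
```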
- NDCGvalues - Variable in class ir.eval.ExperimentRated
-
Current sum of NDCG values @ all levels up to NDCGlimit.
Updated when processing each query.
- newQuery() - Method in class ir.vsr.Feedback
-
Use the Ide_regular algorithm to compute a new revised query.
- nextDocument() - Method in class ir.vsr.DocumentIterator
-
Get the next document
- nextNode() - Method in class ir.webutils.Graph
-
Returns the next node in an iterator over the nodes of the graph
- nextToken - Variable in class ir.vsr.Document
-
The next token in the document
- nextToken() - Method in class ir.vsr.Document
-
Returns the next token in the document or null if there are none
- Node - Class in ir.webutils
-
Node in the Graph data structure.
- Node(String) - Constructor for class ir.webutils.Node
-
Constructs a node with that name.
- nodeArray() - Method in class ir.webutils.Graph
-
Returns all the nodes of the graph.
- numberFound - Variable in class ir.webutils.StringSearchResult
-
Number of different strings found
- numberOccurrences - Variable in class ir.webutils.StringSearchResult
-
Total number of occurrences of any of the strings
- numberOfTokens() - Method in class ir.vsr.Document
-
Returns the total number of tokens in the document or -1 if
there are still more tokens to be read and the total count is not yet available.
- numClasses - Variable in class ir.classifiers.CVLearningCurve
-
Number of classes in the data
- numFolds - Variable in class ir.classifiers.CVLearningCurve
-
Number of folds of cross validation to run
- numStopWords - Static variable in class ir.vsr.Document
-
The number of stopwords in this file
- numTokens - Variable in class ir.vsr.Document
-
The number of tokens currently read from document
- padTo(String, int, char) - Static method in class ir.utilities.MoreString
-
Pad a string with a specific char on the right to make it the specified length
- padTo(String, int) - Static method in class ir.utilities.MoreString
-
Pad a string with blanks on the right to make it the specified length
- padToLeft(String, int, char) - Static method in class ir.utilities.MoreString
-
Pad a string with a specific char on the left to make it the specified length
- padToLeft(String, int) - Static method in class ir.utilities.MoreString
-
Pad a string with blanks on the left to make it the specified length
- padToLeft(double, int) - Static method in class ir.utilities.MoreString
-
Convert a double to a string and pad with blanks on the left
to make it the specified length
- padToLeft(int, int) - Static method in class ir.utilities.MoreString
-
Convert an int to a string and pad with blanks on the left
to make it the specified length
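The padding helpers listed above can be pictured with the following sketch. The class name PadSketch is hypothetical, and leaving a string unchanged when it is already longer than the requested length is an assumption; the index does not say how ir.utilities.MoreString treats that case.

```java
// Hypothetical illustration of right and left padding; not the ir.utilities.MoreString source.
class PadSketch {

    /** Pads s on the right with pad until it reaches the given length. */
    static String padTo(String s, int length, char pad) {
        StringBuilder sb = new StringBuilder(s);
        while (sb.length() < length) {
            sb.append(pad);
        }
        return sb.toString();
    }

    /** Pads s on the left with pad until it reaches the given length. */
    static String padToLeft(String s, int length, char pad) {
        StringBuilder sb = new StringBuilder();
        for (int i = s.length(); i < length; i++) {
            sb.append(pad);
        }
        return sb.append(s).toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + padTo("ab", 5, '.') + "]");     // prints [ab...]
        System.out.println("[" + padToLeft("42", 5, ' ') + "]"); // prints [   42]
    }
}
```

Left padding of this kind is typically used to right-align numbers in printed tables.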
- padWithZeros(int, int) - Static method in class ir.utilities.MoreString
-
- padWithZeros(double, int) - Static method in class ir.utilities.MoreString
-
- page - Variable in class ir.webutils.LinkExtractor
-
The page from which to extract links
- page - Variable in class ir.webutils.YahooCategoryLinkExtractor
-
The page from which to extract links
- page - Variable in class ir.webutils.YahooSiteLinkExtractor
-
The page from which to extract links
- parseMetaTags() - Method in class ir.webutils.RobotsMetaTagParser
-
Parses the document and returns a list of links that cannot be
followed.
- PathDisallowedException - Exception in ir.webutils
-
PathDisallowedException is thrown to indicate that a client program tried
to access a path that was disallowed by either a robots.txt file or a robots META tag.
- PathDisallowedException() - Constructor for exception ir.webutils.PathDisallowedException
-
- PathDisallowedException(String) - Constructor for exception ir.webutils.PathDisallowedException
-
- pearsonCorrelation(double[], double[]) - Static method in class ir.utilities.Stats
-
Return the Pearson correlation between the vectors x and y.
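The Pearson correlation of two equal-length vectors is their covariance divided by the product of their standard deviations. The sketch below computes it in two passes; the class name PearsonSketch and the error handling are hypothetical and are not taken from ir.utilities.Stats.

```java
// Hypothetical illustration of the Pearson correlation; not the ir.utilities.Stats source.
class PearsonSketch {

    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length == 0) {
            throw new IllegalArgumentException("vectors must be non-empty and of equal length");
        }
        int n = x.length;
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double cov = 0.0, varX = 0.0, varY = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        // The 1/n factors cancel; a constant vector yields NaN (division by zero).
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        System.out.println(pearson(new double[] {1, 2, 3}, new double[] {2, 4, 6}));
    }
}
```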
- point - Variable in class ir.classifiers.PointResults
-
Point on the curve that these results are for
- PointResults - Class in ir.classifiers
-
Utility class for generating average result curves.
- PointResults(int) - Constructor for class ir.classifiers.PointResults
-
Create a vector of results for a point
- points - Variable in class ir.classifiers.CVLearningCurve
-
Points on the X axis (percentage of train data) to plot
- Porter - Class in ir.utilities
-
The Porter stemmer for reducing words to their base stem form.
- Porter() - Constructor for class ir.utilities.Porter
-
- position - Variable in class ir.vsr.DocumentIterator
-
The current position of the iterator in this array
- precision - Variable in class ir.eval.RecallPrecisionPair
-
- prefix(String) - Method in class ir.utilities.FilePrefixer
-
- prepareNextToken() - Method in class ir.vsr.Document
-
The nextToken slot is always precomputed and stored by this method.
- presentRetrievals(HashMapVector, Retrieval[]) - Method in class ir.vsr.InvertedIndex
-
Print out a ranked set of retrievals.
- print() - Method in class ir.vsr.HashMapVector
-
Print out the vector showing the tokens and their weights
- print() - Method in class ir.vsr.InvertedIndex
-
Print out an inverted index by listing each token and the documents it occurs in.
- print() - Method in class ir.webutils.Graph
-
Prints the entire graph on stdout.
- printRetrievals(Retrieval[], int) - Method in class ir.vsr.InvertedIndex
-
Print out at most MAX_RETRIEVALS ranked retrievals starting at given starting rank number.
- printVector(double[]) - Static method in class ir.utilities.MoreMath
-
Print a vector in the form [x,y,...z] to standard out
- printVector(double[], PrintStream) - Static method in class ir.utilities.MoreMath
-
Print a vector in the form [x,y,...z] to the print stream
- printVector() - Method in class ir.vsr.Document
-
Compute and print out (one line per term) the term-vector (bag of words)
for this document
- processArgs(String[]) - Method in class ir.webutils.Spider
-
Processes command-line arguments.
- processArgs(String[]) - Method in class ir.webutils.YahooSpider
-
Processes command-line arguments.
- processQueries() - Method in class ir.vsr.InvertedIndex
-
Enter an interactive user-query loop, accepting queries and showing the retrieved
documents in ranked order.
- prompt(String) - Static method in class ir.utilities.UserInput
-
Prompt the user with a string and then get a line of input
- random - Static variable in class ir.classifiers.Classifier
-
Used for breaking ties in argMax()
- random - Variable in class ir.webutils.YahooSpider
-
Random number generator to use
- randomSeed - Variable in class ir.classifiers.CVLearningCurve
-
Seed for random number generator
- ratingsMap - Variable in class ir.eval.ExperimentRated
-
HashMap that stores the mapping of document names to their gold-standard relevance ratings
- reader - Variable in class ir.vsr.FileDocument
-
The I/O reader for accessing the file
- readFromFile(String) - Method in class ir.webutils.Graph
-
Reads graph from file where each line consists of a node-name followed by a
list of the names of nodes to which it points
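The file format described for readFromFile (a node name followed by the names of the nodes it points to) can be parsed along these lines. This is a sketch only: the class name GraphFileSketch is hypothetical, whitespace-separated names are an assumption, and the lines are passed in as a list rather than read from a file to keep the example self-contained. It builds a plain adjacency map, not an ir.webutils.Graph.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parser for "node target1 target2 ..." lines; not the ir.webutils.Graph source.
class GraphFileSketch {

    static Map<String, List<String>> parse(List<String> lines) {
        Map<String, List<String>> adjacency = new LinkedHashMap<>();
        for (String line : lines) {
            String[] names = line.trim().split("\\s+"); // assumed separator: whitespace
            if (names[0].isEmpty()) {
                continue; // skip blank lines
            }
            List<String> targets = adjacency.computeIfAbsent(names[0], k -> new ArrayList<>());
            for (int i = 1; i < names.length; i++) {
                targets.add(names[i]);
                // Make sure every target also exists as a node, even with no outgoing edges.
                adjacency.computeIfAbsent(names[i], k -> new ArrayList<>());
            }
        }
        return adjacency;
    }

    public static void main(String[] args) {
        System.out.println(parse(Arrays.asList("A B C", "B C")));
    }
}
```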
- readLine() - Static method in class ir.utilities.UserInput
-
Read a line of input from the user
- recall - Variable in class ir.eval.RecallPrecisionPair
-
- RECALL_LEVELS - Static variable in class ir.eval.Experiment
-
The standard recall levels for which we want to plot precision values
- RecallPrecisionPair - Class in ir.eval
-
A lightweight object for storing a pair of recall precision measures
- RecallPrecisionPair(double, double) - Constructor for class ir.eval.RecallPrecisionPair
-
- removeEndSlash(URL) - Static method in class ir.webutils.Link
-
Removes slash at end of URL to normalize
- removeRef(URL) - Static method in class ir.webutils.Link
-
Remove the internal "ref" pointer in a URL if there is
one.
- resetIterator() - Method in class ir.webutils.Graph
-
Resets the iterator.
- results - Variable in class ir.classifiers.PointResults
-
Sampled values of result at this point
- Retrieval - Class in ir.vsr
-
A lightweight object for storing information about a retrieved Document.
- Retrieval(DocumentReference, double) - Constructor for class ir.vsr.Retrieval
-
Create a retrieval with these values
- retrievals - Variable in class ir.vsr.Feedback
-
The current list of ranked retrievals
- retrieve(String) - Method in class ir.vsr.InvertedIndex
-
Perform ranked retrieval on this input query.
- retrieve(Document) - Method in class ir.vsr.InvertedIndex
-
Perform ranked retrieval on this input query Document.
- retrieve(HashMapVector) - Method in class ir.vsr.InvertedIndex
-
Perform ranked retrieval on this input query Document vector.
- retriever - Variable in class ir.webutils.Spider
-
The object to be used to retrieve pages
- retriever - Variable in class ir.webutils.YahooSpider
-
The object to be used to retrieve pages
- RobotExclusionSet - Class in ir.webutils
-
RobotExclusionSet provides support for the Robots Exclusion
Protocol.
- RobotExclusionSet() - Constructor for class ir.webutils.RobotExclusionSet
-
Constructs an empty set.
- RobotExclusionSet(String) - Constructor for class ir.webutils.RobotExclusionSet
-
Constructs a set containing the paths in the robots.txt file
for this site.
- RobotsMetaTagParser - Class in ir.webutils
-
Parser callback that extracts robots META tag information.
- RobotsMetaTagParser() - Constructor for class ir.webutils.RobotsMetaTagParser
-
- RobotsMetaTagParser(URL) - Constructor for class ir.webutils.RobotsMetaTagParser
-
- RobotsMetaTagParser(URL, String) - Constructor for class ir.webutils.RobotsMetaTagParser
-
- roundTo(double, int) - Static method in class ir.utilities.MoreMath
-
Round a double to the given number of decimalPlaces
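Rounding to a fixed number of decimal places is commonly done by scaling, rounding to the nearest integer, and scaling back. The sketch below shows that approach; the class name RoundSketch is hypothetical and the code is not taken from ir.utilities.MoreMath, which may round differently (for example on ties or negative values).

```java
// Hypothetical illustration of scale-round-unscale; not the ir.utilities.MoreMath source.
class RoundSketch {

    static double roundTo(double value, int decimalPlaces) {
        double scale = Math.pow(10, decimalPlaces);
        return Math.round(value * scale) / scale;
    }

    public static void main(String[] args) {
        System.out.println(roundTo(3.14159, 2));
    }
}
```

Because doubles are binary fractions, the result is the nearest representable value to the rounded decimal, which matters mainly when comparing results for exact equality.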
- run() - Method in class ir.classifiers.CVLearningCurve
-
Run a CV learning curve test, print total training and test time,
and generate average learning curve plot output files suitable
for gnuplot
- SafeHTMLPage - Class in ir.webutils
-
SafeHTMLPage is an immutable representation of information about a
web page that includes information about whether or not this page
can be indexed.
- SafeHTMLPage(Link, String, boolean) - Constructor for class ir.webutils.SafeHTMLPage
-
Constructs a SafeHTMLPage with the given link,
text, and indication whether or not indexing is allowed.
- SafeHTMLPageRetriever - Class in ir.webutils
-
Keeps track of Robot Exclusion information.
- SafeHTMLPageRetriever() - Constructor for class ir.webutils.SafeHTMLPageRetriever
-
- saveDir - Variable in class ir.webutils.Spider
-
The directory to save the downloaded files to.
- saveDir - Variable in class ir.webutils.YahooSpider
-
The directory to save the downloaded files to.
- score - Variable in class ir.vsr.Retrieval
-
The score given to this document by a retrieval engine.
- segment(String, char) - Static method in class ir.utilities.MoreString
-
Segment a string into substrings by breaking at occurrences of the given
character and returning a list of segments
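Splitting a string at each occurrence of a delimiter character can be sketched as follows. The class name SegmentSketch is hypothetical, and keeping empty segments (between adjacent delimiters, or at either end) is an assumption; the index does not say how ir.utilities.MoreString.segment handles them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of character-delimited segmentation; not the MoreString source.
class SegmentSketch {

    static List<String> segment(String s, char delimiter) {
        List<String> segments = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == delimiter) {
                segments.add(s.substring(start, i)); // may be empty
                start = i + 1;
            }
        }
        segments.add(s.substring(start)); // the final segment after the last delimiter
        return segments;
    }

    public static void main(String[] args) {
        System.out.println(segment("a,b,,c", ',')); // prints [a, b, , c]
    }
}
```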
- setCategory(int) - Method in class ir.classifiers.Example
-
Sets the category of the example
- setClassifier(Classifier) - Method in class ir.classifiers.CVLearningCurve
-
Set the classifier
- setClassPriors(double[]) - Method in class ir.classifiers.BayesResult
-
Sets the class priors
- setDebug(boolean) - Method in class ir.classifiers.NaiveBayes
-
Sets the debug flag
- setDocument(FileDocument) - Method in class ir.classifiers.Example
-
Sets the document of the example
- setEpsilon(double) - Method in class ir.classifiers.NaiveBayes
-
Sets the value of EPSILON (default 1e-6)
- setFeatureTable(Hashtable<String, double[]>) - Method in class ir.classifiers.BayesResult
-
Sets the feature hash
- setFoldBins(Vector<Example>[][]) - Method in class ir.classifiers.CVLearningCurve
-
Set the fold Bins
- setHashMapVector(HashMapVector) - Method in class ir.classifiers.Example
-
Sets the hashVector of the example
- setLaplace(boolean) - Method in class ir.classifiers.NaiveBayes
-
Sets the Laplace smoothing flag
- setName(String) - Method in class ir.classifiers.Example
-
Sets the name of the example
- setOutLinks(List<Link>) - Method in class ir.webutils.HTMLPage
-
Sets the outLinks for this page to the given list
- setPage(String) - Method in class ir.webutils.RobotsMetaTagParser
-
- setPoint(double) - Method in class ir.classifiers.PointResults
-
- setTotalExamples(Vector<Example>[]) - Method in class ir.classifiers.CVLearningCurve
-
Set all the examples
- setTotalExamples(List<Example>) - Method in class ir.classifiers.CVLearningCurve
-
Sets the totalExamples by partitioning examples into categories to
get a stratified sample
- setUrl(URL) - Method in class ir.webutils.RobotsMetaTagParser
-
- setValue(int) - Method in class ir.utilities.Counter
-
Set the current count
- setValue(int) - Method in class ir.utilities.Weight
-
Set the current count
- setValue(double) - Method in class ir.utilities.Weight
-
Set the current count
- showRetrievals(Retrieval[]) - Method in class ir.vsr.InvertedIndex
-
Show the top retrievals to the user if there are any.
- siteLinks - Variable in class ir.webutils.YahooSpider
-
List of site links found for the current directory page
- siteLinksMap - Variable in class ir.webutils.YahooSpider
-
The HashMap for storing siteLinks for already downloaded Links
- SiteSpider - Class in ir.webutils
-
A spider that limits itself to a given site.
- SiteSpider() - Constructor for class ir.webutils.SiteSpider
-
- size() - Method in class ir.vsr.HashMapVector
-
Returns the number of tokens in the vector.
- size() - Method in class ir.vsr.InvertedIndex
-
Return the number of tokens indexed.
- size() - Method in class ir.webutils.RobotExclusionSet
-
- sizeOfFold(int) - Method in class ir.classifiers.CVLearningCurve
-
Computes the total number of examples in given fold
- slow - Variable in class ir.webutils.Spider
-
Flag to purposely slow the crawl for debugging purposes
- slow - Variable in class ir.webutils.YahooSpider
-
Flag to purposely slow the crawl for debugging purposes
- Spider - Class in ir.webutils
-
Spider defines a framework for writing a web crawler.
- Spider() - Constructor for class ir.webutils.Spider
-
- standardDeviation(double[]) - Static method in class ir.utilities.Stats
-
Return the standard deviation of the argument values.
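As a reminder of what the mean and standard deviation routines compute, here is a small sketch. The class name StatsSketch is hypothetical, and dividing by n (the population form) rather than n - 1 (the sample form) is an assumption; the index does not say which one ir.utilities.Stats uses.

```java
// Hypothetical illustration of mean and population standard deviation; not the Stats source.
class StatsSketch {

    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Population standard deviation: sqrt of the mean squared deviation from the mean. */
    static double standardDeviation(double[] values) {
        double m = mean(values);
        double sumSquaredDeviations = 0.0;
        for (double v : values) {
            sumSquaredDeviations += (v - m) * (v - m);
        }
        return Math.sqrt(sumSquaredDeviations / values.length);
    }

    public static void main(String[] args) {
        double[] data = {2, 4, 4, 4, 5, 5, 7, 9};
        System.out.println(mean(data));              // prints 5.0
        System.out.println(standardDeviation(data)); // prints 2.0
    }
}
```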
- startsWithIgnoreCase(String, String, int) - Static method in class ir.utilities.MoreString
-
- startsWithIgnoreCase(String, String) - Static method in class ir.utilities.MoreString
-
- Stats - Class in ir.utilities
-
A place to put statistical routines
- Stats() - Constructor for class ir.utilities.Stats
-
- stem - Variable in class ir.classifiers.DirectoryExamplesConstructor
-
Flag set to stem words to their root forms
- stem - Variable in class ir.vsr.Document
-
Whether to stem tokens with the Porter stemmer
- stem - Variable in class ir.vsr.DocumentIterator
-
Whether tokens should be stemmed with Porter stemmer
- stem - Variable in class ir.vsr.InvertedIndex
-
Whether tokens should be stemmed with Porter stemmer
- stemmer - Static variable in class ir.vsr.Document
-
The Porter stemmer
- stopWords - Static variable in class ir.vsr.Document
-
The hashtable where stopwords are indexed
- stopWordsFile - Static variable in class ir.vsr.Document
-
The file where a list of stopwords, 1 per line, is stored
- StringSearchResult - Class in ir.webutils
-
Lightweight object for storing both the number of DIFFERENT strings
in a set of search strings that are found in a text as well as the total number
of occurrences in the text of ANY of the strings in the set.
- StringSearchResult(int, int) - Constructor for class ir.webutils.StringSearchResult
-
Construct result with a given numberFound and numberOccurrences
- stripAffixes(String) - Method in class ir.utilities.Porter
-
Takes a String as input and returns its stem as a String.
- subtract(HashMapVector) - Method in class ir.vsr.HashMapVector
-
Destructively subtract the given vector from the current vector
- test(Example) - Method in class ir.classifiers.Classifier
-
Returns true if the predicted category of the test example matches the correct category,
false otherwise
- test(Example) - Method in class ir.classifiers.NaiveBayes
-
Categorizes the test example using the trained Naive Bayes classifier, returning true if
the predicted category is same as the actual category
- TestNaiveBayes - Class in ir.classifiers
-
Wrapper class to test NaiveBayes classifier using 10-fold CV.
- TestNaiveBayes() - Constructor for class ir.classifiers.TestNaiveBayes
-
- TestNaiveBayes2 - Class in ir.classifiers
-
Wrapper class to test NaiveBayes classifier using 10-fold CV.
- TestNaiveBayes2() - Constructor for class ir.classifiers.TestNaiveBayes2
-
- testResults - Variable in class ir.classifiers.CVLearningCurve
-
Accuracy results for test data, one PointResults for each point on the curve
- testTime - Variable in class ir.classifiers.CVLearningCurve
-
Total Testing time
- testTimeNum - Variable in class ir.classifiers.CVLearningCurve
-
Total number of examples tested in test time
- text - Variable in class ir.webutils.HTMLPage
-
The text of the page
- TextFileDocument - Class in ir.vsr
-
A normal ASCII text file Document
- TextFileDocument(File, boolean) - Constructor for class ir.vsr.TextFileDocument
-
Create a new text document for the given file.
- TextFileDocument(String, boolean) - Constructor for class ir.vsr.TextFileDocument
-
Create a new text document for the given file name.
- textReader - Variable in class ir.vsr.HTMLFileDocument
-
The I/O reader for accessing the output of the HTML parser.
- TextStringDocument - Class in ir.vsr
-
A simple document represented by a String
- TextStringDocument(String, boolean) - Constructor for class ir.vsr.TextStringDocument
-
Create a simple Document for this string
- tokenHash - Variable in class ir.vsr.InvertedIndex
-
A HashMap where tokens are indexed.
- TokenInfo - Class in ir.vsr
-
A lightweight object for storing information about a token (a.k.a word, term)
in an inverted index.
- TokenInfo() - Constructor for class ir.vsr.TokenInfo
-
Create an initially empty data structure
- tokenizer - Variable in class ir.vsr.HTMLFileDocument
-
The tokenizer for lines read from this document.
- tokenizer - Variable in class ir.vsr.TextFileDocument
-
The tokenizer for lines read from this document.
- tokenizer - Variable in class ir.vsr.TextStringDocument
-
The tokenizer for this document.
- tokenizerDelim - Static variable in class ir.vsr.HTMLFileDocument
-
StringTokenizer delim for tokenizing only alphabetic strings.
- tokenizerDelim - Static variable in class ir.vsr.TextFileDocument
-
StringTokenizer delim for tokenizing only alphabetic strings.
- tokenizerDelim - Static variable in class ir.vsr.TextStringDocument
-
StringTokenizer delim for tokenizing only alphabetic strings.
- TokenOccurrence - Class in ir.vsr
-
A lightweight object for storing information about an occurrence of a token (a.k.a word, term)
in a Document.
- TokenOccurrence(DocumentReference, int) - Constructor for class ir.vsr.TokenOccurrence
-
Create an occurrence with these values
- topCategoryLink - Variable in class ir.webutils.YahooSpider
-
Link for the main topic Yahoo category
- toString() - Method in class ir.classifiers.Example
-
Returns the String representation of the example object
- toString() - Method in class ir.classifiers.PointResults
-
- toString() - Method in class ir.eval.RecallPrecisionPair
-
- toString() - Method in class ir.vsr.DocumentReference
-
- toString() - Method in class ir.vsr.HashMapVector
-
Return String of the vector showing the tokens and their weights
- toString() - Method in class ir.vsr.TokenOccurrence
-
- toString() - Method in class ir.webutils.Link
-
- toString() - Method in class ir.webutils.Node
-
Returns the name of the node
- totalExamples - Variable in class ir.classifiers.CVLearningCurve
-
Stores all the examples for each class
- totalNumTrain - Variable in class ir.classifiers.CVLearningCurve
-
Total number of training examples per fold
- train(List<Example>) - Method in class ir.classifiers.Classifier
-
Trains the classifier on the training examples
- train(List<Example>) - Method in class ir.classifiers.NaiveBayes
-
Trains the Naive Bayes classifier - estimates the prior probs and calculates the
counts for each feature in different categories
- trainAndTest() - Method in class ir.classifiers.CVLearningCurve
-
Run training and test for each point to be plotted, gathering a result for
each fold.
- trainAndTestFold(Vector<Example>, Vector<Example>, int, PointResults, PointResults) - Method in class ir.classifiers.CVLearningCurve
-
Train and test on given example sets for the given fold:
- trainResults - Variable in class ir.classifiers.CVLearningCurve
-
Accuracy results for training data, one PointResults for each point on the curve
- trainTime - Variable in class ir.classifiers.CVLearningCurve
-
Total Training time
- TYPE_HTML - Static variable in class ir.vsr.DocumentIterator
-
docType for HTML files
- TYPE_TEXT - Static variable in class ir.vsr.DocumentIterator
-
docType for ASCII text files