TextFileDocument

java.lang.Object
- ir.vsr.Document
  - ir.vsr.FileDocument
    - ir.vsr.TextFileDocument

public class TextFileDocument
extends FileDocument

A Document read from a plain ASCII text file.

Field Summary

Fields
Modifier and Type	Field and Description
`protected java.util.StringTokenizer`	`tokenizer` The tokenizer for lines read from this document.
`static java.lang.String`	`tokenizerDelim` Delimiter string for StringTokenizer, chosen so that only alphabetic strings are produced as tokens.

Fields inherited from class ir.vsr.FileDocument
file, reader

Fields inherited from class ir.vsr.Document
nextToken, numStopWords, numTokens, stem, stemmer, stopWords, stopWordsFile

Constructor Summary

Constructors
Constructor and Description
`TextFileDocument(java.io.File file, boolean stem)` Create a new text document for the given file.
`TextFileDocument(java.lang.String fileName, boolean stem)` Create a new text document for the given file name.

Method Summary

Methods
Modifier and Type	Method and Description
`protected java.lang.String`	`getNextCandidateToken()` Return the next purely alphabetic token in the document, or null if none are left.
`static void`	`main(java.lang.String[] args)` For testing, print the bag-of-words vector for a given file.

Methods inherited from class ir.vsr.Document
allLetters, hashMapVector, hasMoreTokens, loadStopWords, nextToken, numberOfTokens, prepareNextToken, printVector

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - tokenizerDelim
```
public static final java.lang.String tokenizerDelim
```
    Delimiter string for StringTokenizer, chosen so that only alphabetic strings are produced as tokens.
    
    See Also:
    Constant Field Values
  - tokenizer
```
protected java.util.StringTokenizer tokenizer
```
    The tokenizer for lines read from this document.
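    The two fields above work together: a delimiter string tells a `java.util.StringTokenizer` which characters separate tokens. The sketch below illustrates the idea only. The actual value of `tokenizerDelim` is not shown on this page, so `ASSUMED_DELIM`, `buildDelim`, `alphaTokens`, and the class name are hypothetical; the delimiter here is assumed to be every ASCII character that is not a letter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class AlphaDelimSketch {
    // Assumption: the delimiter set is every ASCII character that is not a letter.
    // The real tokenizerDelim constant may contain a different set of characters.
    static final String ASSUMED_DELIM = buildDelim();

    static String buildDelim() {
        StringBuilder sb = new StringBuilder();
        for (char c = 0; c < 128; c++) {
            boolean isLetter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
            if (!isLetter) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Split one line into runs of ASCII letters, dropping everything else.
    static List<String> alphaTokens(String line) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, ASSUMED_DELIM);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [Hello, world, times]
        System.out.println(alphaTokens("Hello, world! 42 times"));
    }
}
```

    Because every non-letter is a delimiter, digits and punctuation never appear inside a token, which is one way to obtain "only alphabetic strings".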
- Constructor Detail
  - TextFileDocument
```
public TextFileDocument(java.io.File file,
                boolean stem)
```
    Create a new text document for the given file.
  - TextFileDocument
```
public TextFileDocument(java.lang.String fileName,
                boolean stem)
```
    Create a new text document for the given file name.
- Method Detail
  - getNextCandidateToken
```
protected java.lang.String getNextCandidateToken()
```
    Return the next purely alphabetic token in the document, or null if none are left.
    
    Specified by:
    getNextCandidateToken in class Document
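    A method with this contract could plausibly be built by reading the file one line at a time and refilling a tokenizer whenever the current one runs out. The following is a minimal sketch under that assumption, not the class's actual source: `CandidateTokenSketch`, `nextCandidateToken`, and `DELIM` are hypothetical names, and the delimiter set is an illustrative guess.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.StringTokenizer;

class CandidateTokenSketch {
    // Hypothetical delimiter set: whitespace, digits, and common punctuation.
    private static final String DELIM = " \t\n\r\f0123456789.,;:!?\"'()";

    private final BufferedReader reader;
    private StringTokenizer tokenizer; // tokenizer for the current line, if any

    CandidateTokenSketch(BufferedReader reader) {
        this.reader = reader;
    }

    // Return the next token, or null once the input is exhausted.
    String nextCandidateToken() {
        try {
            // Keep reading lines until one yields a token or the input ends.
            while (tokenizer == null || !tokenizer.hasMoreTokens()) {
                String line = reader.readLine();
                if (line == null) {
                    return null;
                }
                tokenizer = new StringTokenizer(line, DELIM);
            }
            return tokenizer.nextToken();
        } catch (IOException e) {
            // The documented method declares no checked exception, so wrap it here.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        CandidateTokenSketch doc = new CandidateTokenSketch(
                new BufferedReader(new StringReader("one, two\n\n3 three")));
        String token;
        while ((token = doc.nextCandidateToken()) != null) {
            System.out.println(token); // prints one, two, three on separate lines
        }
    }
}
```

    The loop skips blank lines and lines that contain only delimiters, so callers see either a token or null and never an empty string.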
  - main
```
public static void main(java.lang.String[] args)
                 throws java.io.IOException
```
    For testing, print the bag-of-words vector for a given file.
    
    Throws:
    java.io.IOException
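    A bag-of-words vector maps each distinct token to the number of times it occurs. The sketch below shows that idea in isolation with a plain map. It is an assumption-laden illustration: `BagOfWordsSketch` and `bagOfWords` are hypothetical names, lowercasing is an added assumption, and the stop-word removal and stemming suggested by the inherited `stopWords` and `stem` fields are deliberately left out.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

class BagOfWordsSketch {
    // Count token occurrences; a TreeMap keeps the output in alphabetical order.
    static Map<String, Integer> bagOfWords(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        // Hypothetical delimiters: whitespace and a few punctuation marks.
        StringTokenizer st = new StringTokenizer(text.toLowerCase(), " \t\n\r\f.,;:!?");
        while (st.hasMoreTokens()) {
            counts.merge(st.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints {and=1, cat=1, hat=1, the=2}
        System.out.println(bagOfWords("The cat and the hat."));
    }
}
```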
