java.lang.Object
  edu.northwestern.at.utils.swing.FileTokenizer
public class FileTokenizer
Tokenizes text from a text file.
A token is defined as text between word separator characters. The separator characters are defined below in the WORD_SEPARATOR_CHARACTERS array. The tokenizer keeps track of the starting and ending position of each token. This is necessary to support find/replace, spell checking, etc.
FileTokenizer implements the Iterator interface, but the optional remove() method is left as a no-op since there is no collection underlying this class.
Example:
// Tokenize file text and print out list of words.
// The constructor can throw java.io.IOException and
// javax.swing.text.BadLocationException.
FileTokenizer tokenizer =
    new FileTokenizer( "myfile.txt" );
// While there are more characters
// we haven't looked at ...
while ( tokenizer.hasNext() )
{
    // Extract next word in document.
    // next() is declared to return Object.
    String word = tokenizer.next().toString();
    // Print out word and its starting and
    // ending positions in the document text.
    System.out.println(
        word +
        " starts at " + tokenizer.getStartPos() +
        ", ends at " + tokenizer.getEndPos() );
}
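The position tracking and Iterator behavior described above can be sketched without the Swing document classes. The following is a minimal illustration, not the FileTokenizer source: the class name StringTokenizerSketch, the SEPARATORS string, and the choice of an exclusive end offset are all assumptions made for the example, and its hasNext() skips separators to look for another token, which may differ from the documented "more characters available" check.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical stand-in that tokenizes a String instead of a
// javax.swing.text.Document. Not the FileTokenizer implementation.
class StringTokenizerSketch implements Iterator<String> {
    // Assumed separator set; the real one is WORD_SEPARATOR_CHARACTERS.
    // Single quote and hyphen are deliberately absent.
    private static final String SEPARATORS = " \t\r\n.,;:!?()\"";

    private final String text;
    private int currentPos = 0;
    private int startPos = 0;
    private int endPos = 0;

    public StringTokenizerSketch(String text) {
        this.text = text;
    }

    private static boolean isSeparator(char ch) {
        return SEPARATORS.indexOf(ch) >= 0;
    }

    // Skips separators at the cursor, then reports whether a token remains.
    @Override
    public boolean hasNext() {
        while (currentPos < text.length() && isSeparator(text.charAt(currentPos))) {
            currentPos++;
        }
        return currentPos < text.length();
    }

    // Returns the next token and records where it starts and ends.
    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        startPos = currentPos;
        while (currentPos < text.length() && !isSeparator(text.charAt(currentPos))) {
            currentPos++;
        }
        endPos = currentPos; // assumption: end offset is exclusive
        return text.substring(startPos, endPos);
    }

    // No underlying collection, so removal is a no-op.
    @Override
    public void remove() {
    }

    public int getStartPos() { return startPos; }

    public int getEndPos() { return endPos; }

    public static void main(String[] args) {
        StringTokenizerSketch t = new StringTokenizerSketch("Don't stop, re-read it.");
        while (t.hasNext()) {
            String word = t.next();
            System.out.println(word + " [" + t.getStartPos() + ", " + t.getEndPos() + ")");
        }
    }
}
```

Because each token carries its offsets, a caller such as a spell checker can replace the text between getStartPos() and getEndPos() without searching for the word again.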
Field Summary | | |
---|---|---|
protected int | currentPos | Current position in document. |
protected javax.swing.text.Document | document | The document to tokenize. |
protected int | endPos | Ending position in document. |
protected javax.swing.text.Segment | segment | The current document segment. |
protected static java.util.HashMap | separatorHashMap | Hash holds separator characters for quick access. |
protected int | startPos | Starting position in document. |
static char[] | WORD_SEPARATOR_CHARACTERS | Characters that separate words. |
static java.lang.String | WORD_SEPARATOR_CHARACTERS_STRING | |
Constructor Summary | |
---|---|
FileTokenizer(java.lang.String textFileName) | Create document tokenizer. |
Method Summary | | |
---|---|---|
protected static void | createSeparatorHashMap() | Creates word separator hash map from list of separator characters. |
int | getEndPos() | Get ending position in document for tokenization. |
int | getStartPos() | Get starting position in document for tokenization. |
boolean | hasNext() | Check if more characters available in document. |
static boolean | isSeparator(char ch) | Checks if a character is a word separator. |
void | moveToStartOfWord() | Move to start of next word if current cursor is in the middle of a word. |
java.lang.Object | next() | Get next token in document. |
void | remove() | Removes last element returned by iterator (does nothing). |
void | setPosition(int pos) | Set position in document for tokenization. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected javax.swing.text.Document document
protected javax.swing.text.Segment segment
protected int startPos
protected int endPos
protected int currentPos
public static final char[] WORD_SEPARATOR_CHARACTERS
The single quote is not included as a word separator, so contractions are returned as single tokens. It is up to the caller to remove unwanted single quotes from a token. Likewise, "-" is not treated as a separator, so words containing a dash are kept whole.
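Since removing unwanted single quotes is left to the caller, a helper along the following lines could be applied to each token. This is a sketch under assumptions: the class and method names are hypothetical, and it only strips quotes at the edges of a token, which is one possible reading of "unwanted".

```java
// Hypothetical caller-side helper; not part of FileTokenizer.
class QuoteTrimSketch {
    // Removes leading and trailing single quotes while keeping
    // interior apostrophes such as the one in "don't".
    static String trimSingleQuotes(String token) {
        int start = 0;
        int end = token.length();
        while (start < end && token.charAt(start) == '\'') {
            start++;
        }
        while (end > start && token.charAt(end - 1) == '\'') {
            end--;
        }
        return token.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(trimSingleQuotes("'hello'"));
        System.out.println(trimSingleQuotes("don't"));
        System.out.println(trimSingleQuotes("well-known"));
    }
}
```

Note that edge trimming also drops the apostrophe of a plural possessive such as dogs', so a caller that needs possessives would have to handle that case separately.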
public static final java.lang.String WORD_SEPARATOR_CHARACTERS_STRING
protected static java.util.HashMap separatorHashMap
Constructor Detail |
---|
public FileTokenizer(java.lang.String textFileName) throws java.io.IOException, javax.swing.text.BadLocationException
Parameters: textFileName - Name of text file to tokenize.
Throws: java.io.IOException
Throws: javax.swing.text.BadLocationException
Method Detail |
---|
public static boolean isSeparator(char ch)
Tests whether a character is a separator by checking whether the character is a key in the separatorHashMap map. If so, the character is a separator.
Parameters: ch - The character to check.
protected static void createSeparatorHashMap()
The separatorHashMap map uses each separator character as both a key and the key's value.
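The lookup scheme described for isSeparator and createSeparatorHashMap can be sketched as follows. The class name, the generic Map type, and the contents of the SEPARATORS array are assumptions for illustration; the actual WORD_SEPARATOR_CHARACTERS values are not listed on this page.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a separator lookup table; not the FileTokenizer source.
class SeparatorMapSketch {
    // Assumed separator list. Single quote and '-' are left out,
    // matching the documented behavior for contractions and dashes.
    static final char[] SEPARATORS = { ' ', '\t', '\n', '\r', '.', ',', ';', ':', '!', '?' };

    static final Map<Character, Character> separatorMap = new HashMap<>();

    static {
        createSeparatorMap();
    }

    // Stores each separator character as both the key and the key's value.
    static void createSeparatorMap() {
        separatorMap.clear();
        for (char ch : SEPARATORS) {
            separatorMap.put(ch, ch);
        }
    }

    // A character is a separator exactly when it is a key in the map.
    static boolean isSeparator(char ch) {
        return separatorMap.containsKey(ch);
    }

    public static void main(String[] args) {
        System.out.println(isSeparator(','));
        System.out.println(isSeparator('\''));
        System.out.println(isSeparator('-'));
    }
}
```

Only the keys matter for the membership test, so a java.util.Set of characters would serve the same purpose; the map form is shown here because it mirrors the description above.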
public void moveToStartOfWord()
public boolean hasNext()
Specified by: hasNext in interface java.util.Iterator
public java.lang.Object next()
Specified by: next in interface java.util.Iterator
public void remove()
Specified by: remove in interface java.util.Iterator
public int getStartPos()
public int getEndPos()
public void setPosition(int pos)
Parameters: pos - The position.
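A find/replace feature might call setPosition with an arbitrary offset and then moveToStartOfWord so that tokenization does not begin partway through a word. The page does not document the exact boundary rules, so the following is only one plausible interpretation, written against a plain String; the class name, the SEPARATORS string, and the static method signature are hypothetical.

```java
// Hypothetical sketch of "move to start of next word if the cursor
// is in the middle of a word"; not the FileTokenizer source.
class WordStartSketch {
    // Assumed separator set for illustration.
    static final String SEPARATORS = " \t\r\n.,;:!?";

    static boolean isSeparator(char ch) {
        return SEPARATORS.indexOf(ch) >= 0;
    }

    // Returns pos unchanged unless it falls strictly inside a word.
    // In that case, returns the offset of the next word's first
    // character, or text.length() if no further word exists.
    static int moveToStartOfWord(String text, int pos) {
        boolean insideWord = pos > 0 && pos < text.length()
                && !isSeparator(text.charAt(pos - 1))
                && !isSeparator(text.charAt(pos));
        if (!insideWord) {
            return pos;
        }
        // Skip the remainder of the current word.
        while (pos < text.length() && !isSeparator(text.charAt(pos))) {
            pos++;
        }
        // Skip the separators that follow it.
        while (pos < text.length() && isSeparator(text.charAt(pos))) {
            pos++;
        }
        return pos;
    }

    public static void main(String[] args) {
        String text = "alpha beta gamma";
        System.out.println(moveToStartOfWord(text, 2));
        System.out.println(moveToStartOfWord(text, 6));
    }
}
```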