java.lang.Object
  edu.northwestern.at.utils.corpuslinguistics.WordCountExtractor
public class WordCountExtractor
Counts words in a text.
Field Summary | |
---|---|
protected java.lang.String[] | uniqueWords - String array of unique words. |
protected java.util.TreeMap | wordCounts - The list of words and word counts in the text. |
(package private) java.lang.String[] | words - The text parsed into a string array of words. |
Constructor Summary | |
---|---|
WordCountExtractor(java.util.ArrayList wordList) | Extract word counts from an arraylist of words. |
WordCountExtractor(java.lang.String[] words) | Extract word counts from a string array of words. |
WordCountExtractor(java.lang.String fileName, java.lang.String encoding) | Extract word counts from a text file. |
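The file-based constructor takes a file name and an encoding name. The sketch below shows one way such input could be decoded and split into a word array. It is not the library's code: the class `FileWordsSketch`, the helpers `wordsFromBytes` and `readWords`, and the whitespace-splitting rule are all assumptions made for illustration, since this page does not say how the text is tokenized.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class FileWordsSketch {
    // Hypothetical helper: decode raw bytes with the named encoding,
    // then split on runs of whitespace (an assumed tokenization rule).
    public static String[] wordsFromBytes(byte[] raw, String encoding) {
        String text = new String(raw, Charset.forName(encoding)).trim();
        if (text.isEmpty()) {
            return new String[0];
        }
        return text.split("\\s+");
    }

    // Hypothetical helper: read the whole file, then decode and split it.
    public static String[] readWords(String fileName, String encoding)
            throws IOException {
        return wordsFromBytes(Files.readAllBytes(Paths.get(fileName)), encoding);
    }

    public static void main(String[] args) throws IOException {
        // Write a small temporary file so the example is self-contained.
        Path tmp = Files.createTempFile("words", ".txt");
        Files.write(tmp, "the cat saw the dog".getBytes(Charset.forName("utf-8")));
        String[] words = readWords(tmp.toString(), "utf-8");
        System.out.println(words.length); // prints 5
        Files.delete(tmp);
    }
}
```

An array produced this way has the same shape as the argument expected by the `WordCountExtractor(java.lang.String[] words)` constructor.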
Method Summary | |
---|---|
protected void | generateWordCountExtractor() - Compute word counts from a string array of words. |
int | getNumberOfUniqueWords() - Return the number of unique words. |
int | getNumberOfWords() - Return the total number of words. |
java.lang.String[] | getUniqueWords() - Return unique words as a string array. |
int | getWordCount(java.lang.String word) - Return count for a specific word. |
java.util.Map | getWordCounts() - Return word count map. |
java.lang.String[] | getWords() - Return tokenized text words as a string array. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected java.util.TreeMap wordCounts
Key=word
Value=Integer(count)
java.lang.String[] words
protected java.lang.String[] uniqueWords
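The three fields above are related: `wordCounts` maps each word to an Integer count, and `uniqueWords` holds the distinct words. A minimal sketch of how such fields could be derived from a word array follows. The class `WordCountSketch` and its methods are hypothetical stand-ins, not the library's implementation. The sketch also assumes case-sensitive matching and takes the unique words from the map's sorted key set; the original class may behave differently on either point.

```java
import java.util.TreeMap;

public class WordCountSketch {
    // Build a sorted map with key = word and value = Integer count.
    public static TreeMap<String, Integer> countWords(String[] words) {
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (String word : words) {
            // Insert 1 for a new word, otherwise add 1 to the stored count.
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    // The unique words are the keys of the count map, in sorted order.
    public static String[] uniqueWords(TreeMap<String, Integer> counts) {
        return counts.keySet().toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] words = {"the", "cat", "saw", "the", "dog"};
        TreeMap<String, Integer> counts = countWords(words);
        System.out.println(counts);                     // {cat=1, dog=1, saw=1, the=2}
        System.out.println(uniqueWords(counts).length); // 4
    }
}
```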
Constructor Detail |
---|
public WordCountExtractor(java.lang.String[] words)
words - The string array with the words.
public WordCountExtractor(java.util.ArrayList wordList)
wordList - The arraylist with the words.
public WordCountExtractor(java.lang.String fileName, java.lang.String encoding)
fileName - The file containing the text to analyze.
encoding - The encoding for the text file (e.g., "utf-8").
Method Detail |
---|
protected void generateWordCountExtractor()
public java.lang.String[] getWords()
public int getNumberOfWords()
public java.lang.String[] getUniqueWords()
public int getNumberOfUniqueWords()
public int getWordCount(java.lang.String word)
word - The word whose count is desired.
public java.util.Map getWordCounts()
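To make the accessor contract concrete, here is a small stand-in class that mirrors the documented method names. It is a sketch under stated assumptions rather than the real `WordCountExtractor`: the class name `WordCountAccessorsSketch` is hypothetical, returning 0 for a word that never occurs is an assumption (this page does not specify that case), and the list-based constructor simply converts to an array.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountAccessorsSketch {
    private final String[] words;
    private final TreeMap<String, Integer> wordCounts = new TreeMap<>();

    public WordCountAccessorsSketch(String[] words) {
        this.words = words.clone();
        for (String w : this.words) {
            wordCounts.merge(w, 1, Integer::sum);
        }
    }

    // List-based variant: convert to an array and reuse the array constructor.
    public WordCountAccessorsSketch(List<String> wordList) {
        this(wordList.toArray(new String[0]));
    }

    public String[] getWords() { return words.clone(); }
    public int getNumberOfWords() { return words.length; }
    public String[] getUniqueWords() {
        return wordCounts.keySet().toArray(new String[0]);
    }
    public int getNumberOfUniqueWords() { return wordCounts.size(); }

    // Assumption: a word that does not occur has a count of 0.
    public int getWordCount(String word) {
        return wordCounts.getOrDefault(word, 0);
    }

    public Map<String, Integer> getWordCounts() {
        return Collections.unmodifiableMap(wordCounts);
    }

    public static void main(String[] args) {
        WordCountAccessorsSketch s = new WordCountAccessorsSketch(
                Arrays.asList("the", "cat", "saw", "the", "dog"));
        System.out.println(s.getNumberOfWords());       // 5
        System.out.println(s.getNumberOfUniqueWords()); // 4
        System.out.println(s.getWordCount("the"));      // 2
        System.out.println(s.getWordCount("bird"));     // 0
    }
}
```

In this sketch the total word count comes from the array length while the unique count comes from the map size, so the two differ whenever a word repeats.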