java.lang.Object
  edu.northwestern.at.utils.corpuslinguistics.stemmer.LancasterStemmer
public class LancasterStemmer
LancasterStemmer: Implements the Lancaster (Paice/Husk) word stemmer.
Paice/Husk Stemmer - License Statement.
This software was designed and developed at Lancaster University, Lancaster, UK, under the supervision of Dr Chris Paice. It is fully in the public domain, and may be used or adapted by any organisation or individual. Neither Dr Paice nor Lancaster University accepts any responsibility whatsoever for its use by other parties, and makes no guarantees, expressed or implied, about its quality, reliability, or any other characteristic.
It is assumed that, as a matter of professional courtesy, anyone who incorporates this software into a system of their own, whether for commercial or research purposes, will acknowledge the source of the code.
Modified from the original Java programs written by Christopher O'Neill and Rob Hooper for use in WordHoard.
Field Summary | |
---|---|
static java.lang.String[] | defaultStemmingRules: Default stemming rules. |
static java.lang.String[] | prefixes: Prefixes to remove from words before stemming. |
protected boolean | preStrip |
protected java.util.Vector | ruleTable |
protected int[] | ruleTableIndex |
protected static char | zeroDigit: Character for "0" digit. |
Constructor Summary | |
---|---|
LancasterStemmer() | Create a Paice/Husk stemmer using the default stemming rules. |
LancasterStemmer(java.lang.String[] rules) | Create a Paice/Husk stemmer from a string list of rules. |
LancasterStemmer(java.lang.String[] rules, boolean preStrip) | Create a Paice/Husk stemmer from a string list of rules. |
Method Summary | |
---|---|
protected int | charCode(char ch): Converts a lower case letter to an index. |
protected java.lang.String | clean(java.lang.String s): Remove non-letters from a string. |
protected int | firstVowel(java.lang.String s, int last): Returns index of first vowel in string. |
protected boolean | isDigit(char ch): Determine if character is a digit. |
protected boolean | isLetter(char ch): Determine if character is a letter. |
protected boolean | isVowel(char ch): Determine if character is a vowel or not. |
protected void | loadRules(java.lang.String[] rules): Loads the stemming rules. |
java.lang.String | stem(java.lang.String s): Stem a specified string. |
protected java.lang.String | stripPrefixes(java.lang.String s): Removes prefixes from a string. |
protected java.lang.String | stripSuffixes(java.lang.String s): Strip suffixes from a string. |
protected boolean | vowel(char ch, char prev): Determine if character is a vowel or not. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final java.lang.String[] prefixes
public static final java.lang.String[] defaultStemmingRules
These rules MUST be stored in ascending alphanumeric order of each rule's first character.
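The ordering requirement suggests that rules are located by their first character through an index table such as ruleTableIndex. The following is a minimal sketch under that assumption, not the WordHoard implementation: the class name RuleIndexSketch, the method buildIndex, and the -1 sentinel for "no rule" are all hypothetical, and the rule strings used to exercise it are only illustrative.

```java
import java.util.Arrays;

final class RuleIndexSketch {
    /**
     * For each letter 'a'..'z', records the position of the first rule
     * that starts with that letter, or -1 when no rule does.
     * Assumes every rule starts with a lowercase letter and that rules
     * are already grouped in ascending order of that first character.
     */
    static int[] buildIndex(String[] rules) {
        int[] index = new int[26];
        Arrays.fill(index, -1);
        char previous = 'a';
        for (int i = 0; i < rules.length; i++) {
            if (rules[i].isEmpty()) {
                throw new IllegalArgumentException("Empty rule at position " + i);
            }
            char first = rules[i].charAt(0);
            if (first < 'a' || first > 'z') {
                throw new IllegalArgumentException("Rule must start with a-z: " + rules[i]);
            }
            if (first < previous) {
                // The ordering requirement is violated, so the index would be wrong.
                throw new IllegalArgumentException("Rules out of order at position " + i);
            }
            previous = first;
            if (index[first - 'a'] == -1) {
                index[first - 'a'] = i;
            }
        }
        return index;
    }
}
```

With this layout, all rules for a given letter form one contiguous run starting at the indexed position, which is why only the first character needs to be in order.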
protected static final char zeroDigit
protected java.util.Vector ruleTable
protected int[] ruleTableIndex
protected boolean preStrip
Constructor Detail |
---|
public LancasterStemmer()
Throws:
StemmerException
- if something goes wrong.
Prefixes are automatically removed from words with more than two characters.
public LancasterStemmer(java.lang.String[] rules)
rules
- The stemming rules as an array of String.
Prefixes are automatically removed from words with more than two characters.
public LancasterStemmer(java.lang.String[] rules, boolean preStrip)
rules
- The stemming rules as an array of String.
preStrip
- True to remove prefixes from words with more than two characters.
Unlike the other constructors, this one removes prefixes only when preStrip is true.
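The prefix-removal condition described for these constructors can be sketched as follows. This is an illustration under stated assumptions rather than the class's actual stripPrefixes code: the class name PrefixStripSketch and the SAMPLE_PREFIXES list are hypothetical stand-ins for the real prefixes field, and leaving a word untouched when it consists only of a prefix is a choice made here.

```java
final class PrefixStripSketch {
    // Hypothetical sample list; the real values live in the prefixes field.
    static final String[] SAMPLE_PREFIXES = {"kilo", "micro", "milli"};

    /**
     * Removes the first matching prefix, but only when prefix stripping
     * is enabled and the word has more than two characters.
     */
    static String stripPrefixes(String word, boolean preStrip) {
        if (!preStrip || word.length() <= 2) {
            return word;
        }
        for (String prefix : SAMPLE_PREFIXES) {
            // Assumption: never strip a word down to the empty string.
            if (word.startsWith(prefix) && word.length() > prefix.length()) {
                return word.substring(prefix.length());
            }
        }
        return word;
    }
}
```

Under these assumptions, stripPrefixes("kilometer", true) yields "meter", while the same call with preStrip set to false returns the word unchanged.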
Method Detail |
---|
protected void loadRules(java.lang.String[] rules)
rules
- String array of rules.
protected int firstVowel(java.lang.String s, int last)
s
- String to search for vowel.
last
- Last position to search for vowel.
protected java.lang.String stripSuffixes(java.lang.String s)
s
- The string from which to remove suffixes.
protected boolean isVowel(char ch)
ch
- The potential vowel.
protected boolean vowel(char ch, char prev)
ch
- The potential vowel.
prev
- The previous character.
When the character is a "y", the previous character is checked to see if it is a vowel. If so, "y" is not considered a vowel.
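The "y" rule above can be sketched like this. It is a minimal illustration, not the class's source: the names VowelSketch and isPlainVowel are hypothetical, and two details are assumptions, namely that "y" counts as a vowel whenever the previous character is not a vowel, and that only a, e, i, o, u count as vowels for the previous character.

```java
final class VowelSketch {
    /** True for the five unconditional vowels. */
    static boolean isPlainVowel(char ch) {
        return "aeiou".indexOf(ch) >= 0;
    }

    /**
     * Context-sensitive vowel test: 'y' is a vowel only when the
     * previous character is not itself a vowel.
     */
    static boolean vowel(char ch, char prev) {
        if (ch == 'y') {
            return !isPlainVowel(prev);
        }
        return isPlainVowel(ch);
    }
}
```

For example, VowelSketch.vowel('y', 'a') returns false (as in "day"), while VowelSketch.vowel('y', 'r') returns true (as in "cry").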
protected boolean isDigit(char ch)
ch
- The character to check.
protected boolean isLetter(char ch)
ch
- The character to check.
protected int charCode(char ch)
ch
- The character. Must be in the range 'a' .. 'z'.
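A letter-to-index conversion of this kind is commonly written as an offset from 'a'. The sketch below assumes a zero-based result ('a' maps to 0, 'z' to 25); the class name CharCodeSketch is hypothetical, and the range check is added for illustration and may not exist in the real method.

```java
final class CharCodeSketch {
    /** Maps 'a'..'z' to 0..25; rejects anything outside that range. */
    static int charCode(char ch) {
        if (ch < 'a' || ch > 'z') {
            throw new IllegalArgumentException("Expected a lowercase letter a-z: " + ch);
        }
        return ch - 'a';
    }
}
```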
protected java.lang.String stripPrefixes(java.lang.String s)
s
- The string from which to remove prefixes.
protected java.lang.String clean(java.lang.String s)
s
- String from which to remove non-letters.
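Removing non-letters can be sketched with a simple filter loop. This version assumes that java.lang.Character.isLetter decides what counts as a letter; the real class has its own isLetter(char) method, whose definition may be narrower. The class name CleanSketch is hypothetical.

```java
final class CleanSketch {
    /** Returns the input with every non-letter character dropped. */
    static String clean(String s) {
        StringBuilder result = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (Character.isLetter(ch)) {
                result.append(ch);
            }
        }
        return result.toString();
    }
}
```

Under this assumption, CleanSketch.clean("can't") returns "cant".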
public java.lang.String stem(java.lang.String s)
Specified by: stem in interface Stemmer
s
- The string to stem.