java.lang.Object
  edu.northwestern.at.wordhoard.swing.calculator.analysis.FrequencyAnalysisRunnerBase
      edu.northwestern.at.wordhoard.swing.calculator.analysis.FindMultiwordUnits
public class FindMultiwordUnits
extends FrequencyAnalysisRunnerBase
Find multiword units.
Field Summary | |
---|---|
protected int | accepted - Count of MWUs accepted. |
protected int | acceptedByLocalMaxs - Count of MWUs accepted by the LocalMaxs algorithm. |
protected static int | DICECOLUMN |
protected static int | LOGLIKECOLUMN |
protected static int | MWUCOUNTCOLUMN |
protected static int | MWULENGTHCOLUMN |
protected int | mwusToReportOn - Count of MWUs to report on. |
protected static int | MWUTEXTCOLUMN - Output column indices. |
protected int | onceOnly - Count of MWUs that occur only once. |
protected static int | PHISQUAREDCOLUMN |
protected int | rejected - Count of MWUs rejected by filters. |
protected int | rejectedByWordClassFilters - Count of MWUs rejected by word class filters. |
protected static int | SCPCOLUMN |
protected static int | SICOLUMN |
protected int | sortColumn - The column containing the association measure to use. |
protected static int | WORDCLASSESCOLUMN |
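The field summary lists output columns for five association measures (Dice, log-likelihood, phi-squared, SI and SCP), but this page does not give their formulas. The sketch below shows common textbook definitions for a bigram x y, computed from the pair count f(xy), the word counts f(x) and f(y), and the corpus size N. It is an illustration under assumptions, not WordHoard's implementation: the class name BigramMeasures is hypothetical, and reading SI as pointwise ("specific") mutual information is a guess.

```java
/** Hypothetical helper: common bigram association measures from raw counts. Not WordHoard code. */
public final class BigramMeasures {
    private BigramMeasures() {}

    /** Dice coefficient: 2 f(xy) / (f(x) + f(y)). */
    public static double dice(long fxy, long fx, long fy) {
        return 2.0 * fxy / (fx + fy);
    }

    /** Symmetric conditional probability: f(xy)^2 / (f(x) f(y)); N cancels out. */
    public static double scp(long fxy, long fx, long fy) {
        return (double) fxy * fxy / ((double) fx * fy);
    }

    /** Pointwise mutual information in bits (assumed meaning of "SI"). */
    public static double pmi(long fxy, long fx, long fy, long n) {
        return Math.log((double) fxy * n / ((double) fx * fy)) / Math.log(2.0);
    }

    /** Phi-squared from the 2x2 contingency table (a, b; c, d). */
    public static double phiSquared(long fxy, long fx, long fy, long n) {
        double a = fxy, b = fx - fxy, c = fy - fxy, d = n - fx - fy + fxy;
        double num = a * d - b * c;
        return num * num / ((a + b) * (c + d) * (a + c) * (b + d));
    }

    /** Dunning's log-likelihood ratio G^2 = 2 * sum over cells of O ln(O / E). */
    public static double logLikelihood(long fxy, long fx, long fy, long n) {
        // Observed cells: (x, y), (x, not y), (not x, y), (not x, not y).
        double[] observed = { fxy, fx - fxy, fy - fxy, n - fx - fy + fxy };
        double[] rows = { fx, n - fx }; // x present / absent
        double[] cols = { fy, n - fy }; // y present / absent
        double g2 = 0.0;
        for (int i = 0; i < 2; i++) {
            for (int j = 0; j < 2; j++) {
                double o = observed[2 * i + j];
                double e = rows[i] * cols[j] / n; // expected count under independence
                if (o > 0) {
                    g2 += o * Math.log(o / e); // empty cells contribute 0
                }
            }
        }
        return 2.0 * g2;
    }
}
```

For example, with f(xy) = 10, f(x) = 20 and f(y) = 30, dice returns 2 * 10 / 50 = 0.4. When the pair occurs exactly as often as independence predicts, PMI, phi-squared and G^2 are all 0.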
Constructor Summary |
---|
FindMultiwordUnits() - Create a multiple word form frequency profile object. |
Method Summary | |
---|---|
boolean | areResultOptionsAvailable() - Are result options available? |
protected java.util.Collection | createRawMWUs(WordCountExtractor wordExtractor, NGramExtractor[] extractors) - Create raw (unfiltered) multiword unit strings. |
protected java.lang.String[] | extractLemmata(java.util.List workWords) - Extract lemmata from retrieved data. |
protected java.lang.String[] | extractSpellings(java.util.List workWords) - Extract spellings from retrieved data. |
protected java.lang.String[] | filterMultiwordUnits(java.util.List mwuCountData, java.util.HashMap glueMap, java.util.Map wordCountMap, NGramExtractor[] extractors, SortedTableModel model) - Filter the raw multiword units. |
protected java.lang.String | fixMWUText(java.lang.String mwuText) - Fix multiword unit text for display. |
protected ResultsPanel | generateResults(WordHoardSortedTableModel model, java.lang.String[] maxLabels, int sortColumn, int totalWordCount) - Display the results of multiword unit extraction in a sorted table. |
ResultsPanel | getCloud() - Show a tag cloud of the Dunning log-likelihood profile. |
protected double | getGlue(java.lang.String mwuText, java.util.Map glueMap) - Get the "glue" value for a multiword unit. |
LabeledColumn | getResultOptions() - Return result options. |
boolean | isCloudAvailable() - Is cloud output available? |
protected boolean | isMWU(MultiwordUnitData countData, java.util.Map glueMap) - Determine whether a multiword unit is a phrase, using the LocalMaxs algorithm. |
protected boolean | passesBigramFilter(java.lang.String[] wordClasses) - Filter bigrams by word class. |
protected boolean | passesTrigramFilter(java.lang.String[] wordClasses) - Filter trigrams by word class. |
protected boolean | passesVerbFilter(java.lang.String[] wordClasses) - Filter ngrams containing verbs. |
boolean | passesWordClassFilters(java.lang.String[] words) - Filter multiword units by major word class. |
protected java.util.List | retrieveLemmata(Work work) - Perform a query and get lemmata for the selected work(s). |
protected java.util.List | retrieveSpellings(Work work) - Perform a query and get spellings for the selected work(s). |
void | runAnalysis(javax.swing.JFrame parentWindow, ProgressReporter progressReporter) - Run an analysis. |
protected java.lang.Object[] | storeMWUData(java.util.Collection mwusList, java.util.Map wordCountMap, int totalWordCount, NGramExtractor[] extractors) - Store multiword unit data. |
Methods inherited from class edu.northwestern.at.wordhoard.swing.calculator.analysis.FrequencyAnalysisRunnerBase |
---|
closeProgressReporter, createCloudAssociationMeasuresComboBox, createCompressValueRangeInTagCloudsCheckBox, generateResults, getAnalysisPercentColumnName, getChart, getCloud, getColTitleWordFormString, getContext, getDoubleFormat, getPercentReportMethodFormat, getReferencePercentColumnName, getResults, getTableFontSize, getTitle, handleTableSelectionChange, isCancelled, isChartAvailable, isContextAvailable, isFilterAvailable, saveChart, setContextButton, showDialog |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Methods inherited from interface edu.northwestern.at.wordhoard.swing.calculator.analysis.AnalysisRunner |
---|
getChart, getContext, getResults, handleTableSelectionChange, isChartAvailable, isContextAvailable, isFilterAvailable, saveChart, setContextButton, showDialog |
Field Detail |
---|
protected static final int MWUTEXTCOLUMN
protected static final int WORDCLASSESCOLUMN
protected static final int MWULENGTHCOLUMN
protected static final int MWUCOUNTCOLUMN
protected static final int DICECOLUMN
protected static final int LOGLIKECOLUMN
protected static final int PHISQUAREDCOLUMN
protected static final int SICOLUMN
protected static final int SCPCOLUMN
protected int accepted
protected int rejected
protected int onceOnly
protected int mwusToReportOn
protected int rejectedByWordClassFilters
protected int acceptedByLocalMaxs
protected int sortColumn
Constructor Detail |
---|
public FindMultiwordUnits()
Method Detail |
---|
public void runAnalysis(javax.swing.JFrame parentWindow, ProgressReporter progressReporter)
runAnalysis in interface AnalysisRunner
runAnalysis in class FrequencyAnalysisRunnerBase
Parameters:
parentWindow - Parent window for dialogs in the analysis.
progressReporter - Progress display for the analysis.
protected java.util.List retrieveSpellings(Work work)
Parameters:
work - Work from which to retrieve words.
protected java.util.List retrieveLemmata(Work work)
Parameters:
work - Work from which to retrieve words.
protected java.lang.String[] extractSpellings(java.util.List workWords)
Parameters:
workWords - Retrieved words.
protected java.lang.String[] extractLemmata(java.util.List workWords)
Parameters:
workWords - Retrieved words.
protected java.util.Collection createRawMWUs(WordCountExtractor wordExtractor, NGramExtractor[] extractors)
Parameters:
extractors - The NGramExtractors to receive the raw multiword unit strings.
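createRawMWUs hands the raw multiword unit strings to NGramExtractor objects whose API is not documented on this page. As a rough stand-in, the hypothetical sketch below counts every contiguous n-gram of length 2 through maxN in a word sequence, joining the words with single spaces. NGramCounter and countNGrams are illustrative names only, not WordHoard classes or methods.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for raw n-gram extraction; not the WordHoard NGramExtractor API. */
public final class NGramCounter {
    private NGramCounter() {}

    /** Count every contiguous n-gram of length 2..maxN, keyed by space-joined text. */
    public static Map<String, Integer> countNGrams(String[] words, int maxN) {
        Map<String, Integer> counts = new HashMap<>();
        for (int n = 2; n <= maxN; n++) {
            // Slide a window of n words across the sequence.
            for (int start = 0; start + n <= words.length; start++) {
                String ngram = String.join(" ", Arrays.copyOfRange(words, start, start + n));
                counts.merge(ngram, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

For the sequence "to be or not to be" with maxN = 3, the bigram "to be" is counted twice and the map holds eight distinct n-grams.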
protected java.lang.Object[] storeMWUData(java.util.Collection mwusList, java.util.Map wordCountMap, int totalWordCount, NGramExtractor[] extractors)
Parameters:
mwusList - Collection of all raw multiword units.
wordCountMap - Map with words as keys and counts as values for all words in the multiword units.
totalWordCount - Total word count in the word count map.
extractors - The NGramExtractors holding the counts for the raw multiword unit strings.
protected java.lang.String[] filterMultiwordUnits(java.util.List mwuCountData, java.util.HashMap glueMap, java.util.Map wordCountMap, NGramExtractor[] extractors, SortedTableModel model)
Parameters:
mwuCountData - The list of multiword unit count data.
glueMap - Hash map from MWUs to glue association measures.
wordCountMap - Word count map.
extractors - Extractors holding MWU count data.
model - Table model in which to store the filtered MWUs.
protected java.lang.String fixMWUText(java.lang.String mwuText)
Parameters:
mwuText - The multiword unit text to fix.
protected boolean isMWU(MultiwordUnitData countData, java.util.Map glueMap)
Parameters:
countData - The multiword unit data.
glueMap - The glue map for all multiword units.
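isMWU is described as applying the LocalMaxs algorithm over a glue map. In one common formulation (Silva and Lopes), an n-gram W is accepted when its glue is at least that of each of the two (n-1)-grams it contains (checked only for n > 2) and strictly greater than that of every (n+1)-gram that extends it by one word. The sketch below assumes the glue map is keyed by space-separated n-gram text with Double values and treats missing entries as 0.0; those conventions, and the name LocalMaxsSketch, are assumptions rather than WordHoard's code.

```java
import java.util.Arrays;
import java.util.Map;

/** Hypothetical sketch of the LocalMaxs acceptance test; not WordHoard code. */
public final class LocalMaxsSketch {
    private LocalMaxsSketch() {}

    /**
     * Returns true if the space-separated n-gram is a local maximum of glue.
     * Missing glue entries are treated as 0.0 (an assumption).
     */
    public static boolean isMWU(String mwu, Map<String, Double> glue) {
        String[] words = mwu.split(" ");
        int n = words.length;
        double g = glue.getOrDefault(mwu, 0.0);

        // For n > 2: glue must be >= that of both contained (n-1)-grams.
        if (n > 2) {
            String left = String.join(" ", Arrays.copyOfRange(words, 0, n - 1));
            String right = String.join(" ", Arrays.copyOfRange(words, 1, n));
            if (g < glue.getOrDefault(left, 0.0) || g < glue.getOrDefault(right, 0.0)) {
                return false;
            }
        }

        // For all n: glue must be > that of every (n+1)-gram extending it by one word.
        for (Map.Entry<String, Double> entry : glue.entrySet()) {
            String other = entry.getKey();
            boolean extendsMwu = other.startsWith(mwu + " ") || other.endsWith(" " + mwu);
            if (extendsMwu && other.split(" ").length == n + 1 && g <= entry.getValue()) {
                return false;
            }
        }
        return true;
    }
}
```

With glue values of 0.9 for "new york" and 0.5 for "new york city", "new york" passes, while "new york city" fails because its glue is below that of the bigram it contains.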
protected double getGlue(java.lang.String mwuText, java.util.Map glueMap)
Parameters:
mwuText - The multiword unit text.
glueMap - The map from multiword units to glue values.
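getGlue only looks a value up in glueMap; this page does not say how the glue itself is computed. Given the SCP output column and the use of LocalMaxs, one plausible glue is Silva and Lopes' fair-dispersion symmetric conditional probability, SCP_f(w1...wn) = p(w1...wn)^2 / Avp, where Avp is the average of p(w1...wi) * p(wi+1...wn) over the n - 1 ways of splitting the n-gram in two. The sketch below computes it from a hypothetical map of n-gram counts; whether WordHoard uses this exact measure is an assumption, and FairScp is an illustrative name.

```java
import java.util.Arrays;
import java.util.Map;

/** Hypothetical sketch: fair-dispersion SCP ("SCP_f") glue for an n-gram. Not WordHoard code. */
public final class FairScp {
    private FairScp() {}

    /**
     * counts maps every n-gram (space-separated, including single words) to its frequency;
     * total is the corpus size used to turn counts into probabilities.
     */
    public static double glue(String mwu, Map<String, Integer> counts, long total) {
        String[] w = mwu.split(" ");
        int n = w.length;
        if (n < 2) {
            throw new IllegalArgumentException("glue needs at least two words");
        }
        double p = counts.getOrDefault(mwu, 0) / (double) total;
        double avp = 0.0;
        // Average p(left) * p(right) over every split point of the n-gram.
        for (int i = 1; i < n; i++) {
            String left = String.join(" ", Arrays.copyOfRange(w, 0, i));
            String right = String.join(" ", Arrays.copyOfRange(w, i, n));
            avp += (counts.getOrDefault(left, 0) / (double) total)
                 * (counts.getOrDefault(right, 0) / (double) total);
        }
        avp /= (n - 1);
        return avp == 0.0 ? 0.0 : p * p / avp;
    }
}
```

For a bigram the formula reduces to f(xy)^2 / (f(x) f(y)), the plain SCP: with f("new york") = 10, f("new") = 20 and f("york") = 10, the glue is 100 / 200 = 0.5.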
protected boolean passesBigramFilter(java.lang.String[] wordClasses)
Parameters:
wordClasses - Major word classes for each word in the bigram.
The bigram filters are those suggested by Justeson and Katz.
A = adjective
N = noun
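The bigram patterns themselves are not listed here. The bigram patterns published by Justeson and Katz are A N and N N, and the hypothetical sketch below accepts exactly those, assuming each word class arrives as the single-letter code from the legend above (WordHoard's actual major word class codes may differ). BigramFilter is an illustrative name.

```java
import java.util.Set;

/** Hypothetical sketch of a Justeson-Katz style bigram filter; not WordHoard code. */
public final class BigramFilter {
    private BigramFilter() {}

    // Justeson and Katz bigram patterns: adjective-noun and noun-noun.
    private static final Set<String> PATTERNS = Set.of("A N", "N N");

    /** Assumes single-letter class codes (A = adjective, N = noun). */
    public static boolean passes(String[] wordClasses) {
        return wordClasses.length == 2
                && PATTERNS.contains(String.join(" ", wordClasses));
    }
}
```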
protected boolean passesTrigramFilter(java.lang.String[] wordClasses)
Parameters:
wordClasses - Major word classes for the words comprising the trigram.
The trigram filters are those suggested by Justeson and Katz.
To this we add, for trigrams:
A = adjective
N = noun
P = preposition
C = conjunction
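The trigram patterns published by Justeson and Katz are A A N, A N N, N A N, N N N and N P N. The additional trigram patterns this class adds are not recoverable from this page (the legend's C = conjunction suggests at least one involves a conjunction), so the hypothetical sketch below takes them as a caller-supplied set instead of guessing. Single-letter class codes and the name TrigramFilter are assumptions.

```java
import java.util.Set;

/** Hypothetical sketch of a Justeson-Katz style trigram filter; not WordHoard code. */
public final class TrigramFilter {
    private TrigramFilter() {}

    // Justeson and Katz trigram patterns (A = adjective, N = noun, P = preposition).
    private static final Set<String> PATTERNS = Set.of("A A N", "A N N", "N A N", "N N N", "N P N");

    /**
     * extraPatterns stands in for the additional patterns the class adds;
     * they are not specified here, so the caller must supply them.
     */
    public static boolean passes(String[] wordClasses, Set<String> extraPatterns) {
        if (wordClasses.length != 3) {
            return false;
        }
        String pattern = String.join(" ", wordClasses);
        return PATTERNS.contains(pattern) || extraPatterns.contains(pattern);
    }
}
```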
protected boolean passesVerbFilter(java.lang.String[] wordClasses)
Parameters:
wordClasses - Major word classes for each word in the ngram.
The ngram is rejected if any of its constituent words is a verb.
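A minimal sketch of the verb filter as described: the ngram fails the filter if any word's major word class is a verb. The verb code "V" and the name VerbFilter are assumptions, since this page does not list WordHoard's word class codes.

```java
/** Hypothetical sketch of the verb filter; not WordHoard code. */
public final class VerbFilter {
    private VerbFilter() {}

    // Assumption: "V" is the major word class code for verbs.
    private static final String VERB = "V";

    /** Returns false (reject) if any word in the ngram is a verb, true otherwise. */
    public static boolean passes(String[] wordClasses) {
        for (String wordClass : wordClasses) {
            if (VERB.equals(wordClass)) {
                return false;
            }
        }
        return true;
    }
}
```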
public boolean passesWordClassFilters(java.lang.String[] words)
Parameters:
words - Major word class for each word in the ngram.
The verb filter removes all multiword units containing a verb.
protected ResultsPanel generateResults(WordHoardSortedTableModel model, java.lang.String[] maxLabels, int sortColumn, int totalWordCount)
Parameters:
model - Table model holding the data to display.
maxLabels - Maximum width values for the initial table columns.
sortColumn - Column on which to sort the table.
totalWordCount - Total number of words.
public boolean isCloudAvailable()
isCloudAvailable in interface AnalysisRunner
isCloudAvailable in class FrequencyAnalysisRunnerBase
public boolean areResultOptionsAvailable()
areResultOptionsAvailable in interface AnalysisRunner
areResultOptionsAvailable in class FrequencyAnalysisRunnerBase
public LabeledColumn getResultOptions()
getResultOptions in interface AnalysisRunner
getResultOptions in class FrequencyAnalysisRunnerBase
public ResultsPanel getCloud()
getCloud in interface AnalysisRunner
getCloud in class FrequencyAnalysisRunnerBase