FindMultiwordUnits (WordHoard)

edu.northwestern.at.wordhoard.swing.calculator.analysis
Class FindMultiwordUnits

java.lang.Object
  edu.northwestern.at.wordhoard.swing.calculator.analysis.FrequencyAnalysisRunnerBase
      edu.northwestern.at.wordhoard.swing.calculator.analysis.FindMultiwordUnits

All Implemented Interfaces:: AnalysisRunner

public class FindMultiwordUnits
extends FrequencyAnalysisRunnerBase
implements AnalysisRunner

Find multiword units.

Field Summary
`protected int`	`accepted` Count of mwus accepted.
`protected int`	`acceptedByLocalMaxs` Count of mwus accepted by localmaxs algorithm.
`protected static int`	`DICECOLUMN`
`protected static int`	`LOGLIKECOLUMN`
`protected static int`	`MWUCOUNTCOLUMN`
`protected static int`	`MWULENGTHCOLUMN`
`protected int`	`mwusToReportOn` Count of mwus to report on.
`protected static int`	`MWUTEXTCOLUMN` Output column indices.
`protected int`	`onceOnly` Count of mwus that occur only once.
`protected static int`	`PHISQUAREDCOLUMN`
`protected int`	`rejected` Count of mwus rejected by filters.
`protected int`	`rejectedByWordClassFilters` Count of mwus rejected by word class filters.
`protected static int`	`SCPCOLUMN`
`protected static int`	`SICOLUMN`
`protected int`	`sortColumn` The column containing the association measure to use.
`protected static int`	`WORDCLASSESCOLUMN`

Fields inherited from class edu.northwestern.at.wordhoard.swing.calculator.analysis.FrequencyAnalysisRunnerBase
adjustChiSquareForMultipleComparisons, analysisText, analysisTextBreakdownBy, analyzePhraseFrequencies, associationMeasure, blankReplacementCharacter, collocationOccurrenceMap, colorCodeOveruseColumn, compressValueRangeInTagClouds, contextButton, cutoff, displayProgress, filterBigramsByWordClass, filterMultiwordUnitsContainingVerbs, filterOutProperNames, filterSingleOccurrences, filterTrigramsByWordClass, filterUsingLocalMaxs, FONT_SIZE, frequencyAnalysisType, frequencyNormalizationMethod, FrequencyProfileResults, ignoreCaseAndDiacriticalMarks, leftSpan, markSignificantLogLikelihoodValues, maximumMultiwordUnitLength, minimumCount, minimumMultiwordUnitLength, minimumWorkCount, model, percentReportMethod, pluralWordFormString, progressReporter, referenceText, referenceTextBreakdownBy, resultsPanel, resultsScrollPane, resultsTable, rightSpan, roundNormalizedFrequencies, showPhraseFrequencies, showWordClasses, tableSelectionListener, useShortWorkTitlesInDialogs, useShortWorkTitlesInHeaders, useShortWorkTitlesInOutput, useShortWorkTitlesInWindowTitles, wordForm, wordFormString, wordOccs, wordToAnalyze

Constructor Summary
`FindMultiwordUnits()` Create a multiple word form frequency profile object.

Method Summary
`boolean`	`areResultOptionsAvailable()` Are result options available?
`protected java.util.Collection`	`createRawMWUs(WordCountExtractor wordExtractor, NGramExtractor[] extractors)` Create raw (unfiltered) multiword unit strings.
`protected java.lang.String[]`	`extractLemmata(java.util.List workWords)` Extract lemmata from retrieved data.
`protected java.lang.String[]`	`extractSpellings(java.util.List workWords)` Extract spellings from retrieved data.
`protected java.lang.String[]`	`filterMultiwordUnits(java.util.List mwuCountData, java.util.HashMap glueMap, java.util.Map wordCountMap, NGramExtractor[] extractors, SortedTableModel model)` Filter the raw multiword units.
`protected java.lang.String`	`fixMWUText(java.lang.String mwuText)` Fix multiword unit text for display.
`protected ResultsPanel`	`generateResults(WordHoardSortedTableModel model, java.lang.String[] maxLabels, int sortColumn, int totalWordCount)` Displays results of multiword unit extraction in a sorted table.
`ResultsPanel`	`getCloud()` Show tag cloud of Dunning's log-likelihood profile.
`protected double`	`getGlue(java.lang.String mwuText, java.util.Map glueMap)` Get "glue" value for a multiword unit.
`LabeledColumn`	`getResultOptions()` Return result options.
`boolean`	`isCloudAvailable()` Is cloud output available?
`protected boolean`	`isMWU(MultiwordUnitData countData, java.util.Map glueMap)` Determine if multiword unit is a phrase using localmaxs.
`protected boolean`	`passesBigramFilter(java.lang.String[] wordClasses)` Filter bigrams by word class.
`protected boolean`	`passesTrigramFilter(java.lang.String[] wordClasses)` Filter trigrams by word class.
`protected boolean`	`passesVerbFilter(java.lang.String[] wordClasses)` Filter ngrams containing verbs.
`boolean`	`passesWordClassFilters(java.lang.String[] words)` Filter multiword units using major word class.
`protected java.util.List`	`retrieveLemmata(Work work)` Perform query and get lemmata for selected work(s).
`protected java.util.List`	`retrieveSpellings(Work work)` Perform query and get spellings for selected work(s).
`void`	`runAnalysis(javax.swing.JFrame parentWindow, ProgressReporter progressReporter)` Run an analysis.
`protected java.lang.Object[]`	`storeMWUData(java.util.Collection mwusList, java.util.Map wordCountMap, int totalWordCount, NGramExtractor[] extractors)` Store multiword unit data.

Methods inherited from class edu.northwestern.at.wordhoard.swing.calculator.analysis.FrequencyAnalysisRunnerBase
`closeProgressReporter, createCloudAssociationMeasuresComboBox, createCompressValueRangeInTagCloudsCheckBox, generateResults, getAnalysisPercentColumnName, getChart, getCloud, getColTitleWordFormString, getContext, getDoubleFormat, getPercentReportMethodFormat, getReferencePercentColumnName, getResults, getTableFontSize, getTitle, handleTableSelectionChange, isCancelled, isChartAvailable, isContextAvailable, isFilterAvailable, saveChart, setContextButton, showDialog`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Methods inherited from interface edu.northwestern.at.wordhoard.swing.calculator.analysis.AnalysisRunner
`getChart, getContext, getResults, handleTableSelectionChange, isChartAvailable, isContextAvailable, isFilterAvailable, saveChart, setContextButton, showDialog`

Field Detail

MWUTEXTCOLUMN

protected static final int MWUTEXTCOLUMN

Output column indices.

See Also:: Constant Field Values

WORDCLASSESCOLUMN

protected static final int WORDCLASSESCOLUMN

See Also:: Constant Field Values

MWULENGTHCOLUMN

protected static final int MWULENGTHCOLUMN

See Also:: Constant Field Values

MWUCOUNTCOLUMN

protected static final int MWUCOUNTCOLUMN

See Also:: Constant Field Values

DICECOLUMN

protected static final int DICECOLUMN

See Also:: Constant Field Values

LOGLIKECOLUMN

protected static final int LOGLIKECOLUMN

See Also:: Constant Field Values

PHISQUAREDCOLUMN

protected static final int PHISQUAREDCOLUMN

See Also:: Constant Field Values

SICOLUMN

protected static final int SICOLUMN

See Also:: Constant Field Values

SCPCOLUMN

protected static final int SCPCOLUMN

See Also:: Constant Field Values

accepted

protected int accepted

Count of mwus accepted.

rejected

protected int rejected

Count of mwus rejected by filters.

onceOnly

protected int onceOnly

Count of mwus that occur only once.

mwusToReportOn

protected int mwusToReportOn

Count of mwus to report on.

rejectedByWordClassFilters

protected int rejectedByWordClassFilters

Count of mwus rejected by word class filters.

acceptedByLocalMaxs

protected int acceptedByLocalMaxs

Count of mwus accepted by localmaxs algorithm.

sortColumn

protected int sortColumn

The column containing the association measure to use.

Constructor Detail

FindMultiwordUnits

public FindMultiwordUnits()

Create a multiple word form frequency profile object.

Method Detail

runAnalysis

public void runAnalysis(javax.swing.JFrame parentWindow,
                        ProgressReporter progressReporter)

Run an analysis.

Specified by:: runAnalysis in interface AnalysisRunner
Overrides:: runAnalysis in class FrequencyAnalysisRunnerBase

Parameters:: parentWindow - Parent window for dialogs in the analysis.; progressReporter - Progress display for analysis.

retrieveSpellings

protected java.util.List retrieveSpellings(Work work)

Perform query and get spellings for selected work(s).

Parameters:: work - Work from which to retrieve words.

retrieveLemmata

protected java.util.List retrieveLemmata(Work work)

Perform query and get lemmata for selected work(s).

Parameters:: work - Work from which to retrieve words.

extractSpellings

protected java.lang.String[] extractSpellings(java.util.List workWords)

Extract spellings from retrieved data.

Parameters:: workWords - Retrieved words.
Returns:: String array of spellings suitable for counting.

extractLemmata

protected java.lang.String[] extractLemmata(java.util.List workWords)

Extract lemmata from retrieved data.

Parameters:: workWords - Retrieved words.
Returns:: String array of lemmata suitable for counting.

createRawMWUs

protected java.util.Collection createRawMWUs(WordCountExtractor wordExtractor,
                                             NGramExtractor[] extractors)

Create raw (unfiltered) multiword unit strings.

Parameters:: extractors - The NGramExtractors to receive the raw multiword unit strings.
Returns:: List of all raw multiword units to analyze.
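The raw, unfiltered candidates are contiguous word sequences. A minimal sketch of that step follows, assuming candidates are keyed by their words joined with single spaces and that lengths run from a minimum to a maximum (compare the inherited `minimumMultiwordUnitLength` and `maximumMultiwordUnitLength` fields). `RawNGramSketch` and `countNGrams` are hypothetical names; this is not WordHoard's `NGramExtractor`.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: counts raw (unfiltered) multiword unit candidates.
// This is not WordHoard's NGramExtractor; all names here are illustrative.
class RawNGramSketch {
    // Count every contiguous n-gram whose length lies in [minLength, maxLength].
    static Map<String, Integer> countNGrams(String[] words, int minLength, int maxLength) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int n = minLength; n <= maxLength; n++) {
            for (int start = 0; start + n <= words.length; start++) {
                // Assumed key format: the n words joined by single spaces.
                String ngram = String.join(" ", Arrays.copyOfRange(words, start, start + n));
                counts.merge(ngram, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] words = {"the", "high", "court", "of", "the", "high", "court"};
        Map<String, Integer> counts = countNGrams(words, 2, 3);
        System.out.println(counts.get("high court"));      // 2
        System.out.println(counts.get("the high court"));  // 2
        System.out.println(counts.get("court of the"));    // 1
    }
}
```

Every candidate is kept at this stage, however rare; frequency cutoffs and word class filters are applied later.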

storeMWUData

protected java.lang.Object[] storeMWUData(java.util.Collection mwusList,
                                          java.util.Map wordCountMap,
                                          int totalWordCount,
                                          NGramExtractor[] extractors)

Store multiword unit data.

Parameters:: mwusList - Collection of all raw multiword units.; wordCountMap - Map containing words as keys and counts as values for all words in the multiword units.; totalWordCount - Total word count in word count map.; extractors - The NGramExtractors holding the counts for the raw multiword unit strings.
Returns:: Two-item array: [0] = list of all multiword unit count data items; [1] = hash map from each mwu to the selected association measure, for use by localmaxs.
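One of the association measures a glue map can hold is the symmetric conditional probability (compare `SCPCOLUMN`). The sketch below computes the "fair" SCP variant from Silva and Lopes' LocalMaxs work, SCP_f(w1..wn) = p(w1..wn)^2 / avg_i [p(w1..wi) * p(wi+1..wn)], with every probability taken as count / totalWordCount. Whether WordHoard's SCP column uses exactly this variant is an assumption; `FairScpSketch` and its key format (space-joined words) are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the "fair" symmetric conditional probability (SCP_f)
// glue measure of Silva and Lopes; WordHoard's own SCP code may differ.
class FairScpSketch {
    // SCP_f(w1..wn) = p(w1..wn)^2 / avg over i of p(w1..wi) * p(wi+1..wn), n >= 2.
    static double fairScp(String[] words, Map<String, Integer> counts, int totalWordCount) {
        int n = words.length;
        double pWhole = prob(words, 0, n, counts, totalWordCount);
        double avp = 0.0;
        // Sum p(left) * p(right) over the n - 1 ways to split the n-gram in two.
        for (int i = 1; i < n; i++) {
            avp += prob(words, 0, i, counts, totalWordCount)
                 * prob(words, i, n, counts, totalWordCount);
        }
        avp /= (n - 1);
        return avp == 0.0 ? 0.0 : (pWhole * pWhole) / avp;
    }

    // Relative frequency of words[from..to), looked up by space-joined text.
    private static double prob(String[] words, int from, int to,
                               Map<String, Integer> counts, int total) {
        String key = String.join(" ", Arrays.copyOfRange(words, from, to));
        return counts.getOrDefault(key, 0) / (double) total;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("court", 2);
        counts.put("of", 1);
        counts.put("court of", 1);
        // (1/7)^2 / ((2/7) * (1/7)) = 0.5
        System.out.println(fairScp(new String[] {"court", "of"}, counts, 7));
    }
}
```

Averaging over all split points is what makes the measure "fair": a trigram is compared against both of its two-way decompositions rather than one arbitrary split.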

filterMultiwordUnits

protected java.lang.String[] filterMultiwordUnits(java.util.List mwuCountData,
                                                  java.util.HashMap glueMap,
                                                  java.util.Map wordCountMap,
                                                  NGramExtractor[] extractors,
                                                  SortedTableModel model)

Filter the raw multiword units.

Parameters:: mwuCountData - The list of multiword unit count data.; glueMap - Hash map of mwus to glue association measures.; wordCountMap - Word count map.; extractors - Extractors holding mwu count data.; model - Table model in which to store filtered mwus.
Returns:: Longest mwu string in table.

fixMWUText

protected java.lang.String fixMWUText(java.lang.String mwuText)

Fix multiword unit text for display.

Parameters:: mwuText - The multiword unit text to fix.
Returns:: The multiword unit text suitable for display.

isMWU

protected boolean isMWU(MultiwordUnitData countData,
                        java.util.Map glueMap)

Determine if multiword unit is a phrase using localmaxs.

Parameters:: countData - The multiword unit data.; glueMap - The glue map for all multiword units.
Returns:: true if multiword unit appears to be a phrase.
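A minimal sketch of the LocalMaxs test as described by Silva and Lopes follows: an n-gram W is kept if its glue is strictly greater than that of every (n+1)-gram containing it and, when n > 2, no smaller than that of the two (n-1)-grams it contains. It assumes the glue map is keyed by space-joined words and that a missing entry has glue 0 (the documented `getGlue` behavior). `LocalMaxsSketch` is hypothetical and is not WordHoard's `isMWU`.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the LocalMaxs criterion (Silva and Lopes);
// it is not WordHoard's isMWU, and the space-joined key format is assumed.
class LocalMaxsSketch {
    // Mirrors the documented getGlue contract: 0 when the mwu is absent.
    static double glue(String mwuText, Map<String, Double> glueMap) {
        Double g = glueMap.get(mwuText);
        return g == null ? 0.0 : g;
    }

    static boolean isLocalMax(String mwuText, Map<String, Double> glueMap) {
        String[] words = mwuText.split(" ");
        int n = words.length;
        double g = glue(mwuText, glueMap);

        // For n > 2, neither contained (n-1)-gram may have higher glue.
        if (n > 2) {
            String prefix = String.join(" ", Arrays.copyOfRange(words, 0, n - 1));
            String suffix = String.join(" ", Arrays.copyOfRange(words, 1, n));
            if (glue(prefix, glueMap) > g || glue(suffix, glueMap) > g) {
                return false;
            }
        }

        // Every (n+1)-gram that starts or ends with W must have strictly lower glue.
        for (Map.Entry<String, Double> entry : glueMap.entrySet()) {
            String[] other = entry.getKey().split(" ");
            if (other.length != n + 1) {
                continue;
            }
            String head = String.join(" ", Arrays.copyOfRange(other, 0, n));
            String tail = String.join(" ", Arrays.copyOfRange(other, 1, n + 1));
            if ((head.equals(mwuText) || tail.equals(mwuText)) && entry.getValue() >= g) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Double> glueMap = new HashMap<>();
        glueMap.put("the high", 0.3);
        glueMap.put("high court", 0.9);
        glueMap.put("court of", 0.1);
        glueMap.put("the high court", 0.5);
        glueMap.put("high court of", 0.2);
        for (String mwu : glueMap.keySet()) {
            System.out.println(mwu + " -> " + isLocalMax(mwu, glueMap));
        }
    }
}
```

With this glue map only `high court` is accepted: it out-glues both of its trigram extensions, while each trigram loses to its `high court` subgram. The linear scan for supergrams keeps the sketch short; an implementation over a large corpus would more likely index supergrams by prefix and suffix. Candidates of the maximum length have no supergrams in the map, so they are judged by their subgrams alone.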

getGlue

protected double getGlue(java.lang.String mwuText,
                         java.util.Map glueMap)

Get "glue" value for a multiword unit.

Parameters:: mwuText - The multiword unit text.; glueMap - The map from multiword units to glue values.
Returns:: The glue value for the given multiword unit, or 0 if the mwu is not found.

passesBigramFilter

protected boolean passesBigramFilter(java.lang.String[] wordClasses)

Filter bigrams by word class.

Parameters:: wordClasses - Major word classes for each word in the bigram.

The bigram filters are those suggested by Justeson and Katz.

A = adjective
N = noun
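In Justeson and Katz's published pattern list, the two bigram patterns are adjective-noun (A N) and noun-noun (N N). A minimal sketch of such a filter follows, assuming the major word classes have already been reduced to the single-letter codes in the key above; WordHoard's actual word class codes and its exact pattern set may differ, and `BigramFilterSketch` is a hypothetical name.

```java
// Hypothetical sketch of a Justeson and Katz bigram filter. It assumes the
// major word classes have already been reduced to the codes "A" (adjective)
// and "N" (noun); WordHoard's actual word class codes may differ.
class BigramFilterSketch {
    // Accept only the two bigram patterns adjective-noun (A N) and noun-noun (N N).
    static boolean passesBigramFilter(String[] wordClasses) {
        if (wordClasses == null || wordClasses.length != 2) {
            return false;
        }
        boolean firstOk = "A".equals(wordClasses[0]) || "N".equals(wordClasses[0]);
        return firstOk && "N".equals(wordClasses[1]);
    }

    public static void main(String[] args) {
        System.out.println(passesBigramFilter(new String[] {"A", "N"})); // true
        System.out.println(passesBigramFilter(new String[] {"N", "N"})); // true
        System.out.println(passesBigramFilter(new String[] {"N", "A"})); // false
    }
}
```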

passesTrigramFilter

protected boolean passesTrigramFilter(java.lang.String[] wordClasses)

Filter trigrams by word class.