Term properties

rank [Integer]

The rank of the term assigned by TermSuite post-processor engine.

isSingleWord, isSwt [Boolean]

Whether this term is a single-word term.

documentFrequency, dFreq [Integer]

The number of documents in the corpus in which the term occurs.

frequencyNorm, fNorm [Double]

The number of occurrences of the term in the corpus per 1000 words.
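As an illustration, the normalization can be sketched as follows. This is a minimal sketch, not TermSuite's code: the function name and the `corpus_word_count` parameter are assumptions introduced for the example.

```python
def frequency_norm(frequency: int, corpus_word_count: int) -> float:
    """Occurrences of a term per 1000 corpus words (illustrative only)."""
    if corpus_word_count <= 0:
        raise ValueError("corpus_word_count must be positive")
    return 1000.0 * frequency / corpus_word_count


# A term seen 25 times in a 50,000-word corpus.
print(frequency_norm(25, 50_000))  # 0.5
```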

generalFrequencyNorm, gfNorm [Double]

The number of occurrences of the term in the general language corpus per 1000 words.

specificity, spec [Double]

The weirdness ratio, i.e. the specificity of the term in the corpus in comparison to general language.
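A weirdness ratio is commonly computed as the normalized frequency in the specialized corpus divided by the normalized frequency in general language. The sketch below assumes that plain definition, using frequencyNorm and generalFrequencyNorm as inputs; TermSuite may apply smoothing or a further transformation that is not described here, and the function name is hypothetical.

```python
def weirdness_ratio(f_norm: float, gf_norm: float) -> float:
    """Plain weirdness ratio: corpus frequency over general-language frequency.

    Assumption: both inputs are frequencies per 1000 words and gf_norm > 0.
    """
    if gf_norm <= 0:
        raise ValueError("gf_norm must be positive")
    return f_norm / gf_norm


# A term four times more frequent in the corpus than in general language.
print(weirdness_ratio(2.0, 0.5))  # 4.0
```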

frequency, freq [Integer]

The number of occurrences of the term in the corpus.

OrthographicScore, ortho [Double]

The probability that the covered text of the term is an actual term, assigned by TermSuite post-processor engine.

IndependantFrequency, iFreq [Integer]

The number of times a term occurs in the corpus as it is, i.e. not as any of its variant forms, assigned by TermSuite post-processor engine.

Independance, ind [Double]

The IndependantFrequency divided by frequency, assigned by TermSuite post-processor engine.
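The ratio can be sketched as below. The function and parameter names are illustrative and not part of TermSuite.

```python
def independence(independent_frequency: int, frequency: int) -> float:
    """Share of a term's occurrences that are not variant forms."""
    if frequency <= 0:
        raise ValueError("frequency must be positive")
    return independent_frequency / frequency


# 30 of the 40 occurrences are the term itself, 10 are variant forms.
print(independence(30, 40))  # 0.75
```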

pilot [String]

The most frequent form of the term.

lemma [String]

The concatenation of the term’s word lemmas.

tf-idf, tfIdf [Double]

frequency divided by DOCUMENT_FREQUENCY.

spec-idf, specIdf [Double]

specificity divided by DOCUMENT_FREQUENCY.
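Both tf-idf and spec-idf, as defined here, are simple divisions by the document frequency. A minimal sketch with hypothetical function names, not TermSuite's implementation:

```python
def tf_idf(frequency: int, document_frequency: int) -> float:
    """frequency divided by document frequency, as defined in this page."""
    if document_frequency <= 0:
        raise ValueError("document_frequency must be positive")
    return frequency / document_frequency


def spec_idf(specificity: float, document_frequency: int) -> float:
    """specificity divided by document frequency, as defined in this page."""
    if document_frequency <= 0:
        raise ValueError("document_frequency must be positive")
    return specificity / document_frequency


# A term with 12 occurrences spread over 4 documents.
print(tf_idf(12, 4))     # 3.0
print(spec_idf(5.0, 4))  # 1.25
```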

groupingKey, key [String]

The unique id of the term, built on its pattern and its lemma.

pattern [String]

The pattern of the term, i.e. the concatenation of syntactic labels of its words.

spottingRule, rule [String]

The name of the UIMA Tokens Regex spotting rule that found the term in the corpus.

isFixedExpression, isFixedExp [Boolean]

Whether the term is a fixed expression.

SwtSize, swtSize [Integer]

The number of words composing the term that are single-words.

Filtered, isFiltered [Boolean]

Whether the term has been marked as filtered by TermSuite post-processor engine. Usually, such a term is not meant to be displayed.

Depth, depth [Integer]

The minimum level of extensions of the term starting from a single-word term.

Relation properties

VariationRank, vRank [Integer]

The rank of the variation among all variations starting from the same source term, when the relation is a variation.

VariationRule, vRules [Set]

The set of YAML variation rules that detected this pair of terms as a term variation, when the relation is a variation.

DerivationType, derivType [String]

The derivation type of the variation, when the relation is a variation.

GraphSimilarity, graphSim [Double]

The edit distance between the two terms of the relation.
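For readers unfamiliar with edit distances, the sketch below computes the classic Levenshtein distance between two strings. It is only an assumption that this is the kind of distance meant; which edit distance TermSuite uses, and how it is scaled into the Double stored in graphSim, is not specified here.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


print(edit_distance("colour", "color"))  # 1
```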

Score, vScore [Double]

The global variation score of the relation assigned by TermSuite post-processor engine, when the relation is a variation.

AffixGain, affGain [Double]

When the relation is a variation of type “extension”, the FREQUENCY of the variant divided by the FREQUENCY of the affix term.

AffixSpec, affSpec [Double]

When the relation is a variation of type “extension”, the SPECIFICITY of the affix term.

AffixRatio, affRatio [Double]

When the relation is a variation of type “extension”, the FREQUENCY of the affix term divided by the FREQUENCY of the base term.

AffixScore, affScore [Double]

When the relation is a variation of type “extension”, the weighted average of AFFIX_GAIN and AFFIX_RATIO.
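The three affix measures above can be sketched together. The weight used by TermSuite for the average is not given in this page, so `gain_weight` is a hypothetical parameter with an arbitrary default; the function names are also illustrative.

```python
def affix_gain(variant_frequency: int, affix_frequency: int) -> float:
    """Frequency of the variant divided by frequency of the affix term."""
    return variant_frequency / affix_frequency


def affix_ratio(affix_frequency: int, base_frequency: int) -> float:
    """Frequency of the affix term divided by frequency of the base term."""
    return affix_frequency / base_frequency


def affix_score(gain: float, ratio: float, gain_weight: float = 0.5) -> float:
    """Weighted average of gain and ratio.

    Assumption: gain_weight is in [0, 1]; the real weights are unknown.
    """
    return gain_weight * gain + (1.0 - gain_weight) * ratio


gain = affix_gain(10, 40)        # 0.25
ratio = affix_ratio(40, 80)      # 0.5
print(affix_score(gain, ratio))  # 0.375
```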

NormalizedAffixScore, nAffScore [Double]

When the relation is a variation of type “extension”, the min-max normalization of AffixScore.
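Min-max normalization rescales a set of values to the range [0, 1]. The sketch below shows the generic technique; over which set of relations TermSuite takes the minimum and maximum is not stated here, and mapping a constant series to 0.0 is a choice made for this example.

```python
def min_max_normalize(values: list[float]) -> list[float]:
    """Rescale values linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # Assumption: a constant series is mapped to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


print(min_max_normalize([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
```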

AffixOrthographicScore, affOrtho [Double]

When the relation is a variation of type “extension”, the orthographic score of extension affix term.

ExtensionScore, extScore [Double]

When the relation is a variation of type “extension”, the score of the extension affix term (combines AffixGain and AffixGain).

NormalizedExtensionScore, nExtScore [Double]

When the relation is a variation of type “extension”, the min-max normalization of ExtensionScore.

HasExtensionAffix, hasExtAffix [Boolean]

When the relation is a variation of type “extension”, whether there is an affix term.

IsExtension, isExt [Boolean]

Whether this relation is an extension.

VariantBagFrequency, vBagFreq [Integer]

When the relation is a variation, the total number of occurrences of the variant term and of the variant’s own variant terms (order-2 variants).

SourceGain, srcGain [Double]

When the relation is a variation, the log10 of VariantBagFrequency divided by the FREQUENCY of the base term.
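The definition can be read in two ways; the sketch below assumes the logarithm is applied to the whole ratio, i.e. log10(VariantBagFrequency / frequency of the base term). That reading, and the function name, are assumptions rather than confirmed TermSuite behavior.

```python
import math


def source_gain(variant_bag_frequency: int, base_frequency: int) -> float:
    """log10 of the variant bag frequency over the base term frequency.

    Assumption: both frequencies are strictly positive.
    """
    if variant_bag_frequency <= 0 or base_frequency <= 0:
        raise ValueError("frequencies must be positive")
    return math.log10(variant_bag_frequency / base_frequency)


# The variant and its own variants occur 100 times more often than the base term.
print(source_gain(1000, 10))
```

Under this reading, a value of 0 means the variant bag is exactly as frequent as the base term, and negative values mean it is rarer.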

NormalizedSourceGain, nSrcGain [Double]

When the relation is a variation of type “extension”, the linear normalization of SourceGain.

IsInfered, isInfered [Boolean]

When the relation is a variation, whether it has been inferred from two other base variations.

IsGraphical, isGraph [Boolean]

When the relation is a variation, whether there is a graphical similarity between the two terms.

IsDerivation, isDeriv [Boolean]

When the relation is a variation, whether one term is a derivation of the other.

IsPrefixation, isPref [Boolean]

When the relation is a variation, whether one term is the prefix of the other.

IsSyntagmatic, isSyntag [Boolean]

When the relation is a variation, whether it is a syntagmatic variation.

IsMorphological, isMorph [Boolean]

When the relation is a variation, whether the variation implies morphosyntactic variations.

IsSemantic, isSem [Boolean]

When the relation is a variation, whether there is a semantic similarity between the two terms.

Distributional, isDistrib [Boolean]

When the relation is a semantic relation, whether the relation is of type “distributional”, i.e. the variation has been found by context vector alignment.

SemanticSimilarity, semSim [Double]

When the relation is a semantic variation found by alignment, the similarity of the two context vectors of the two terms of the relation.

Dico, isDico [Boolean]

When the relation is a semantic relation, whether the relation is of type “dictionary”, i.e. the variation has been found with a synonym dictionary.

SemanticScore, semScore [Double]

When the relation is a semantic variation, the relevance score of the variation. This property is set for all types of semantic variations, both dictionary-based and distributional.