TermSuite is a Java UIMA-based toolbox for terminology extraction and multilingual term alignment.

Features include multiword and compound term detection, morphosyntactic analysis, term variant detection, and term specificity computation. See features.

Good support: French, English, Russian, German, Spanish
Partial support: Danish, Chinese, Latvian

The current version of TermSuite is 2.3.3. See Changelog.

Get it running!

Prepare your system, then download, install, and run TermSuite on an example corpus.

Getting Started

Documentation

A list of all of TermSuite's features, analysis engines, and configuration parameters, plus the Java API reference.

User Manual Javadoc

Developers

Build it from source with Gradle, or use it as a Maven dependency.

View on GitHub
Maven / Gradle

Academics

Publications

ACL2016

Damien Cram and Béatrice Daille.
Terminology Extraction with Term Variant Detection.
Proceedings of ACL-2016 System Demonstrations.
PDF

IJCNLP2011

Jérôme Rocheteau and Béatrice Daille.
TTC TermSuite: A UIMA Application for Multilingual Terminology Extraction from Comparable Corpora.
Proceedings of the 5th International Joint Conference on Natural Language Processing.
PDF


Features Overview

Word tokenization
POS Tagging (3rd party: with TreeTagger or Mate)
Lemmatization (3rd party: with TreeTagger or Mate)
Stemming (Snowball)
Terminology extraction
Efficient multiword term detection
Term morphology extraction
Term syntactic variants detection
Term graphic variants detection
Variant detection based on term derivations and term prefixation
Term semantic variants detection (to come in 2.4)
Term morphosyntactic variants detection
Term specificity (Weirdness Ratio) computation and other term measures: WR log, term frequency, etc.
Term alignment (distributional and compositional, multilingual and monolingual)
Terminology export in multiple formats: `json`, `tsv`, `tbx`
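
The term specificity measure named in the list above can be illustrated with a short, self-contained sketch. It assumes the usual definition of the Weirdness Ratio: the relative frequency of a term in the specialized corpus divided by its relative frequency in a general-language corpus. It does not use TermSuite's API. The class name `WeirdnessRatio`, its method names, and the choice of a base-10 logarithm for "WR log" are all assumptions made for illustration, and TermSuite's own implementation may differ (for example, in how it smooths terms that are absent from the general corpus).

```java
/**
 * Illustrative sketch only: NOT TermSuite's API.
 * Class and method names are hypothetical.
 */
final class WeirdnessRatio {

    private WeirdnessRatio() {
        // static helpers only
    }

    /**
     * Weirdness Ratio under the usual definition:
     * (specializedFreq / specializedSize) / (generalFreq / generalSize).
     * Values above 1 mean the term is relatively more frequent
     * in the specialized corpus than in general language.
     */
    static double weirdnessRatio(long specializedFreq, long specializedSize,
                                 long generalFreq, long generalSize) {
        if (specializedFreq < 0 || specializedSize <= 0
                || generalFreq <= 0 || generalSize <= 0) {
            // Assumption: no smoothing here, so a term absent from the
            // general corpus (generalFreq == 0) is rejected rather than guessed.
            throw new IllegalArgumentException(
                    "sizes and generalFreq must be positive, specializedFreq non-negative");
        }
        double specializedRelFreq = (double) specializedFreq / specializedSize;
        double generalRelFreq = (double) generalFreq / generalSize;
        return specializedRelFreq / generalRelFreq;
    }

    /** Log-scaled variant; base 10 is an assumption of this sketch. */
    static double weirdnessRatioLog(long specializedFreq, long specializedSize,
                                    long generalFreq, long generalSize) {
        return Math.log10(weirdnessRatio(
                specializedFreq, specializedSize, generalFreq, generalSize));
    }

    public static void main(String[] args) {
        // Made-up counts: 50 occurrences in a 10,000-word specialized corpus,
        // 5 occurrences in a 1,000,000-word general corpus.
        double wr = weirdnessRatio(50, 10_000, 5, 1_000_000);
        double wrLog = weirdnessRatioLog(50, 10_000, 5, 1_000_000);
        System.out.println("WR = " + wr + ", WR log = " + wrLog);
    }
}
```

With the made-up counts in `main`, the term is about a thousand times more frequent, relatively, in the specialized corpus than in the general one, which is the kind of contrast a specificity ranking relies on.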