
TermSuite is a Java UIMA-based toolbox for terminology extraction and multilingual term alignment.

Features include multiword and compound term detection, morphosyntactic analysis, term variant detection, and term specificity computation. See features.

Good support:
French, English, Russian, German, Spanish
Partial support:
Danish, Chinese, Latvian

The current version of TermSuite is 2.3.1. See Changelog.

Get it running!

Prepare your system for TermSuite, download, install and get it running on an example corpus quickly.

Getting Started


List of all TermSuite's features, analysis engines, and configuration parameters. Java API.

User Manual Javadoc


Build it from source with Gradle, or use it as a Maven dependency.

View on GitHub
Maven / Gradle




Damien Cram and Béatrice Daille.
Terminology Extraction with Term Variant Detection.
Proceedings of ACL-2016 System Demonstrations.


Jérôme Rocheteau and Béatrice Daille.
TTC TermSuite: A UIMA Application for Multilingual Terminology Extraction from Comparable Corpora.
Proceedings of the 5th International Joint Conference on Natural Language Processing.

Features Overview

Word tokenization
POS Tagging (3rd party: with TreeTagger or Mate)
Lemmatization (3rd party: with TreeTagger or Mate)
Stemming (Snowball)
Terminology extraction
Efficient multiword term detection
Term morphology extraction
Term syntactic variant detection
Term graphic variant detection
Variant detection based on term derivation and term prefixation
Term semantic variant detection (to come in 2.4)
Term morphosyntactic variant detection
Term specificity (Weirdness Ratio) computation and other term measures: WR log, term frequency, etc.
Term alignment (distributional and compositional, multilingual and monolingual)
Terminology export in multiple formats: `json`, `tsv`, `tbx`
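
To make the term specificity measure listed above more concrete, here is a minimal sketch of a Weirdness Ratio computation in plain Java. It follows the usual definition of the measure: the relative frequency of a term in the specialized corpus divided by its relative frequency in a general-language corpus. The class `WeirdnessRatio`, its method names, the add-one smoothing of the general-corpus count, and the `log10(1 + WR)` form of WR log are all assumptions made for illustration; none of this is TermSuite's actual API, and TermSuite's own formula may differ.

```java
/**
 * Illustrative sketch only (hypothetical class, not part of TermSuite).
 * Computes a Weirdness Ratio from raw occurrence counts.
 */
public class WeirdnessRatio {

    /** Relative frequency of a term: occurrences divided by corpus size (in tokens). */
    static double relativeFrequency(long occurrences, long corpusSize) {
        if (corpusSize <= 0) {
            throw new IllegalArgumentException("corpusSize must be positive");
        }
        return (double) occurrences / corpusSize;
    }

    /**
     * Weirdness Ratio: relative frequency in the specialized corpus
     * divided by relative frequency in the general corpus.
     * Assumption: add-one smoothing on the general corpus, so that a term
     * unseen in general language does not cause a division by zero.
     */
    static double weirdnessRatio(long specFreq, long specSize, long genFreq, long genSize) {
        double spec = relativeFrequency(specFreq, specSize);
        double gen = relativeFrequency(genFreq + 1, genSize + 1);
        return spec / gen;
    }

    /** Log-scaled variant. Assumption: log10(1 + WR), chosen so the result is never negative. */
    static double wrLog(double wr) {
        return Math.log10(1.0 + wr);
    }

    public static void main(String[] args) {
        // Hypothetical counts: 50 occurrences in a 100,000-token specialized corpus,
        // 9 occurrences in a 9,999,999-token general corpus.
        double wr = weirdnessRatio(50, 100_000, 9, 9_999_999);
        System.out.printf("WR = %.2f, WR log = %.4f%n", wr, wrLog(wr));
    }
}
```

A term that is proportionally much more frequent in the specialized corpus than in general language gets a high ratio, which is why the measure is used to rank term candidates by domain specificity.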