
TermSuite is a Java UIMA-based toolbox for terminology extraction and multilingual term alignment.

Its features include multiword and compound term detection, morphosyntactic analysis, term variant detection, and term specificity computation. See features

Good support:
French, English, Russian, German, Spanish
Partial support:
Danish, Chinese, Latvian

The current version of TermSuite is 2.2. See Changelog

Get it running!

Prepare your system, download and install TermSuite, and quickly get it running on an example corpus.

Getting Started

Documentation

A list of all of TermSuite's features, analysis engines, and configuration parameters, plus the Java API.

User Manual Javadoc

Developers

Build it from source with Gradle, or use it as a Maven dependency.

View on GitHub Maven / Gradle

Features Overview

Word tokenization
POS Tagging (3rd party: with TreeTagger or Mate)
Lemmatization (3rd party: with TreeTagger or Mate)
Stemming (Snowball)
Terminology extraction
Efficient multiword term detection
Syntactic term variant detection
Graphic term variant detection
Semantic term variant detection (to come in 3.0)
Term morphology extraction
Morphosyntactic term variant detection
Term specificity (Weirdness Ratio) computation and other term measures: WR log, term frequency, etc.
Term alignment (distributional and compositional, multilingual and monolingual)
Terminology export in multiple formats: `json`, `tsv`, `tbx`
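
To give an idea of the term specificity measure listed above: the Weirdness Ratio is commonly defined as the relative frequency of a term in a specialized corpus divided by its relative frequency in a general-language corpus. The sketch below illustrates that definition in plain Java. It does not use TermSuite's API: the class `WeirdnessRatioSketch`, its methods, the toy corpora, and the add-one smoothing of the general corpus are all assumptions made for illustration, and TermSuite's own computation may differ.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of TermSuite.
public class WeirdnessRatioSketch {

    /** Counts how often each token occurs in a tokenized corpus. */
    public static Map<String, Integer> countTerms(List<String> tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    /**
     * Relative frequency in the specialized corpus divided by the
     * relative frequency in the general corpus.
     */
    public static double weirdnessRatio(int specializedCount, int specializedSize,
                                        int generalCount, int generalSize) {
        double specializedFreq = (double) specializedCount / specializedSize;
        // Assumption: add-one smoothing, so that a term absent from the
        // general corpus does not cause a division by zero.
        double generalFreq = (generalCount + 1.0) / (generalSize + 1.0);
        return specializedFreq / generalFreq;
    }

    public static void main(String[] args) {
        // Toy corpora, already tokenized and lowercased.
        List<String> specialized = Arrays.asList(
                "wind", "turbine", "rotor", "blade", "wind", "energy", "wind");
        List<String> general = Arrays.asList(
                "the", "wind", "blew", "over", "the", "old", "house", "by", "the", "sea");

        Map<String, Integer> specializedCounts = countTerms(specialized);
        Map<String, Integer> generalCounts = countTerms(general);

        for (String term : Arrays.asList("wind", "turbine")) {
            double wr = weirdnessRatio(
                    specializedCounts.getOrDefault(term, 0), specialized.size(),
                    generalCounts.getOrDefault(term, 0), general.size());
            System.out.printf("%s: WR = %.3f%n", term, wr);
        }
    }
}
```

A ratio well above 1 suggests that a term is characteristic of the specialized domain, while a ratio near or below 1 suggests a general-language word.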