TermSuite is a toolbox for terminology extraction and multilingual term alignment.
Features include multiword and compound term detection, morphosyntactic analysis, term variant detection, and term specificity computation. See features.
The current version of TermSuite is 3.0.10. See Changelog.
Prepare your system, then download, install, and run TermSuite on an example corpus.
List of all TermSuite's features, analysis engines, and configuration parameters. Java API.
Build it from source with Gradle, or use it as a Maven dependency.
$ java -cp termsuite-core-3.0.10.jar \
fr.univnantes.termsuite.tools.TerminologyExtractorCLI \
-t /path/to/treetagger/ \
-c /path/to/corpus/ \
-l en \
--tsv my-termino.tsv
Run NLP preprocessing, terminology extraction, and multilingual alignment from the command line. See Getting Started [using Docker].
Embed TermSuite in your Java application to access all of its functionality with full control.
// Create the corpus object from a directory of English text documents
TXTCorpus txtCorpus = new TXTCorpus(
    Lang.EN,
    Paths.get("wind-energy", "documents"));
// Run the NLP preprocessing (requires a TreeTagger installation)
IndexedCorpus corpus = TermSuite.preprocessor()
    .setTaggerPath(Paths.get("path", "to", "treetagger"))
    .toIndexedCorpus(txtCorpus, 500000);
// Extract the terminology
TermSuite.terminoExtractor().execute(corpus);
// Keep only the top 1000 terms by specificity, together with their variants
TermSuite.terminologyFilterer()
    .by(TermProperty.SPECIFICITY)
    .keepTopN(1000).keepVariants()
    .filter(corpus);
// Export the terminology to TSV
TermSuiteFactory.createTsvExporter(new TsvOptions())
    .export(corpus, Paths.get("my-termino.tsv"));
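Once the terminology has been exported, the TSV file can be post-processed with plain Java. The sketch below is not part of TermSuite: `TsvRowsSketch` is a hypothetical helper, and it assumes that the exported file starts with a header row naming the columns and that cells contain no embedded tabs. Check the actual layout of your export before relying on it.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for reading an exported TSV; not part of the TermSuite API. */
final class TsvRowsSketch {

    /** Parses tab-separated lines, using the first line as column names (an assumption). */
    static List<Map<String, String>> parse(List<String> lines) {
        List<Map<String, String>> rows = new ArrayList<>();
        if (lines.isEmpty()) {
            return rows;
        }
        String[] header = lines.get(0).split("\t", -1);
        for (String line : lines.subList(1, lines.size())) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            String[] cells = line.split("\t", -1);
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < header.length; i++) {
                // Missing trailing cells are stored as empty strings
                row.put(header[i], i < cells.length ? cells[i] : "");
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the file name used in the export example
        Path file = Paths.get(args.length > 0 ? args[0] : "my-termino.tsv");
        for (Map<String, String> row : parse(Files.readAllLines(file, StandardCharsets.UTF_8))) {
            System.out.println(row);
        }
    }
}
```

Keeping each row as a map keyed by column name means the code does not depend on the column order of the export.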
A graphical user interface for terminology extraction and multilingual alignment, with an in-context term occurrence viewer and a linguistic resource editor.
ACL2016: Damien Cram and Béatrice Daille.
IJCNLP2011: Jérôme Rocheteau and Béatrice Daille.
BENJAMINS2017: Béatrice Daille.
Word tokenization
POS Tagging (3rd party: with TreeTagger or Mate)
Lemmatization (3rd party: with TreeTagger or Mate)
Stemming (Snowball)
Terminology extraction
Efficient multiword term detection
Term morphology extraction
Term syntactic variants detection
Term graphic variants detection
Variant detection based on term derivations and term prefixation
Term semantic variants detection
Term morphosyntactic variants detection
Term specificity (Weirdness Ratio) computing and other term measures: WR log, term frequency, etc
Term alignment (distributional and compositional, multilingual and monolingual)
Terminology export in multiple formats: `json`, `tsv`, `tbx`
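
To illustrate the term specificity measure listed above, here is a minimal sketch of the weirdness ratio as it is commonly defined: the relative frequency of a term in the specialised corpus divided by its relative frequency in a general-language corpus. This is not TermSuite's own code. The class `WeirdnessRatioSketch`, the flooring of the general-corpus count at 1, and the choice of base 10 for the logarithm are all assumptions made for the example.

```java
/** Illustrative weirdness-ratio computation; not TermSuite's actual implementation. */
final class WeirdnessRatioSketch {

    /**
     * Relative frequency in the specialised corpus divided by
     * relative frequency in the general-language corpus.
     */
    static double weirdnessRatio(long specFreq, long specSize,
                                 long generalFreq, long generalSize) {
        if (specSize <= 0 || generalSize <= 0) {
            throw new IllegalArgumentException("corpus sizes must be positive");
        }
        double specRate = (double) specFreq / specSize;
        // Assumption of this sketch: floor the general count at 1 so that
        // terms unseen in the general corpus do not cause a division by zero.
        double generalRate = (double) Math.max(generalFreq, 1L) / generalSize;
        return specRate / generalRate;
    }

    /** Base-10 logarithm of the weirdness ratio (the base is an assumption). */
    static double wrLog(long specFreq, long specSize,
                        long generalFreq, long generalSize) {
        return Math.log10(weirdnessRatio(specFreq, specSize, generalFreq, generalSize));
    }

    public static void main(String[] args) {
        // 50 occurrences in a 10,000-word specialised corpus,
        // 5 occurrences in a 1,000,000-word general corpus.
        double wr = weirdnessRatio(50, 10_000, 5, 1_000_000);
        System.out.printf("WR = %.1f, log10(WR) = %.2f%n", wr, Math.log10(wr));
    }
}
```

In this example the term is about 1000 times more frequent, relatively, in the specialised corpus than in the general one, which is the kind of signal a specificity-based filter relies on.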