TermSuite

You need to install a 3rd-party Part-Of-Speech tagger and lemmatizer when you run TermSuite for extracting the terminology of a corpus.

There are currently two POS Taggers and lemmatizers supported by TermSuite:
* TreeTagger (recommended): supports fr, en, es, de, ru, da
* Mate: supports fr, en (better POS tagging performances but slower)

Option 1 (recommended): Installing TreeTagger and its models
Option 2: Installing Mate models

Option 1 (recommended): Installing TreeTagger and its models

TreeTagger is a very fast POS tagger and lemmatizer having very acceptable performances on all TermSuite languages. Unfortunately, its license excludes commercial usage. As a consequence, TreeTagger cannot be included as a 3rd party dependency in TermSuite and needs to be install manually by end users.

Download and install TreeTagger to your OS following the official instructions.

The TreeTagger home directory contains tree subdirectories: bin, models, and lib. The TreeTagger executable program should be in the bin sub-directory. Note that the TreeTagger executable program is called tree-tagger on Linux and tree-tagger.exe on Windows.

The language model parameter files for should be in the models sub-directory. If the parameter files are in the lib subdirectory, please create a symbolic link named models to this directory as follows:

$ cd /path/to/tree-tagger-home-directory
$ ln -s lib models

Pay attention to parameter file names and encoding !

In the models directory, TermSuite expects a specific file name for each language parameter file. Please name your parameter files as follows, and make them all be utf-8 encoded:

Language	Parameter File	Encoding
English	english.par	utf-8
French	french.par	utf-8
Italian	italian.par	utf-8
Spanish	spanish.par	utf-8
German	german.par	utf-8
Russian	russian.par	utf-8
Danish	danish.par	utf-8

On Linux, you can check that TreeTagger is correctly installed by launching these two command lines:

$ cd tree-tagger-home-directory
$ ./bin/tree-tagger ./models/english.par

Exit the program by the keyboard short-cut Ctrl+D.

Option 2: Installing Mate models

Mate has slightly better POS tagging performances than TreeTagger in the context of terminology extraction but also has a few disadvantages:
* only three languages models are public en, de, and fr,
* parsing language models is very slow and results in a quite long and constant loading time for each run,
* tagging and lemmatizing raw text is also very slow compared to TreeTagger.

The main advantage of Mate is that it is embedded into TermSuite. You do not need to install it on your OS. You just need to download (or train) the required language models. See the official page for models of these three languages: en, de, and fr.

Mate requires two different for each languages. The parser+tagger model and the lemmatizer model.

Pay attention to parameter file names !

TermSuite expects a specific file name for each mate model. Please follow naming patterns below.

The models must follow this patterns : (where xx is the two-letter language code)

Model	File name pattern
parser+tagger	`mate-pos-xx.model`
lemmatizer	`mate-lemma-xx.model`

Documentation Installing the POS tagger and lemmatizer

Introduction

Command Line API

Java API

Graphical User Interface

Theory

Input/Output Formats

Linguistic Resources

Links

Option 1 (recommended): Installing TreeTagger and its models

Option 2: Installing Mate models