Prerequisites

  1. Java 8
  2. two terminologies extracted with term contexts
  3. a bilingual dictionary

AlignerCLI

Usage

java [-Xms256m -Xmx8g] -cp termsuite-core-3.0.2.jar \
	 fr.univnantes.termsuite.tools.AlignerCLI OPTIONS

Description

Translates domain-specific terms in multilingual comparable corpora from a given source language to a given target language.

Mandatory options

--dictionary FILE

The path to the bilingual dictionary to use for bilingual alignment

--source-termino FILE

The source terminology (indexed corpus)

--target-termino FILE

The target terminology (indexed corpus)

--term-list or --term

At least one of these two options is required. Both are described under "Other options".

Other options

--distance STRING

Similarity measure used for context vector alignment. Allowed values are: Cosine, Jaccard

--explain (no arg)

Shows, for each aligned term, the most influential co-terms

--min-candidate-frequency INT

The minimum frequency of target translation candidates

--n, -n INT

The number of translation candidates to show in the output

--term TERM_LIST

The source term (lemma or grouping key) to translate

--term-list FILE

The path to a list of source terms (lemmas or grouping keys) to translate

--tsv FILE

A file path to write output of the bilingual aligner
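
As an illustration of how the options above combine, the following sketch wraps a batch run that reads its source terms from a file (`--term-list`) and writes the candidates to a file (`--tsv`). The function name `align_term_list`, the positional arguments and the `JAVA` override are assumptions made for this example only and are not part of TermSuite; the layout expected for the term list file is not described here, so the sketch does not create one.

```shell
# Hypothetical wrapper around AlignerCLI for a batch run.
# Arguments: $1 source terminology, $2 target terminology,
#            $3 bilingual dictionary, $4 term list file, $5 output TSV file.
# TS_HOME and TS_VERSION are expected to be set by the caller.
# JAVA can be overridden (e.g. JAVA=echo) to print the arguments
# instead of launching the JVM.
align_term_list() {
  "${JAVA:-java}" -Xms1g -Xmx8g \
    -cp "$TS_HOME/termsuite-core-$TS_VERSION.jar" \
    fr.univnantes.termsuite.tools.AlignerCLI \
    --source-termino "$1" \
    --target-termino "$2" \
    --dictionary "$3" \
    --term-list "$4" \
    --tsv "$5" \
    -n 5
}
```

Setting `JAVA=echo` before calling the function gives a dry run that only prints the arguments, which is a convenient way to check them before starting a long alignment.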

Examples

Example launcher scripts can be found at:

https://github.com/termsuite/termsuite-core/tree/develop/examples/cmd

Translating French terms to English (and printing the top 5 candidates)

You can translate domain-specific terms, i.e. terms that cannot be found in
any general-language bilingual dictionary, from a source language to a
target language, based on:

  • a contextualized terminology extracted from a comparable multilingual corpus for the **source** language (`--source-termino`)
  • a contextualized terminology extracted from **the same** comparable multilingual corpus for the **target** language (`--target-termino`)
  • a bilingual source-language-to-target-language dictionary (`--dictionary`)
  • the source term to translate (`--term`), within double quotes `"` when it contains several words.



See TerminologyExtractorCLI for how to produce an alignment-ready (i.e. contextualized)
terminology with TermSuite.
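
Before launching the aligner, it can help to verify that the two terminologies and the dictionary are actually present. The check below is only a sketch: `check_aligner_inputs` is a hypothetical helper written for this page, and it tests nothing more than that each file is readable and non-empty (it does not validate the contents).

```shell
# Hypothetical pre-flight check: returns 0 if every given file is readable
# and non-empty; otherwise reports each offending path on stderr and returns 1.
check_aligner_inputs() {
  status=0
  for f in "$@"; do
    if [ ! -r "$f" ] || [ ! -s "$f" ]; then
      echo "missing or empty input: $f" >&2
      status=1
    fi
  done
  return "$status"
}

# Typical use, with the variables from the example commands on this page:
#   check_aligner_inputs "$SOURCE_TERMINO_JSON" "$TARGET_TERMINO_JSON" \
#     "$BILINGUAL_DICO_PATH" || exit 1
```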

Command Line

java -Xms1g -Xmx8g -cp $TS_HOME/termsuite-core-$TS_VERSION.jar \
      fr.univnantes.termsuite.tools.AlignerCLI \
      --source-termino $SOURCE_TERMINO_JSON \
      --target-termino $TARGET_TERMINO_JSON \
      --dictionary $BILINGUAL_DICO_PATH \
      --term "$SOURCE_TERM" \
      -n 5 \
      --info

Docker

termsuite align \
      --source-termino $SOURCE_TERMINO_JSON \
      --target-termino $TARGET_TERMINO_JSON \
      --dictionary $BILINGUAL_DICO_PATH \
      --term "$SOURCE_TERM" \
      -n 5 \
      --info

Result

1	énergie éolienne	wind energy	0,262	COMPOSITIONAL
2	énergie éolienne	wind power	0,252	COMPOSITIONAL
3	énergie éolienne	power of the wind	0,172	COMPOSITIONAL
4	énergie éolienne	Windpower	0,164	COMPOSITIONAL
5	énergie éolienne	Wind-Energy	0,150	COMPOSITIONAL

The output shows that the top-ranked translation candidate is wind energy. The COMPOSITIONAL flag indicates that it was found by the compositional method.
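
To post-process such a result in a script, the rows can be read with `awk`. The sketch below assumes that the columns are tab-separated and ordered as in the result shown above (rank, source term, candidate, score, method); the function names `top_candidate` and `normalize_scores` are hypothetical helpers, not TermSuite commands. Since the scores above are printed with a decimal comma, the second helper rewrites it as a dot so that other tools can parse the score as a number.

```shell
# Print the candidate of the row whose rank (column 1) is 1.
top_candidate() {
  awk -F '\t' '$1 == 1 { print $3; exit }'
}

# Print "candidate<TAB>score" for every row, replacing the decimal comma
# in the score (column 4) with a dot.
normalize_scores() {
  awk -F '\t' 'BEGIN { OFS = "\t" }
    { score = $4; sub(/,/, ".", score); print $3, score }'
}
```

Both helpers read the result rows from standard input, so they can be fed from a saved result file or directly from the aligner through a pipe.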