Prerequisites
- Java 8
- two terminologies extracted with term contexts
- a bilingual dictionary
AlignerCLI
Usage
java [-Xms256m -Xmx8g] -cp termsuite-core-3.0.2.jar \
fr.univnantes.termsuite.tools.AlignerCLI OPTIONS
Description
Translates domain-specific terms in multilingual comparable corpora from a given source language to a given target language.
Mandatory options
--dictionary
FILE
The path to the bilingual dictionary to use for bilingual alignment
--source-termino
FILE
The source terminology (indexed corpus)
--target-termino
FILE
The target terminology (indexed corpus)
--term-list, --term
Exactly one of --term-list and --term must be set.
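The constraint on --term-list and --term can be checked in a launcher script before the JVM is started. The following is a minimal sketch and not part of TermSuite: the function name `check_term_options` and its two positional parameters are hypothetical.

```shell
# Hypothetical helper (not part of TermSuite): succeeds only when exactly one
# of its two arguments is non-empty.
#   $1: value intended for --term (may be empty)
#   $2: value intended for --term-list (may be empty)
check_term_options() {
  if [ -n "$1" ] && [ -n "$2" ]; then
    echo "error: --term and --term-list cannot both be set" >&2
    return 1
  fi
  if [ -z "$1" ] && [ -z "$2" ]; then
    echo "error: one of --term or --term-list must be set" >&2
    return 1
  fi
  return 0
}
```

A script could then run `check_term_options "$SOURCE_TERM" "$TERM_LIST_FILE" || exit 1` before invoking AlignerCLI, where `$TERM_LIST_FILE` is an assumed variable name.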
Other options
--distance
INT or FLOAT
Similarity measure used for context vector alignment. Allowed values are: Cosine, Jaccard
--explain
(no arg)
Shows, for each aligned term, the most influential co-terms
--min-candidate-frequency
INT
The minimum frequency of target translation candidates
--n, -n
INT
The number of translation candidates to show in the output
--term
TERM_LIST
The source term (lemma or grouping key) to translate
Warning: Exactly one of --term-list and --term must be set.
--term-list
FILE
The path to a list of source terms (lemmas or grouping keys) to translate
Warning: Exactly one of --term-list and --term must be set.
--tsv
FILE
A file path to write output of the bilingual aligner
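For reference, the two measures accepted by --distance are commonly defined as follows for two context vectors. This is a sketch under stated assumptions: the symbols u and v (two context vectors with non-negative weights, indexed by co-term i) are introduced here for illustration, the Jaccard form shown is the usual weighted generalization, and TermSuite's exact formulas may differ.

```latex
\[
\operatorname{cosine}(u, v) =
  \frac{\sum_i u_i v_i}{\sqrt{\sum_i u_i^2}\,\sqrt{\sum_i v_i^2}},
\qquad
\operatorname{jaccard}(u, v) =
  \frac{\sum_i \min(u_i, v_i)}{\sum_i \max(u_i, v_i)}
\]
```

Both values lie between 0 and 1 for non-negative vectors, and a higher value means that the two terms occur in more similar contexts.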
Examples
Example launcher scripts can be found at:
https://github.com/termsuite/termsuite-core/tree/develop/examples/cmd
Translating French terms to English (and printing the top 5 candidates)
You can translate domain-specific terms, i.e. terms that cannot be found in
any general-language bilingual dictionary, from a source language to a
target language, based on:
- a contextualized terminology extracted from a comparable multilingual corpus for the **source** language (`--source-termino`)
- a contextualized terminology extracted from **the same** comparable multilingual corpus for the **target** language (`--target-termino`)
- a bilingual `source-language`-to-`target-language` dictionary (`--dictionary`)
- the source term to translate (`--term`), enclosed in double quotes `"` when it contains several words.
See TerminologyExtractorCLI for how to produce an alignment-ready (i.e. contextualized)
terminology with TermSuite.
Command Line
java -Xms1g -Xmx8g -cp $TS_HOME/termsuite-core-$TS_VERSION.jar \
fr.univnantes.termsuite.tools.AlignerCLI \
--source-termino $SOURCE_TERMINO_JSON \
--target-termino $TARGET_TERMINO_JSON \
--dictionary $BILINGUAL_DICO_PATH \
--term "$SOURCE_TERM" \
-n 5 \
--info
Docker
termsuite align \
--source-termino $SOURCE_TERMINO_JSON \
--target-termino $TARGET_TERMINO_JSON \
--dictionary $BILINGUAL_DICO_PATH \
--term "$SOURCE_TERM" \
-n 5 \
--info
Result
1 énergie éolienne wind energy 0,262 COMPOSITIONAL
2 énergie éolienne wind power 0,252 COMPOSITIONAL
3 énergie éolienne power of the wind 0,172 COMPOSITIONAL
4 énergie éolienne Windpower 0,164 COMPOSITIONAL
5 énergie éolienne Wind-Energy 0,150 COMPOSITIONAL
The output shows that the best-ranked translation candidate is wind energy. The COMPOSITIONAL flag indicates that it was found by the compositional method.
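When the candidates are written to a file with --tsv, they can be post-processed with standard tools. The sketch below assumes, without confirmation from the TermSuite documentation, that the file is tab-separated and has the same five columns as the result above (rank, source term, target term, score, method). The function names `top_candidate` and `candidate_scores` are hypothetical.

```shell
# Assumed column layout (tab-separated): rank, source, target, score, method.

# Prints the target term of the first row whose rank is 1.
#   $1: path to the TSV file
top_candidate() {
  awk -F '\t' '$1 == 1 { print $3; exit }' "$1"
}

# Prints "target<TAB>score" for every row, rewriting the decimal comma
# of the score (e.g. 0,262) as a decimal point (0.262).
#   $1: path to the TSV file
candidate_scores() {
  awk -F '\t' -v OFS='\t' '{ score = $4; sub(/,/, ".", score); print $3, score }' "$1"
}
```

Rewriting the decimal comma makes the scores usable by tools that expect a decimal point, for example when sorting or thresholding candidates.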