TSV (Tab-Separated Values) is a human-readable format for exporting terminologies that is easy to reuse in Calc, Excel, R, or any program supporting CSV import. The TSV file produced by TermSuite contains all the relevant information about a terminology. This page covers:
- Exporting a terminology to TSV
- Understanding TSV output
- Configuring TSV output
Exporting a terminology to TSV
To produce a TSV file from a terminology with TermSuite, use:
- --tsv if you are using the command line,
- TsvExporter if you are using the Java API,
- Export to TSV… from the menu of the terminology editor if you are using the graphical user interface of TermSuite.
Understanding TSV output
Here is an excerpt of the TSV output.
#    type    pattern  pilot                     spec  freq  dFreq
1    T       N        rotor                     4,82  848   30
2    T       N N      wind turbine              4,56  1855  37
2    V[s]    N N N    wind turbine rotor        3,38  31    12
2    V[s]    A N N    offshore wind turbine     3,26  47    7
2    V[s]    N N N    wind turbine noise        3,53  43    3
2    V[s]+   N N N    wind turbine technology   3,34  28    10
2    V[s]    N N N    wind turbine system       3,40  32    7
2    V[s]    A N N    modern wind turbines      2,82  17    7
2    V[s]    N N N    wind turbine tower        3,07  15    9
2    V[s]    A N N    large wind turbines       3,12  17    10
2    V[s]    N N N    wind turbine power        2,89  10    6
3    T       N N      wind energy               4,51  414   32
3    V[s]    N N N    wind energy potential     3,07  15    5
3    V[s]    A N N    offshore wind energy      3,56  47    7
3    V[s]    N N N    wind energy development   3,29  25    5
4    T       N N      wind power                4,34  278   26
4    V[s]    N N N    wind turbine power        2,89  10    6
4    V[s]    N N N    Wind Power Plant          3,76  74    9
4    V[s]    A N N    offshore wind power       3,01  13    4
5    T       N        airfoil                   4,26  236   8
Row type (T or V)
Each row with T as its type (second column) is a term. The rank of the term is given in the # column (first column).
Each row with a type of the form V[*] is a variant. To find the base term of a variant, look at the variant's # value (the rank, i.e. the first column): it is the rank of the base term. For example:
2 V[s] A N N offshore wind turbine 3,26 47 7
The line above indicates that offshore wind turbine is a variant of the term ranked 2, i.e. wind turbine. Variants are usually listed directly under their base term.
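The rank-based lookup described above can be sketched in a few lines of Python. This is only an illustration, not part of TermSuite: it assumes the file was exported with a header row and with the columns shown in the excerpt, and the function name group_by_rank and the inline sample are hypothetical. Note that spec uses a decimal comma, which has to be converted before parsing it as a number.

```python
import csv
import io

# Hypothetical sample mirroring a few rows of the excerpt (tab-separated, with header).
SAMPLE = (
    "#\ttype\tpattern\tpilot\tspec\tfreq\tdFreq\n"
    "1\tT\tN\trotor\t4,82\t848\t30\n"
    "2\tT\tN N\twind turbine\t4,56\t1855\t37\n"
    "2\tV[s]\tA N N\toffshore wind turbine\t3,26\t47\t7\n"
    "2\tV[s]\tN N N\twind turbine power\t2,89\t10\t6\n"
    "4\tT\tN N\twind power\t4,34\t278\t26\n"
    "4\tV[s]\tN N N\twind turbine power\t2,89\t10\t6\n"
)


def group_by_rank(tsv_text):
    """Map each rank to its base term, the term's spec, and its variants."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    groups = {}
    for row in reader:
        rank = int(row["#"])
        entry = groups.setdefault(rank, {"term": None, "spec": None, "variants": []})
        if row["type"] == "T":
            entry["term"] = row["pilot"]
            # spec is written with a decimal comma (e.g. "4,56").
            entry["spec"] = float(row["spec"].replace(",", "."))
        elif row["type"].startswith("V["):
            entry["variants"].append(row["pilot"])
    return groups


groups = group_by_rank(SAMPLE)
print(groups[2]["term"], "->", groups[2]["variants"])
# wind turbine -> ['offshore wind turbine', 'wind turbine power']
```

Because variants share the rank of their base term, a single pass over the rows is enough, and the grouping does not depend on variants being listed directly under the term.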
As a consequence, a single term like wind turbine power can appear several times as a variant (type V[*]) of other terms, here wind turbine (ranked 2) and wind power (ranked 4). By contrast, a single term can appear at most once as a term (type T) in the TSV. For example, offshore wind turbine, shown above as a variant of wind turbine, also appears as a term, with rank 122, in the same file:
122 T A N N offshore wind turbine 3,26 47 7
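To make the "several times as a variant" point concrete, here is a small Python sketch that lists the pilots occurring as a variant under more than one rank. The rows are (rank, type, pilot) triples copied from the excerpt; the function name shared_variants is hypothetical and the code is an illustration only, not something TermSuite provides.

```python
from collections import defaultdict

# (rank, type, pilot) triples copied from the excerpt above.
ROWS = [
    (2, "T", "wind turbine"),
    (2, "V[s]", "wind turbine power"),
    (4, "T", "wind power"),
    (4, "V[s]", "wind turbine power"),
    (4, "V[s]", "offshore wind power"),
]


def shared_variants(rows):
    """Return the pilots that occur as a variant under more than one rank."""
    ranks_by_pilot = defaultdict(set)
    for rank, row_type, pilot in rows:
        if row_type.startswith("V["):
            ranks_by_pilot[pilot].add(rank)
    return {
        pilot: sorted(ranks)
        for pilot, ranks in ranks_by_pilot.items()
        if len(ranks) > 1
    }


print(shared_variants(ROWS))  # {'wind turbine power': [2, 4]}
```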
Type of variants V[*]
A value starting with V for the type property indicates a variant. The V is followed by one or more letters in brackets:
- s: indicates that the variation is syntagmatic (the most basic variation type),
- m: indicates that the variation is morphological,
- g: indicates that the variation is graphical,
- h: indicates that the variation is semantic; see the isDistrib property to know whether the variant was found from a dictionary, by context vector comparison, or both,
- d: indicates that the variation is a derivation,
- p: indicates that the variation is a prefixation,
- i: indicates that the variation has been inferred from another pair of smaller-size terms. The i flag should always appear together with another flag.
When the flag ends with +, as in V[s]+, the variant also has its own variants. In other words, the variation can be expanded, and the base term has order-2 variants.
375  T       N        windpower                 2,82  17    8
375  V[mg]+  N N      wind power                4,34  278   26
375  V[m]+   N P N    power of the wind         2,97  12    7
375  V[m]    N N N    wind turbine power        2,89  10    6
V[mg]+ wind power indicates that wind power is both a morphological and a graphical variant of the term windpower, and that wind power also has its own variants (indeed, see rank 4 in the excerpt above).
V[m]+ power of the wind indicates that power of the wind is a morphological variant of the term windpower, and that power of the wind also has its own variants.
25   T       turbine sound        80  3,80
25   V[s]+   wind turbine sound   71  3,74
25   V[h]+   turbine noise        52  3,61  0,65  0  1
V[h]+ turbine noise indicates that turbine noise is a semantic variant of the term turbine sound, and that turbine noise also has its own variants.
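The flag notation described in this section can be decoded with a short regular expression. The Python sketch below is an illustration only: the function name parse_type and the returned dictionary layout are hypothetical, and the letter-to-name mapping simply restates the list of flags given above.

```python
import re

# "V", one or more lowercase flag letters in brackets, then an optional "+".
TYPE_RE = re.compile(r"^V\[([a-z]+)\](\+?)$")

# Flag letters as described in this section.
FLAG_NAMES = {
    "s": "syntagmatic",
    "m": "morphological",
    "g": "graphical",
    "h": "semantic",
    "d": "derivation",
    "p": "prefixation",
    "i": "inferred",
}


def parse_type(value):
    """Decode a value of the type column, such as 'T', 'V[s]' or 'V[mg]+'."""
    if value == "T":
        return {"kind": "term", "variations": [], "has_own_variants": False}
    match = TYPE_RE.match(value)
    if match is None:
        raise ValueError(f"unrecognized type value: {value!r}")
    letters, plus = match.groups()
    return {
        "kind": "variant",
        # Unknown letters are kept as-is rather than rejected.
        "variations": [FLAG_NAMES.get(letter, letter) for letter in letters],
        "has_own_variants": plus == "+",
    }


print(parse_type("V[mg]+"))
# {'kind': 'variant', 'variations': ['morphological', 'graphical'], 'has_own_variants': True}
```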
Configuring TSV output
TermSuite APIs allow you to customize the TSV output file. Refer to the documentation of the API you are using.
Whether TermSuite should write the column names on the first line of the TSV file.
Whether TermSuite should filter variant rows out of the TSV file, i.e. lines whose type is of the form V[*].
List of properties to show
The list of term or variant properties to show as columns of the TSV file. Refer to available properties for an exhaustive list of allowed values.
When the line to display in the TSV is a variation, it is possible to specify a term property prefixed with source:, as in source:pilot… In that case, the value displayed for that column is the value of the property for the source (base) term of the variation. This feature is useful for keeping only variation lines (for example by filtering within Excel) while still having the base term's properties on the same line.
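The same filtering can be done outside Excel. The Python sketch below keeps only the variation lines and pairs each variant with its base term. It rests on several assumptions that are not confirmed by this page: that the file was exported with a header row, that the header of the prefixed column is literally source:pilot, and that this column is empty on term rows. The sample data and the function name variation_rows are hypothetical.

```python
import csv
import io

# Hypothetical export. The header name "source:pilot" and the empty value
# on term rows are assumptions made for this illustration.
SAMPLE = (
    "#\ttype\tpilot\tsource:pilot\n"
    "2\tT\twind turbine\t\n"
    "2\tV[s]\toffshore wind turbine\twind turbine\n"
    "4\tT\twind power\t\n"
    "4\tV[s]\toffshore wind power\twind power\n"
)


def variation_rows(tsv_text):
    """Return (base term pilot, variant pilot) pairs for variation lines only."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return [
        (row["source:pilot"], row["pilot"])
        for row in reader
        if row["type"].startswith("V[")
    ]


for base, variant in variation_rows(SAMPLE):
    print(f"{variant} <- {base}")
```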