- Prerequisites
- Extracting terminology
- Configuring terminology extraction
- Debugging terminology extraction
- Getting statistics of pipeline executions
Prerequisites
- Java 8
Extracting terminology
Say corpus is the IndexedCorpus object produced by the NLP preprocessing (see pipelines for explanations). The terminology extraction pipeline can then be launched from the Java API with:
TermSuite.terminoExtractor().execute(corpus);
TermSuite.terminoExtractor() creates a new TerminoExtractor object, which is a builder for the configuration of the terminology extraction pipeline. The method execute() runs this pipeline on the corpus passed as a parameter. After the pipeline has executed, that corpus contains many term variations and other morphological information, and has undergone the pipeline's cleaning steps.
Configuring terminology extraction
The terminology extraction process can be finely tuned by passing an ExtractorOptions object to TerminoExtractor.
You can create a language-independent ExtractorOptions object from scratch, but it is not advised:
// !!! Not advised !!!
ExtractorOptions extractorOptions = new ExtractorOptions();
It is better to create a new ExtractorOptions object by cloning the language defaults:
// Clones the default configuration object for given language
ExtractorOptions extractorOptions = TermSuite.getDefaultExtractorConfig(lang);
Full example:
// Clones the default configuration object for given language
ExtractorOptions extractorOptions = TermSuite.getDefaultExtractorConfig(lang);
// deactivates post-processing
extractorOptions.getPostProcessorConfig().setEnabled(false);
// activates semantic variant detection
extractorOptions.getGathererConfig().setSemanticEnabled(true);
// runs the pipeline
TermSuite.terminoExtractor()
    .setOptions(extractorOptions)
    .execute(corpus);
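Why cloning the defaults is the safer route can be illustrated without TermSuite itself. The sketch below uses hypothetical stand-in classes (`Options` and `Defaults` are assumptions made for illustration, not the real `ExtractorOptions` API) and arbitrary default values. It assumes the defaults are handed out as copies, so that changes made by one caller never leak into the shared per-language configuration.

```java
import java.util.HashMap;
import java.util.Map;

public class CloneDefaultsSketch {

    /** Hypothetical stand-in for an options object; not the TermSuite class. */
    static final class Options {
        boolean postProcessingEnabled;
        boolean semanticEnabled;

        Options(boolean postProcessingEnabled, boolean semanticEnabled) {
            this.postProcessingEnabled = postProcessingEnabled;
            this.semanticEnabled = semanticEnabled;
        }

        /** Returns an independent copy of this options object. */
        Options copy() {
            return new Options(postProcessingEnabled, semanticEnabled);
        }
    }

    /** Hypothetical registry of per-language defaults (values are arbitrary). */
    static final class Defaults {
        private static final Map<String, Options> BY_LANG = new HashMap<>();
        static {
            BY_LANG.put("en", new Options(true, false));
            BY_LANG.put("fr", new Options(true, false));
        }

        /** Hands out a copy so callers cannot alter the shared defaults. */
        static Options forLanguage(String lang) {
            Options defaults = BY_LANG.get(lang);
            if (defaults == null) {
                throw new IllegalArgumentException("No defaults for language: " + lang);
            }
            return defaults.copy();
        }
    }

    public static void main(String[] args) {
        // Customize a private copy of the English defaults.
        Options mine = Defaults.forLanguage("en");
        mine.postProcessingEnabled = false;
        mine.semanticEnabled = true;

        // A later caller still sees the untouched defaults.
        Options fresh = Defaults.forLanguage("en");
        System.out.println(fresh.postProcessingEnabled); // prints true
        System.out.println(fresh.semanticEnabled);       // prints false
    }
}
```

Starting from a copy of the language defaults means only the settings you override differ from the recommended configuration, which is harder to guarantee when building an options object from scratch.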
Debugging terminology extraction
If you need to investigate why a term came out of the terminology extraction process, or why another term did not, you can attach a history to TerminoExtractor. The following example watches the term “wind energy”:
TermHistory history = new TermHistory();
// runs the pipeline
TermSuite.terminoExtractor()
    .setHistory(history)
    .watch("wind energy") // watch the term history
    .execute(corpus);
// Prints all events impacting term "wind energy"
System.out.println(history.toString("wind energy"));
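The idea of a watched-term history can be sketched independently of TermSuite. The class below is a hypothetical stand-in (`WatchHistorySketch`, its methods, and the step names are assumptions for illustration, not the actual `TermHistory` implementation). It assumes that pipeline steps report events for every term they touch, and that only events concerning watched terms are stored.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class WatchHistorySketch {
    // Terms whose events should be kept.
    private final Set<String> watched = new HashSet<>();
    // Recorded events per watched term, in insertion order.
    private final Map<String, List<String>> events = new LinkedHashMap<>();

    /** Registers a term to watch; returns this to allow chaining. */
    public WatchHistorySketch watch(String term) {
        watched.add(term);
        return this;
    }

    /** Called by a (hypothetical) pipeline step; ignored for unwatched terms. */
    public void record(String term, String step, String message) {
        if (!watched.contains(term)) {
            return;
        }
        events.computeIfAbsent(term, k -> new ArrayList<>()).add(step + ": " + message);
    }

    /** Returns a read-only view of the events recorded for a term. */
    public List<String> eventsFor(String term) {
        return Collections.unmodifiableList(
                events.getOrDefault(term, Collections.emptyList()));
    }

    public static void main(String[] args) {
        WatchHistorySketch history = new WatchHistorySketch().watch("wind energy");

        // Step names and messages are made up for the example.
        history.record("wind energy", "step-a", "term created");
        history.record("solar panel", "step-a", "term created");
        history.record("wind energy", "step-b", "kept by filter");

        // Only the two events for the watched term are printed.
        for (String event : history.eventsFor("wind energy")) {
            System.out.println(event);
        }
    }
}
```

Restricting storage to watched terms keeps the history small even on large corpora, at the cost of having to name the terms of interest before the pipeline runs.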
Getting statistics of pipeline executions
The execute() method always returns statistics about the pipeline execution as an instance of PipelineStats:
// runs the pipeline
PipelineStats stats = TermSuite.terminoExtractor()
    .execute(corpus);