Evaluation Module

The conllu_tools.evaluation module provides tools for evaluating CoNLL-U format annotations, including computing precision, recall, and F1 scores for various annotation layers.

The evaluation framework is based on the official CoNLL shared task evaluation scripts, with support for all standard UD evaluation metrics.

Main Classes

Evaluator

class conllu_tools.evaluation.evaluator.ConlluEvaluator(*, eval_deprels=True, treebank_type='0')

Bases: WordProcessingMixin, TreeValidationMixin

Evaluator for Universal Dependencies CoNLL-U files.

__init__(*, eval_deprels=True, treebank_type='0')

Initialize the evaluator.

Parameters:

eval_deprels (bool) – Whether to evaluate dependency relations
treebank_type (str) – String of digits indicating which enhancement types to disable (e.g., '12' disables types 1 and 2)
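
One way to read the treebank_type string is as a set of single-digit flags. The sketch below illustrates that reading only; it is not the library's parsing code. The helper name disabled_enhancements is hypothetical, and treating '0' as "nothing disabled" is an assumption inferred from the default value.

```python
from __future__ import annotations


def disabled_enhancements(treebank_type: str) -> set[int]:
    """Return the enhancement type numbers named in a flag string.

    Hypothetical helper: assumes each character is one digit flag and
    that '0' means no enhancement type is disabled.
    """
    if not treebank_type.isdigit():
        raise ValueError(f"expected a string of digits, got {treebank_type!r}")
    return {int(ch) for ch in treebank_type if ch != "0"}


print(sorted(disabled_enhancements("12")))  # [1, 2]
print(sorted(disabled_enhancements("0")))   # []
```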

evaluate_files(gold_path, system_path)

Evaluate system file against gold file.

Parameters:

gold_path (str | Path) – Path to gold standard file
system_path (str | Path) – Path to system output file

Return type:

dict[str, Score]

Returns:

Dictionary of metric names to Score objects
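
Because evaluate_files returns a dictionary of metric names to Score objects, a report is a loop over that dictionary. The sketch below is not part of the library. format_report is a hypothetical helper, and the metric names and values in the demo are illustrative placeholders so that the block runs without any CoNLL-U files. With the real evaluator, the dictionary would presumably come from ConlluEvaluator().evaluate_files(gold_path, system_path).

```python
from types import SimpleNamespace


def format_report(scores):
    """Render metric name, precision, recall and F1 as aligned text rows."""
    lines = [f"{'Metric':<10} {'P':>7} {'R':>7} {'F1':>7}"]
    for name, score in scores.items():
        lines.append(
            f"{name:<10} {100 * score.precision:7.2f} "
            f"{100 * score.recall:7.2f} {100 * score.f1:7.2f}"
        )
    return "\n".join(lines)


# Placeholder objects exposing the documented precision, recall and f1
# properties; the numbers are illustrative, not real evaluation results.
demo_scores = {
    "UPOS": SimpleNamespace(precision=0.95, recall=0.94, f1=0.945),
    "LAS": SimpleNamespace(precision=0.80, recall=0.78, f1=0.79),
}
print(format_report(demo_scores))
```

Values are scaled to percentages here only for readability; the properties themselves are documented as floats.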

Score

class conllu_tools.evaluation.base.Score(gold_total, system_total, correct, aligned_total=None)

Bases: object

Represents evaluation scores for a particular metric.

gold_total: int | None

system_total: int | None

correct: int | None

aligned_total: int | None = None

property precision: float

Calculate precision.

property recall: float

Calculate recall.

property f1: float

Calculate F1 score.

property aligned_accuracy: float | None

Calculate aligned accuracy.

__init__(gold_total, system_total, correct, aligned_total=None)
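
The properties follow the usual definitions from the CoNLL shared task scripts: precision is correct over system_total, recall is correct over gold_total, and F1 is their harmonic mean, which simplifies to 2 * correct / (gold_total + system_total). The stand-in class below (ScoreSketch, a hypothetical name) illustrates those formulas. The zero-denominator handling is an assumption, and the library's Score may differ in detail.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ScoreSketch:
    """Illustrative stand-in for Score; not the library's class."""

    gold_total: int
    system_total: int
    correct: int
    aligned_total: int | None = None

    @property
    def precision(self) -> float:
        # Share of system items that are correct.
        return self.correct / self.system_total if self.system_total else 0.0

    @property
    def recall(self) -> float:
        # Share of gold items that the system got right.
        return self.correct / self.gold_total if self.gold_total else 0.0

    @property
    def f1(self) -> float:
        # Harmonic mean of precision and recall, written in terms of counts.
        total = self.system_total + self.gold_total
        return 2 * self.correct / total if total else 0.0

    @property
    def aligned_accuracy(self) -> float | None:
        # Only defined when an aligned total is available and non-zero.
        if not self.aligned_total:
            return None
        return self.correct / self.aligned_total


score = ScoreSketch(gold_total=10, system_total=8, correct=6, aligned_total=8)
print(score.precision, score.recall)  # 0.75 0.6
```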

Supporting Classes

These classes are used internally by the evaluator but may be useful for advanced use cases.

UDWord

class conllu_tools.evaluation.base.UDWord(span, token, is_multiword, enhanced_deps=None, functional_children=None)

Bases: object

Represents a word with its span and CoNLL-U token.

span: UDSpan

token: Token

is_multiword: bool

enhanced_deps: list[tuple[int | UDWord, list[str]]] | None = None

functional_children: list[UDWord] | None = None

__hash__()

Make UDWord hashable for use in dictionaries.

Return type:

int

__init__(span, token, is_multiword, enhanced_deps=None, functional_children=None)

UDSpan

class conllu_tools.evaluation.base.UDSpan(start, end)

Bases: object

Represents a span (start and end position) in the character array.

start: int

end: int

__init__(start, end)
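
A span indexes into the sequence of characters formed by the token forms. The sketch below shows one way such spans can be computed, assuming whitespace is dropped and forms are concatenated in order, as in the CoNLL shared task scripts. SpanSketch and spans_for_forms are hypothetical names, not library API, and the half-open (end-exclusive) convention is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpanSketch:
    start: int  # index of the first character of the word
    end: int    # index one past the last character (assumed convention)


def spans_for_forms(forms):
    """Assign each form a half-open span in the concatenated character array."""
    spans = []
    position = 0
    for form in forms:
        length = len("".join(form.split()))  # ignore whitespace inside a form
        spans.append(SpanSketch(position, position + length))
        position += length
    return spans


spans = spans_for_forms(["The", "cat", "sat", "."])
print(spans[1])  # SpanSketch(start=3, end=6)
```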

Alignment

class conllu_tools.evaluation.base.Alignment(gold_words, system_words)

Bases: object

Represents the alignment between gold and system words.

__init__(gold_words, system_words)

Initialize alignment.

Parameters:

gold_words (list[UDWord]) – List of gold words
system_words (list[UDWord]) – List of system words

append_aligned_words(gold_word, system_word)

Add an aligned word pair.

Parameters:

gold_word (UDWord) – Gold word
system_word (UDWord) – System word

Return type:

AlignmentWord

class conllu_tools.evaluation.base.AlignmentWord(gold_word, system_word)

Bases: object

Represents an aligned pair of gold and system words.

gold_word: UDWord

system_word: UDWord

__init__(gold_word, system_word)
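
Alignment pairs gold and system words that cover the same characters. The sketch below is a simplified illustration that pairs words only when their spans are identical. It ignores multiword tokens, which a full evaluator has to handle separately. align_by_span and the tuple-based word representation are hypothetical stand-ins for UDWord and AlignmentWord, not library API.

```python
def align_by_span(gold_words, system_words):
    """Pair words with identical (start, end) spans.

    Each word is a (start, end, label) tuple, and both lists are sorted
    by start. Returns a list of (gold_word, system_word) pairs.
    """
    pairs = []
    gi = si = 0
    while gi < len(gold_words) and si < len(system_words):
        gold, system = gold_words[gi], system_words[si]
        if gold[:2] == system[:2]:
            # Same characters covered on both sides: record the pair.
            pairs.append((gold, system))
            gi += 1
            si += 1
        elif gold[0] <= system[0]:
            gi += 1  # gold word has no exact counterpart; skip it
        else:
            si += 1  # system word has no exact counterpart; skip it
    return pairs


gold = [(0, 3, "The"), (3, 6, "cat"), (6, 9, "sat")]
system = [(0, 3, "The"), (3, 4, "c"), (4, 6, "at"), (6, 9, "sat")]
for gold_word, system_word in align_by_span(gold, system):
    print(gold_word[2], "<->", system_word[2])
```

In this demo the system splits "cat" into two words, so only "The" and "sat" are aligned. Unaligned words still count toward the gold and system totals, which is what lowers precision and recall.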

Exceptions

exception conllu_tools.evaluation.base.UDError

Bases: Exception

Raised when there is an error in the UD data or evaluation process.
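
A typical cause of such an error in CoNLL-style evaluation is a gold and system file whose underlying text differs, since words can only be aligned over the same characters. The sketch below illustrates that check with a stand-in exception class. UDErrorSketch and check_same_text are hypothetical names, and whether the library raises UDError in exactly this case is an assumption.

```python
class UDErrorSketch(Exception):
    """Stand-in for an evaluation error; not the library's UDError."""


def check_same_text(gold_forms, system_forms):
    """Raise if the two tokenizations do not spell out the same characters."""
    gold_text = "".join("".join(form.split()) for form in gold_forms)
    system_text = "".join("".join(form.split()) for form in system_forms)
    if gold_text != system_text:
        raise UDErrorSketch("gold and system text differ; cannot align words")


check_same_text(["can", "not"], ["cannot"])  # same characters, no error
try:
    check_same_text(["cat"], ["cap"])
except UDErrorSketch as err:
    print(f"evaluation failed: {err}")
```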