Evaluation Module

The conllu_tools.evaluation module provides tools for evaluating CoNLL-U annotations against a gold standard, computing precision, recall, and F1 scores for various annotation layers.

The evaluation framework is based on the official CoNLL shared task evaluation scripts, with support for all standard UD evaluation metrics.

Main Classes

Evaluator

class conllu_tools.evaluation.evaluator.ConlluEvaluator(*, eval_deprels=True, treebank_type='0')

Bases: WordProcessingMixin, TreeValidationMixin

Evaluator for Universal Dependencies CoNLL-U files.

__init__(*, eval_deprels=True, treebank_type='0')

Initialize the evaluator.

Parameters:
  • eval_deprels (bool) – Whether to evaluate dependency relations

  • treebank_type (str) – String of digits naming the enhancement types to disable (e.g., '12' disables types 1 and 2)

evaluate_files(gold_path, system_path)

Evaluate system file against gold file.

Parameters:
  • gold_path (str | Path) – Path to gold standard file

  • system_path (str | Path) – Path to system output file

Return type:

dict[str, Score]

Returns:

Dictionary of metric names to Score objects
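A minimal usage sketch for evaluate_files follows. It assumes only what is documented above: the constructor keywords, the two path arguments, and a returned mapping whose values expose precision, recall, and f1. The helper names format_scores and evaluate_and_report are hypothetical, and the metric names are whatever keys the evaluator returns.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any, Mapping


def format_scores(scores: Mapping[str, Any]) -> str:
    """Render a metric -> score mapping as an aligned text table.

    Values only need precision, recall and f1 attributes, so this works
    with Score objects or with any stand-in that has the same attributes.
    """
    lines = [f"{'Metric':<12}{'Precision':>11}{'Recall':>11}{'F1':>11}"]
    for name, score in scores.items():
        lines.append(
            f"{name:<12}{score.precision:>11.4f}"
            f"{score.recall:>11.4f}{score.f1:>11.4f}"
        )
    return "\n".join(lines)


def evaluate_and_report(gold_path: str | Path, system_path: str | Path) -> str:
    """Evaluate one system file against the gold file and return a report."""
    # Imported lazily so this sketch loads even without the package installed.
    from conllu_tools.evaluation.evaluator import ConlluEvaluator

    evaluator = ConlluEvaluator(eval_deprels=True, treebank_type="0")
    scores = evaluator.evaluate_files(gold_path, system_path)
    return format_scores(scores)
```

Calling print(evaluate_and_report("gold.conllu", "system.conllu")) would then print one row per metric; the two file names here are placeholders.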

Score

class conllu_tools.evaluation.base.Score(gold_total, system_total, correct, aligned_total=None)

Bases: object

Represents evaluation scores for a particular metric.

gold_total: int | None
system_total: int | None
correct: int | None
aligned_total: int | None = None

property precision: float

Calculate precision.

property recall: float

Calculate recall.

property f1: float

Calculate F1 score.

property aligned_accuracy: float | None

Calculate aligned accuracy.

__init__(gold_total, system_total, correct, aligned_total=None)
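The properties above are not spelled out as formulas, so the sketch below shows how such values are conventionally derived from the four counts, following the definitions used in the CoNLL 2018 shared task evaluation script. SketchScore is a stand-in written for illustration, not the library's Score class. Returning 0.0 for empty totals and None when aligned_total is missing or zero are assumptions that may differ from the actual implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SketchScore:
    """Illustrative stand-in with the same four fields as Score."""

    gold_total: int
    system_total: int
    correct: int
    aligned_total: int | None = None

    @property
    def precision(self) -> float:
        # Share of system items that are correct.
        return self.correct / self.system_total if self.system_total else 0.0

    @property
    def recall(self) -> float:
        # Share of gold items that the system got right.
        return self.correct / self.gold_total if self.gold_total else 0.0

    @property
    def f1(self) -> float:
        # Equal to the harmonic mean of precision and recall.
        total = self.system_total + self.gold_total
        return 2 * self.correct / total if total else 0.0

    @property
    def aligned_accuracy(self) -> float | None:
        # Accuracy restricted to words that could be aligned at all.
        if not self.aligned_total:
            return None
        return self.correct / self.aligned_total
```

For example, SketchScore(gold_total=10, system_total=8, correct=6, aligned_total=8) gives a precision of 0.75 and a recall of 0.6.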

Supporting Classes

These classes are used internally by the evaluator but may be useful for advanced use cases.

UDWord

class conllu_tools.evaluation.base.UDWord(span, token, is_multiword, enhanced_deps=None, functional_children=None)

Bases: object

Represents a word with its span and CoNLL-U token.

span: UDSpan
token: Token
is_multiword: bool
enhanced_deps: list[tuple[int | UDWord, list[str]]] | None = None
functional_children: list[UDWord] | None = None

__hash__()

Make UDWord hashable for use in dictionaries.

Return type:

int

__init__(span, token, is_multiword, enhanced_deps=None, functional_children=None)

UDSpan

class conllu_tools.evaluation.base.UDSpan(start, end)

Bases: object

Represents a span (start and end position) in the character array.

start: int
end: int

__init__(start, end)
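To make the idea of a span in the character array concrete, here is a small sketch that assigns spans to a sequence of word forms. It assumes the convention of the CoNLL shared task scripts: the character array is the concatenation of all forms with whitespace removed, and end is exclusive. SketchSpan and spans_for_forms are hypothetical names, and the library's own construction may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchSpan:
    """Illustrative stand-in for UDSpan: a half-open range [start, end)."""

    start: int
    end: int


def spans_for_forms(forms):
    """Assign each form its span in the whitespace-free character array."""
    spans = []
    offset = 0
    for form in forms:
        # Whitespace inside a form does not count towards the character array.
        chars = "".join(form.split())
        spans.append(SketchSpan(offset, offset + len(chars)))
        offset += len(chars)
    return spans
```

Because both files are reduced to the same character array, a gold word and a system word can be compared by span even when the two tokenizations differ.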

Alignment

class conllu_tools.evaluation.base.Alignment(gold_words, system_words)

Bases: object

Represents the alignment between gold and system words.

__init__(gold_words, system_words)

Initialize alignment.

Parameters:
  • gold_words (list[UDWord]) – List of gold words

  • system_words (list[UDWord]) – List of system words

append_aligned_words(gold_word, system_word)

Add an aligned word pair.

Parameters:
  • gold_word (UDWord) – Gold word

  • system_word (UDWord) – System word

Return type:

None
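The alignment procedure itself is not documented here. As a rough illustration of what an alignment contains, the sketch below pairs gold and system words whose spans match exactly and skips the rest, which is the approach the CoNLL shared task scripts take for words outside multiword tokens. The Span tuple and align_by_span are hypothetical names, and multiword tokens are deliberately ignored.

```python
from collections import namedtuple

# Illustrative stand-in for a word's span; end is assumed to be exclusive.
Span = namedtuple("Span", ["start", "end"])


def align_by_span(gold, system):
    """Return (gold, system) pairs whose spans are identical.

    Both inputs must be sorted by start position. Words with no
    exact-span counterpart on the other side are left unaligned.
    """
    pairs = []
    gi = si = 0
    while gi < len(gold) and si < len(system):
        g, s = gold[gi], system[si]
        if (g.start, g.end) == (s.start, s.end):
            pairs.append((g, s))
            gi += 1
            si += 1
        elif g.start <= s.start:
            # Gold word starts first (or at the same place): skip it.
            gi += 1
        else:
            # System word starts first: skip it.
            si += 1
    return pairs
```

With the actual class, each matched pair would presumably be recorded through append_aligned_words(gold_word, system_word).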

AlignmentWord

class conllu_tools.evaluation.base.AlignmentWord(gold_word, system_word)

Bases: object

Represents an aligned pair of gold and system words.

gold_word: UDWord
system_word: UDWord

__init__(gold_word, system_word)

Exceptions

exception conllu_tools.evaluation.base.UDError

Bases: Exception

Raised when there is an error in the UD data or evaluation process.
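One way to handle UDError when scoring many system files is sketched below. It assumes, without confirmation from this page, that evaluate_files is among the calls that can raise it. The wrapper evaluate_or_none is a hypothetical helper, and the fallback exception class exists only so that the sketch runs where conllu_tools is not installed.

```python
try:
    from conllu_tools.evaluation.base import UDError
except ImportError:

    class UDError(Exception):
        """Local stand-in used only when conllu_tools is unavailable."""


def evaluate_or_none(evaluate, gold_path, system_path):
    """Call evaluate(gold_path, system_path); return None on UDError.

    evaluate is any callable with that signature, for example the
    bound method ConlluEvaluator().evaluate_files.
    """
    try:
        return evaluate(gold_path, system_path)
    except UDError as err:
        # Report the failing file and let the caller continue with the rest.
        print(f"evaluation failed for {system_path}: {err}")
        return None
```

Other exceptions, such as a missing file, are left to propagate so that they are not mistaken for data errors.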