IO Module

The conllu_tools.io module provides utilities for converting between CoNLL-U and brat standoff formats, as well as functions for loading language-specific data and configurations.

Conversion Functions

These functions convert between CoNLL-U and brat annotation formats, enabling round-trip annotation workflows.

conllu_to_brat

conllu_tools.io.conllu_to_brat(conllu_filename, output_directory, sents_per_doc=None, output_root=True)

Convert a CoNLL-U file to brat standoff format.

Parameters:
  • conllu_filename (str) – Path to the input CoNLL-U file.

  • output_directory (str) – Directory in which to write the output brat files.

  • sents_per_doc (int | None) – Maximum number of sentences per output document. If None, all sentences are written to a single document.

  • output_root (bool) – Whether to include an explicit ROOT node in the output.

Return type:

None
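The effect of sents_per_doc can be pictured with a small stand-alone sketch. This is not the library's implementation: the helper names split_sentences and chunk_sentences are hypothetical, and the sketch only assumes that CoNLL-U sentences are separated by blank lines.

```python
import re
from typing import List, Optional


def split_sentences(conllu_text: str) -> List[str]:
    """Split CoNLL-U text into sentence blocks (blocks are separated by blank lines)."""
    blocks = re.split(r"\n\s*\n", conllu_text.strip())
    return [block for block in blocks if block.strip()]


def chunk_sentences(
    sentences: List[str], sents_per_doc: Optional[int] = None
) -> List[List[str]]:
    """Group sentences into documents of at most sents_per_doc sentences.

    With sents_per_doc=None, every sentence goes into a single document.
    """
    if sents_per_doc is None:
        return [list(sentences)]
    if sents_per_doc < 1:
        raise ValueError("sents_per_doc must be a positive integer or None")
    return [
        sentences[start:start + sents_per_doc]
        for start in range(0, len(sentences), sents_per_doc)
    ]


# Five one-token sentences, built inline so the sketch needs no input file.
sample = "\n\n".join(
    f"# sent_id = {i}\n1\tword{i}\t_\t_\t_\t_\t0\troot\t_\t_" for i in range(1, 6)
) + "\n\n"

sentences = split_sentences(sample)
print([len(doc) for doc in chunk_sentences(sentences, sents_per_doc=2)])
print([len(doc) for doc in chunk_sentences(sentences)])
```

With five sentences and sents_per_doc=2, the sketch yields three documents, the last one holding the single remaining sentence.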

brat_to_conllu

conllu_tools.io.brat_to_conllu(input_directory, output_directory, feature_set, ref_conllu=None, sents_per_doc=None, output_root=None)

Convert brat annotations back to CoNLL-U format.

Return type:

None
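A round trip through both conversion functions might look like the sketch below. The wrapper round_trip and its arguments are hypothetical. It also assumes that feature_set accepts the dictionary returned by load_language_data('features', ...), which this page does not confirm. The undocumented optional parameters of brat_to_conllu are left at their defaults.

```python
from pathlib import Path


def round_trip(
    conllu_path: str, brat_dir: str, conllu_out_dir: str, language: str = "la"
) -> None:
    """Convert a CoNLL-U file to brat, then convert the brat files back."""
    # Imported lazily so the sketch can be loaded without the package installed.
    from conllu_tools.io import brat_to_conllu, conllu_to_brat, load_language_data

    # Creating the directories up front is a precaution, not a documented requirement.
    Path(brat_dir).mkdir(parents=True, exist_ok=True)
    Path(conllu_out_dir).mkdir(parents=True, exist_ok=True)

    # Write brat documents of at most 10 sentences each, with an explicit ROOT node.
    conllu_to_brat(conllu_path, brat_dir, sents_per_doc=10, output_root=True)

    # ... annotations would be edited in brat at this point ...

    # ASSUMPTION: the 'features' data is what brat_to_conllu expects as feature_set.
    feature_set = load_language_data("features", language=language)
    brat_to_conllu(brat_dir, conllu_out_dir, feature_set)
```

A call such as round_trip("corpus.conllu", "brat_out", "conllu_out") would then perform both steps; the file and directory names here are placeholders.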

Data Loading Functions

These functions load language-specific data files for validation and normalization.

load_language_data

conllu_tools.io.load_language_data(_type, language, additional_path=None, load_dalme=False)

Load language-specific data of the given type.

Parameters:
  • _type (str) – Type of data to load (‘features’, ‘auxiliaries’, ‘dependencies’).

  • language (str | None) – Language code (e.g., ‘la’ for Latin) used to select the language-specific subset of the data.

  • additional_path (str | Path | None) – Path to a JSON file containing additional data.

  • load_dalme (bool) – Whether to load DALME-specific data.

Return type:

dict[str, Any]

Returns:

A dictionary containing the loaded data.

load_whitespace_exceptions

conllu_tools.io.load_whitespace_exceptions(additional_exceptions_path=None)

Load whitespace exceptions.

The exceptions file contains one regular expression per line, each matching tokens that are allowed to contain whitespace. The expressions are compiled for use in validation.

Parameters:
  • additional_exceptions_path (str | Path | None) – Optional path to a file containing additional whitespace exceptions.

Return type:

list[Pattern]

Returns:

A list of compiled regex patterns representing whitespace exceptions.
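
The one-expression-per-line format can be illustrated with a stand-alone sketch. It does not reproduce the library's code: the helpers compile_exceptions and may_contain_whitespace are hypothetical, the sample expression is invented for the demonstration, and matching the whole token with fullmatch is an assumption about how the patterns are applied.

```python
import re
import tempfile
from pathlib import Path
from typing import List


def compile_exceptions(path) -> List[re.Pattern]:
    """Compile one regular expression per non-empty line of the file."""
    patterns = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line:
            patterns.append(re.compile(line))
    return patterns


def may_contain_whitespace(token: str, patterns: List[re.Pattern]) -> bool:
    """Return True if any pattern matches the entire token.

    ASSUMPTION: whole-token matching; the library may apply patterns differently.
    """
    return any(pattern.fullmatch(token) for pattern in patterns)


# Demo file with a single invented exception: numbers grouped by spaces, e.g. "10 000".
with tempfile.TemporaryDirectory() as tmp_dir:
    exceptions_file = Path(tmp_dir) / "extra_exceptions.txt"
    exceptions_file.write_text("[0-9]+( [0-9]{3})+\n", encoding="utf-8")
    patterns = compile_exceptions(exceptions_file)

print(may_contain_whitespace("10 000", patterns))
print(may_contain_whitespace("New York", patterns))
```

A file in this shape is what additional_exceptions_path would point to; the number token is accepted by the sample pattern, while "New York" is not.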