Utils Module
The conllu_tools.utils module provides utilities for working with different tagsets
and formats, including morphology normalization, XPOS format conversion, and feature
validation.
Key Capabilities:
Normalize morphological annotations across different treebank formats
Convert between UPOS tags and Perseus XPOS codes
Validate and convert features and XPOS strings
Convert XPOS formats from LLCT, ITTB, and PROIEL treebanks to Perseus standard
Morphology Normalization
The main entry point for normalizing morphological information.
- conllu_tools.utils.normalization.normalize_morphology(upos, xpos, feats, feature_set, ref_features=None)[source]
Normalize morphological information.
Takes UPOS, XPOS, and FEATS, normalizes and validates them against a provided feature set, and reconciles with reference features if provided.
- Parameters:
upos (str) – The Universal Part of Speech tag.
xpos (str) – The language-specific Part of Speech tag.
feats (dict[str, str] | str) – A string or dictionary of features.
feature_set (dict[str, Any]) – A feature set dictionary defining valid features.
ref_features (dict[str, str] | str | None) – A reference feature string or dictionary to reconcile with (optional).
- Returns:
A tuple containing the normalized XPOS string and validated feature dictionary.
UPOS Utilities
Convert between different POS tag systems.
Convert from and to UPOS.
Feature Utilities
Convert and validate morphological features.
Feature string and dictionary conversion utilities.
- conllu_tools.utils.features.feature_string_to_dict(feat_string)[source]
Convert a feature string to a dictionary.
- conllu_tools.utils.features.feature_dict_to_string(feat_dict)[source]
Convert a feature dictionary to a string.
- conllu_tools.utils.features.features_to_xpos(feats)[source]
Convert features to XPOS in Perseus format.
- conllu_tools.utils.features.xpos_to_features(xpos)[source]
Convert XPOS in Perseus format to features.
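To make the string and dictionary forms concrete, here is a minimal, self-contained sketch of the round trip for CoNLL-U FEATS values. It is not the library's implementation: the helper names `feats_to_dict` and `dict_to_feats` are hypothetical, and the handling of the empty value `_` and the alphabetical ordering follow the CoNLL-U convention rather than anything confirmed about `conllu_tools`.

```python
def feats_to_dict(feat_string: str) -> dict[str, str]:
    """Parse a CoNLL-U FEATS string such as 'Case=Nom|Number=Sing'."""
    # CoNLL-U writes an empty feature set as a single underscore.
    if not feat_string or feat_string == "_":
        return {}
    feats = {}
    for pair in feat_string.split("|"):
        name, _, value = pair.partition("=")
        feats[name] = value
    return feats


def dict_to_feats(feat_dict: dict[str, str]) -> str:
    """Serialize a feature dictionary back into a FEATS string."""
    if not feat_dict:
        return "_"
    # CoNLL-U expects features sorted alphabetically, ignoring case.
    items = sorted(feat_dict.items(), key=lambda kv: kv[0].lower())
    return "|".join(f"{name}={value}" for name, value in items)


parsed = feats_to_dict("Case=Nom|Gender=Fem|Number=Sing")
print(parsed)
print(dict_to_feats({"Number": "Sing", "Case": "Nom"}))
```

The library functions listed above may differ in details such as error handling for malformed pairs, so treat this only as an illustration of the two representations.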
XPOS Utilities
Convert and validate XPOS tags across different treebank formats.
Format XPOS
Auto-detect the format of an XPOS tag and convert it to the Perseus standard.
Validate XPOS
Validate the positions of an XPOS tag against UPOS-specific rules.
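As an illustration of what position-level validation can look like, the sketch below treats a Perseus-style tag as nine single-character positions and blanks any position that a given UPOS is not allowed to fill. Everything here is an assumption made for the example: the function name `validate_positions`, the `ALLOWED` rule table, and the choice to blank invalid positions rather than raise an error are hypothetical and may not match the library's behavior.

```python
# Names of the nine positions in a Perseus-style tag, in order.
POSITIONS = (
    "pos", "person", "number", "tense", "mood",
    "voice", "gender", "case", "degree",
)

# Hypothetical rule table: positions (besides "pos") that each UPOS may fill.
ALLOWED = {
    "NOUN": {"number", "gender", "case"},
    "ADJ": {"number", "gender", "case", "degree"},
    "VERB": {"person", "number", "tense", "mood", "voice", "gender", "case"},
}


def validate_positions(upos: str, xpos: str) -> str:
    """Return xpos with every position not allowed for upos reset to '-'."""
    if len(xpos) != len(POSITIONS):
        raise ValueError(f"expected {len(POSITIONS)} characters, got {len(xpos)}")
    # A UPOS missing from the table is assumed to allow no positions.
    allowed = ALLOWED.get(upos, set())
    cleaned = [xpos[0]]  # the part-of-speech position is kept as-is
    for name, char in zip(POSITIONS[1:], xpos[1:]):
        cleaned.append(char if name in allowed else "-")
    return "".join(cleaned)


# A noun tag carrying a stray person value has that position blanked.
print(validate_positions("NOUN", "n3s---fn-"))
```

A table-driven design keeps the per-UPOS rules as data, so they can be extended without touching the validation loop.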
ITTB to Perseus
Convert Index Thomisticus Treebank XPOS to Perseus format.
Functions for converting between ITTB and Perseus XPOS tags.
PROIEL to Perseus
Convert PROIEL Treebank XPOS to Perseus format.
Functions for converting between PROIEL and Perseus XPOS tags.
LLCT to Perseus
Convert Late Latin Charter Treebank XPOS to Perseus format.
Functions for converting between LLCT and Perseus XPOS tags.
brat Utilities
Utilities for working with the brat standoff annotation format. The conversion tools in the IO module use them, but they can also be used independently.
- conllu_tools.utils.brat.type_to_safe_type(typestring)[source]
Rewrite characters in CoNLL-X types that cannot be directly used in identifiers in brat-flavored standoff.
- conllu_tools.utils.brat.safe_type_to_type(typestring)[source]
Rewrite characters in brat-flavored standoff types back to CoNLL-X format.
- conllu_tools.utils.brat.parse_annotation_line(line)[source]
Parse a BRAT annotation line into its components.
- conllu_tools.utils.brat.format_annotation(ann)[source]
Format an annotation dict back into BRAT format.
- conllu_tools.utils.brat.read_annotations(filepath)[source]
Read and parse all annotations from a BRAT .ann file.
- conllu_tools.utils.brat.read_text_lines(filepath)[source]
Read the text content from a BRAT .txt file.
- conllu_tools.utils.brat.sort_annotations_set(annotations)[source]
Sort a set of annotations by ID number to maintain consistent ordering.
- conllu_tools.utils.brat.sort_annotations(annotations)[source]
Sort annotations by type and ID number.
- conllu_tools.utils.brat.write_annotations(filepath, annotations)[source]
Write annotations to a BRAT .ann file.
- conllu_tools.utils.brat.write_text(filepath, doctext)[source]
Write document text to a BRAT .txt file.
- conllu_tools.utils.brat.write_auxiliary_files(output_directory, metadata)[source]
Add metadata and default BRAT configuration files to the output directory.
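For readers unfamiliar with the standoff format these helpers operate on, here is a small self-contained sketch that parses brat text-bound annotation lines (`ID<TAB>Type start end<TAB>text`) and sorts them by the numeric part of their ID. It is not the library's code: the names `parse_text_bound` and `id_sort_key` and the dictionary keys are hypothetical, and the sketch handles only contiguous spans, not relations, attributes, or discontinuous spans.

```python
def parse_text_bound(line: str) -> dict:
    """Parse one text-bound annotation line from a .ann file."""
    # Fields are tab-separated: ID, "Type start end", covered text.
    ann_id, type_and_span, text = line.rstrip("\n").split("\t", 2)
    ann_type, start, end = type_and_span.split(" ")
    return {
        "id": ann_id,
        "type": ann_type,
        "start": int(start),
        "end": int(end),
        "text": text,
    }


def id_sort_key(ann: dict) -> tuple:
    """Sort by ID prefix letter, then by the numeric suffix (T2 before T10)."""
    return (ann["id"][0], int(ann["id"][1:]))


lines = [
    "T10\tNOUN 6 11\tverba",
    "T2\tVERB 0 5\tdixit",
]
annotations = sorted((parse_text_bound(l) for l in lines), key=id_sort_key)
print([a["id"] for a in annotations])
```

Sorting on the integer suffix rather than the raw string is what keeps `T2` ahead of `T10`, which a plain lexicographic sort would reverse.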