Validation Examples

Examples of validating CoNLL-U files against format rules and annotation guidelines.

Post-Conversion Validation

After converting from another format, validate the output to catch problems the conversion introduced:

from conllu_tools.validation import ConlluValidator

# Focus on structure and content (level 3)
validator = ConlluValidator(lang='la', level=3)

errors = validator.validate_file('converted.conllu')

if errors.get_error_count() > 0:
    print(f"Found {errors.get_error_count()} validation errors")
    print('\n'.join(errors.format_errors()))
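
For large converted files it may be more convenient to keep the report on disk than to print it. The following is a minimal sketch under one assumption: that the result object exposes get_error_count() and format_errors(), as the example above shows. The helper name write_error_report and the report path are hypothetical and not part of conllu_tools.

```python
from pathlib import Path


def write_error_report(errors, report_path):
    """Write formatted validation errors to a text file.

    `errors` is assumed to be the object returned by
    ConlluValidator.validate_file(); it is used here only through
    get_error_count() and format_errors(), as in the example above.
    Returns the number of errors reported.
    """
    count = errors.get_error_count()
    report = Path(report_path)
    if count == 0:
        # Nothing to report: remove any stale report from a previous run.
        report.unlink(missing_ok=True)
        return 0
    lines = [f"Found {count} validation errors", *errors.format_errors()]
    report.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return count
```

A call could look like write_error_report(validator.validate_file('converted.conllu'), 'converted.errors.txt'), where the report file name is only an illustration.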

Batch Validation

Validate every .conllu file in a directory, reusing a single validator:

from pathlib import Path
from conllu_tools.validation import ConlluValidator

corpus_dir = Path('corpus/')
validator = ConlluValidator(lang='la', level=2)
all_valid = True

for file in corpus_dir.glob('*.conllu'):
    print(f"\nValidating {file.name}...")
    errors = validator.validate_file(str(file))

    if errors.get_error_count() > 0:
        all_valid = False
        print('\n'.join(errors.format_errors()))

if all_valid:
    print("\nAll files valid!")
else:
    print("\nSome files have errors")
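
The loop above prints errors as it goes. When a per-file summary is more useful, the same idea can be split into a function that collects counts and one that prints them. This is a sketch only: validate_corpus and print_summary are hypothetical helpers, and the validator is assumed to behave like the ConlluValidator above (validate_file() returning an object with get_error_count()).

```python
from pathlib import Path


def validate_corpus(validator, corpus_dir, pattern="*.conllu"):
    """Return a {file name: error count} mapping for a directory.

    `validator` is assumed to behave like the ConlluValidator above:
    validate_file(path) returns an object with get_error_count().
    """
    counts = {}
    # Sort so the report order is stable across runs and platforms.
    for path in sorted(Path(corpus_dir).glob(pattern)):
        errors = validator.validate_file(str(path))
        counts[path.name] = errors.get_error_count()
    return counts


def print_summary(counts):
    """Print one line per file and return True if all files are valid."""
    for name, count in counts.items():
        status = "OK" if count == 0 else f"{count} error(s)"
        print(f"{name}: {status}")
    return all(count == 0 for count in counts.values())
```

Passing the validator in as an argument keeps the helpers independent of how it was configured, so the same code works for any language or level.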

Integration with Other Tools

Validate Before Evaluation

Check that both the gold and system files are valid before scoring them, and stop if either has errors:

import sys

from conllu_tools.validation import ConlluValidator
from conllu_tools.evaluation import ConlluEvaluator

# Validate both files first
validator = ConlluValidator(lang='la', level=2)

for filename in ['gold.conllu', 'system.conllu']:
    errors = validator.validate_file(filename)
    if errors.get_error_count() > 0:
        print(f"{filename} has validation errors!")
        print('\n'.join(errors.format_errors()))
        # sys.exit is reliable in scripts; the bare exit() helper is meant for the interactive shell
        sys.exit(1)

# Then evaluate
evaluator = ConlluEvaluator()
scores = evaluator.evaluate_files('gold.conllu', 'system.conllu')
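
When this check lives inside a larger program rather than a script, raising an exception may be preferable to exiting the process. The sketch below checks every file before failing, so one run reports problems in both inputs. InvalidConlluError and require_valid are hypothetical names, and the validator is assumed to expose only the calls used in the example above.

```python
class InvalidConlluError(Exception):
    """Raised when one or more input files fail validation (hypothetical)."""


def require_valid(validator, filenames):
    """Validate every file and raise if any of them has errors.

    All files are checked before raising, so a single run reports
    problems in both the gold and the system file.
    """
    failures = {}
    for filename in filenames:
        errors = validator.validate_file(filename)
        if errors.get_error_count() > 0:
            failures[filename] = list(errors.format_errors())
    if failures:
        details = "\n".join(
            f"{name}:\n" + "\n".join(messages)
            for name, messages in failures.items()
        )
        raise InvalidConlluError(details)
```

Calling require_valid(validator, ['gold.conllu', 'system.conllu']) before the evaluation step would then let the caller decide how to handle the failure.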

Validate After Conversion

Convert brat annotations to CoNLL-U, then validate the generated file:

from conllu_tools.io import brat_to_conllu, load_language_data
from conllu_tools.validation import ConlluValidator

# Convert
feature_set = load_language_data('feats', language='la')
brat_to_conllu(
    input_directory='brat_files/',
    output_directory='output/',
    ref_conllu='reference.conllu',
    feature_set=feature_set
)

# Validate result
validator = ConlluValidator(lang='la', level=3)
errors = validator.validate_file('output/reference-from_brat.conllu')

if errors.get_error_count() > 0:
    print("Conversion produced invalid CoNLL-U!")
    print('\n'.join(errors.format_errors()))

See Also