Matching Module
The conllu_tools.matching module provides utilities for finding and matching complex
linguistic patterns in CoNLL-U data. It supports pattern-based searching across parsed
corpora using a flexible condition-based matching system.
Key Features:
Token-level pattern matching with conditions on any CoNLL-U field
Sentence-level pattern sequences for multi-token matching
Support for negation, substring matching, and value alternatives
Pattern building from simple string syntax
Quick Example
from conllu_tools.matching import build_pattern, find_in_corpus
# Build a pattern to find adjective + noun sequences
pattern = build_pattern("ADJ+NOUN", name="adj_noun")
# Find all matches in a corpus (corpus is a list of conllu.TokenList sentences)
matches = find_in_corpus(corpus, [pattern])
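To make the idea of a token sequence concrete, the following is a minimal, library-independent sketch of what matching an adjective followed by a noun involves. It does not call conllu_tools and is not its implementation: the helper find_upos_sequence and the toy sentence are hypothetical, and tokens are represented as plain dicts keyed by CoNLL-U field names.

```python
# Illustrative stand-in only: NOT the conllu_tools implementation.
# find_upos_sequence is a hypothetical helper written for this example.
def find_upos_sequence(sentence, upos_sequence):
    """Return (start, end) index pairs where consecutive tokens carry the given UPOS tags."""
    width = len(upos_sequence)
    spans = []
    # Slide a window of the pattern's length over the sentence.
    for start in range(len(sentence) - width + 1):
        window = sentence[start:start + width]
        if all(tok["upos"] == tag for tok, tag in zip(window, upos_sequence)):
            spans.append((start, start + width))
    return spans


# Toy sentence: "The quick fox saw a lazy dog"
sentence = [
    {"form": "The", "upos": "DET"},
    {"form": "quick", "upos": "ADJ"},
    {"form": "fox", "upos": "NOUN"},
    {"form": "saw", "upos": "VERB"},
    {"form": "a", "upos": "DET"},
    {"form": "lazy", "upos": "ADJ"},
    {"form": "dog", "upos": "NOUN"},
]

spans = find_upos_sequence(sentence, ["ADJ", "NOUN"])
matched_forms = [[tok["form"] for tok in sentence[s:e]] for s, e in spans]
```

The sketch only compares the upos field with exact equality; the module's own patterns are described as supporting conditions on any CoNLL-U field, along with negation, substring matching, and value alternatives.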
Utility Functions
build_pattern
find_in_corpus
- conllu_tools.matching.find_in_corpus(corpus, patterns)
Find all matches of given patterns in the corpus.
- Parameters:
corpus (list[conllu.TokenList]) – The corpus to search.
patterns (list[SentencePattern]) – The patterns to match.
- Returns:
The list of all match results.
- Return type:
Pattern Classes
SentencePattern
- class conllu_tools.matching.SentencePattern(pattern, name=None)
Bases: object
Represents a sequence of TokenPattern objects to match in a sentence as a whole.
- pattern
The list of TokenPatterns to match in sequence.
- Type:
TokenPattern
- class conllu_tools.matching.TokenPattern(conditions=None, negate=False, count=None, min_count=None, max_count=None)
Bases: object
Represents a series of conditions to match in a token.
Condition
- class conllu_tools.matching.Condition(key=None, values=None, match_type='equals', match_any=False, negate=False)
Bases: object
Represents a condition to be met by the properties of a token.
A Condition can also represent a container for other Conditions, allowing for nested logical structures.
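The sketch below illustrates, in plain Python, how leaf conditions and container conditions of this kind might be evaluated. It is a stand-in, not the conllu_tools implementation: check_leaf and check_group are hypothetical helpers, and the semantics shown are assumptions (a leaf matches if any of its values matches, match_any switches a container from "all" to "any", and negate inverts the result).

```python
from functools import partial

# Illustrative stand-in only: NOT the conllu_tools implementation.
# The four match types mirror the names accepted by match_type.
MATCHERS = {
    "equals": lambda actual, expected: actual == expected,
    "contains": lambda actual, expected: expected in actual,
    "startswith": lambda actual, expected: actual.startswith(expected),
    "endswith": lambda actual, expected: actual.endswith(expected),
}


def check_leaf(token, key, values, match_type="equals", negate=False):
    """Assumed semantics: the field matches if ANY listed value matches."""
    actual = token.get(key)
    matcher = MATCHERS[match_type]
    hit = actual is not None and any(matcher(actual, v) for v in values)
    return hit != negate  # negate flips the outcome


def check_group(token, checks, match_any=False, negate=False):
    """Assumed semantics: a container requires all sub-checks, or any if match_any."""
    results = [check(token) for check in checks]
    hit = any(results) if match_any else all(results)
    return hit != negate


token = {"form": "running", "lemma": "run", "upos": "VERB"}

is_verb_or_aux = partial(check_leaf, key="upos", values=["VERB", "AUX"])
ends_in_ing = partial(check_leaf, key="form", values=["ing"], match_type="endswith")
lemma_not_be = partial(check_leaf, key="lemma", values=["be"], negate=True)
is_noun = partial(check_leaf, key="upos", values=["NOUN"])

# Nested structure: (VERB or AUX) AND form ends in "ing" AND lemma is not "be".
all_hold = check_group(token, [is_verb_or_aux, ends_in_ing, lemma_not_be])
# Container with match_any: NOUN OR form ends in "ing".
any_holds = check_group(token, [is_noun, ends_in_ing], match_any=True)
```

Because containers take checks rather than raw fields, they can hold other containers as well, which is the nesting the class description refers to.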
- match_type
The type of match to perform ('equals', 'contains', 'startswith', 'endswith').
- Type: