Transitive Cost Closure
By default, WeightedLevenshtein uses only the
costs you provide explicitly. If substituting “6” with “G” costs 0.5 and deleting “G”
costs 0.01, the engine does not infer that “6” can effectively be deleted
for 0.51 by chaining the two edits.
transitive_closure() returns a new
instance whose cost dictionaries are filled with these effective (transitive)
costs. The closure also materializes mixed chains, e.g. a del(“y”) + ins(“x”)
sequence becoming an effective (“y”, “x”) substitution.
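To make the idea concrete, the closure can be pictured as a shortest-path problem: treat every token and the empty string ε as nodes, every explicit substitution, insertion, or deletion as a directed weighted edge, and take the cheapest path between each pair. The following is a minimal sketch of that idea in plain Python, not the library's implementation. The helper close_costs and the EPS marker are hypothetical names, and the sketch assumes single-token entries, directed costs, and no default costs for unlisted edits.

```python
from itertools import product

EPS = ""  # stand-in for the empty string ε; assumes "" is never a real token


def close_costs(substitution_costs, insertion_costs, deletion_costs):
    """Return (substitutions, insertions, deletions) closed under chaining.

    Hypothetical helper for illustration; not part of ocr_stringdist.
    """
    # Put every explicit edit into one graph: a -> b, ε -> b, a -> ε.
    cost = {}
    for (a, b), c in substitution_costs.items():
        cost[a, b] = c
    for b, c in insertion_costs.items():
        cost[EPS, b] = c
    for a, c in deletion_costs.items():
        cost[a, EPS] = c

    nodes = sorted({n for edge in cost for n in edge})
    inf = float("inf")
    # Floyd-Warshall: product() varies its first element slowest, so the
    # intermediate node k is the outermost loop, as the algorithm requires.
    for k, i, j in product(nodes, repeat=3):
        if i == j:
            continue
        via = cost.get((i, k), inf) + cost.get((k, j), inf)
        if via < cost.get((i, j), inf):
            cost[i, j] = via

    # Split the closed graph back into the three cost dictionaries.
    subs = {(a, b): c for (a, b), c in cost.items() if a != EPS and b != EPS}
    ins = {b: c for (a, b), c in cost.items() if a == EPS}
    dels = {a: c for (a, b), c in cost.items() if b == EPS}
    return subs, ins, dels


subs, ins, dels = close_costs(
    substitution_costs={("6", "G"): 0.5},
    insertion_costs={"x": 0.25},
    deletion_costs={"G": 0.01, "y": 0.25},
)
print(dels["6"])         # approximately 0.51, from "6" -> "G" -> ε
print(subs[("y", "x")])  # 0.5, from del("y") + ins("x")
```

In this sketch the mixed chain through ε is what produces the ("y", "x") entry, mirroring the del + ins case described above.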
Example
from ocr_stringdist import WeightedLevenshtein
wl = WeightedLevenshtein(
    substitution_costs={("6", "G"): 0.5},
    deletion_costs={"G": 0.01},
).transitive_closure()
# The chain "6" -> "G" -> ε is now a single effective deletion at 0.51.
print(wl.distance("06", "0")) # 0.51
After closure, explain() reports such an edit as a
single flat operation at the effective cost; the underlying chain is not
preserved.
Pass prune=True to transitive_closure() to remove
generated substitutions whose costs are already represented by matches,
insertions, deletions, or shorter substitutions. This shrinks the resulting
cost map but makes the closure significantly more expensive to compute.
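As a rough illustration of why some generated substitutions are redundant, consider only the insertion and deletion part of that rule: a substitution that costs no less than deleting its source and inserting its target adds nothing, because the distance computation can already take that route. The sketch below shows this one check in plain Python. It is an assumption about the spirit of pruning, not the library's actual algorithm; prune_substitutions and default_cost are hypothetical names, and matches and shorter multi-character substitutions are not handled.

```python
def prune_substitutions(substitution_costs, insertion_costs, deletion_costs,
                        default_cost=1.0):
    """Drop substitutions that are no cheaper than delete-then-insert.

    Hypothetical helper for illustration; default_cost is an assumed
    fallback for tokens without an explicit insertion or deletion cost.
    """
    kept = {}
    for (a, b), cost in substitution_costs.items():
        indirect = (deletion_costs.get(a, default_cost)
                    + insertion_costs.get(b, default_cost))
        # Keep the entry only if it is strictly cheaper than the detour.
        if cost < indirect:
            kept[a, b] = cost
    return kept


pruned = prune_substitutions(
    substitution_costs={("6", "G"): 0.5, ("y", "x"): 0.5},
    insertion_costs={"x": 0.25},
    deletion_costs={"y": 0.25},
)
print(pruned)  # {('6', 'G'): 0.5}
```

Here ("y", "x") is dropped because del("y") + ins("x") already costs 0.5, while ("6", "G") stays because the fallback route would cost 2.0.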