May 24, 2013

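Levenshtein distance is a string metric for measuring the difference between two sequences: the minimum number of single-character insertions, deletions, or substitutions needed to turn one into the other.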
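To make the definition concrete, here is a minimal sketch of the classic dynamic-programming computation. The function name `levenshtein` is my own choice for illustration and is not taken from any particular library.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits that turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep rows short
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


print(levenshtein("this is a test", "this is a test!"))  # 1
```

Only two rows of the table are kept at a time, so memory grows with the shorter string rather than with the product of both lengths.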

FuzzyWuzzy: Fuzzy String Matching in Python
from fuzzywuzzy import fuzz, process

# SIMPLE RATIO
fuzz.ratio("this is a test", "this is a test!")
# 96

# PARTIAL RATIO
fuzz.partial_ratio("this is a test", "this is a test!")
# 100
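The idea behind a partial ratio is to score the shorter string against the best-matching stretch of the longer one. The sketch below is a rough approximation built on the standard library's `difflib`, assuming a simple sliding window; the names `simple_ratio` and `partial_ratio_sketch` are hypothetical, and this is not necessarily how FuzzyWuzzy aligns the strings internally.

```python
from difflib import SequenceMatcher


def simple_ratio(a: str, b: str) -> int:
    """Similarity of two strings on a 0-100 scale (difflib-based stand-in)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def partial_ratio_sketch(a: str, b: str) -> int:
    """Best score of the shorter string against any same-length window of the longer."""
    shorter, longer = sorted((a, b), key=len)
    width = len(shorter)
    return max(
        simple_ratio(shorter, longer[i:i + width])
        for i in range(len(longer) - width + 1)
    )


print(partial_ratio_sketch("this is a test", "this is a test!"))  # 100
```

Because the first 14 characters of the longer string equal the shorter string exactly, one window is a perfect match and the trailing "!" no longer costs anything.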

# TOKEN SORT RATIO
fuzz.ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
# 90

fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
# 100
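Token sorting can be pictured as: lowercase, split on whitespace, sort the words, rejoin, then compare. Here is a small sketch of that idea using `difflib` as a stand-in scorer; `simple_ratio` and `token_sort_sketch` are hypothetical names, and the real library does additional preprocessing that this skips.

```python
from difflib import SequenceMatcher


def simple_ratio(a: str, b: str) -> int:
    """Similarity of two strings on a 0-100 scale (difflib-based stand-in)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def token_sort_sketch(a: str, b: str) -> int:
    """Compare two strings after putting their words in alphabetical order."""
    sorted_a = " ".join(sorted(a.lower().split()))
    sorted_b = " ".join(sorted(b.lower().split()))
    return simple_ratio(sorted_a, sorted_b)


print(token_sort_sketch("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear"))  # 100
```

Both inputs sort to "a bear fuzzy was wuzzy", so word order stops mattering.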

# TOKEN SET RATIO
fuzz.token_sort_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
# 84
fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
# 100
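Token set matching goes one step further and ignores repeated words. One way to sketch it, assuming the approach of comparing the shared words against each string's shared-plus-leftover words and keeping the best score, is shown below. The helper names are hypothetical and `difflib` stands in for the library's scorer.

```python
from difflib import SequenceMatcher


def simple_ratio(a: str, b: str) -> int:
    """Similarity of two strings on a 0-100 scale (difflib-based stand-in)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def token_set_sketch(a: str, b: str) -> int:
    """Score two strings by their sets of words, so duplicates do not count."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    common = " ".join(sorted(tokens_a & tokens_b))
    only_a = " ".join(sorted(tokens_a - tokens_b))
    only_b = " ".join(sorted(tokens_b - tokens_a))
    with_a = (common + " " + only_a).strip()  # shared words, then a's extras
    with_b = (common + " " + only_b).strip()  # shared words, then b's extras
    return max(
        simple_ratio(common, with_a),
        simple_ratio(common, with_b),
        simple_ratio(with_a, with_b),
    )


print(token_set_sketch("fuzzy was a bear", "fuzzy fuzzy was a bear"))  # 100
```

The doubled "fuzzy" collapses into a single set member, so both strings reduce to the same four words.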

# PROCESS
choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
process.extract("new york jets", choices, limit=2)
# [('New York Jets', 100), ('New York Giants', 78)]
process.extractOne("cowboys", choices)
# ('Dallas Cowboys', 90)
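Conceptually, extraction is just "score every choice, then sort". The sketch below illustrates that with a plain case-insensitive `difflib` ratio; `extract_sketch` is a hypothetical name, and since the library's default scorer is more elaborate, scores for inexact matches will generally differ from the ones shown above.

```python
from difflib import SequenceMatcher


def simple_ratio(a: str, b: str) -> int:
    """Similarity of two strings on a 0-100 scale (difflib-based stand-in)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def extract_sketch(query, choices, limit=5):
    """Return up to `limit` (choice, score) pairs, best score first."""
    scored = [(choice, simple_ratio(query.lower(), choice.lower()))
              for choice in choices]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:limit]


choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
best = extract_sketch("new york jets", choices, limit=2)
print(best[0])  # ('New York Jets', 100)
```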
