I would like to know how to perform fuzzy evaluation/calculation in Python. I found that scikit-fuzzy might be useful, but I can't find a function in it for building a consistent fuzzy matrix. I assume there is some data platform or Python code that can do this automatically. Can anybody help me?
1 Answer
The code that I use is part of the RapidFuzz package, which also computes string similarity. Here's a link that might be helpful:
https://maxbachmann.github.io/RapidFuzz/Usage/process.html
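This is the code I use to generate the matrix when comparing one column of strings to itself:
from rapidfuzz import fuzz, process  # process.cdist also needs NumPy installed
strings1 = df['usernames']  # df is assumed to be an existing pandas DataFrame
C = process.cdist(strings1, strings1, scorer=fuzz.ratio, workers=-1)  # workers=-1 uses all CPU cores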
Output:
array([[100. , 22.222221, 19.047619, ..., 21.052631, 26.666666,
11.764706],
[ 22.222221, 100. , 21.052631, ..., 23.529411, 15.384615,
13.333333],
[ 19.047619, 21.052631, 100. , ..., 30. , 12.5 ,
22.222221],
...,
[ 21.052631, 23.529411, 30. , ..., 100. , 14.285714,
25. ],
[ 26.666666, 15.384615, 12.5 , ..., 14.285714, 100. ,
33.333332],
[ 11.764706, 13.333333, 22.222221, ..., 25. , 33.333332,
100. ]], dtype=float32)
This is also a lot faster than FuzzyWuzzy, since RapidFuzz is implemented in C++. Hope this helps.

user
- Thanks a lot. How do I create a consistent fuzzy matrix with this package? – James Aug 05 '22 at 15:04
- Venmo me and I will show you. – user Aug 05 '22 at 18:07
- Yes, sure. How can I Venmo you? – James Aug 06 '22 at 13:15