For questions about the SequenceMatcher class from Python's difflib module. It is a flexible class for comparing pairs of sequences of any type, as long as the sequence elements are hashable. difflib is part of the Python standard library.
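As a minimal sketch of what the tag covers, the snippet below compares two strings and two lists with SequenceMatcher. The helper name `similarity` is hypothetical and only wraps the standard `ratio()` call; passing `None` as the first argument means no elements are treated as junk.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return the SequenceMatcher ratio (0.0 to 1.0) for two sequences.

    Hypothetical helper for illustration; elements must be hashable.
    """
    return SequenceMatcher(None, a, b).ratio()


# Strings are sequences of characters.
print(similarity("abcd", "bcde"))  # 0.75

# Any sequence with hashable elements works, e.g. lists of ints.
print(similarity([1, 2, 3, 4], [2, 3, 4, 5]))  # 0.75

# Matching blocks show where the two sequences agree.
matcher = SequenceMatcher(None, "abcd", "bcde")
for block in matcher.get_matching_blocks():
    print(block)  # the final block always has size 0 and acts as a sentinel
```

The ratio is computed as 2*M/T, where M is the number of matched elements and T is the total number of elements in both sequences, so a short string fully contained in a much longer one still scores well below 1.0.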
Questions tagged [sequencematcher]
72 questions
0
votes
1 answer
python3, difflib SequenceMatcher
The following takes in two strings, compares them, and returns both the identical parts and the differences, separated by spaces (maintaining the length of the longest string).
The commented area in the code holds the 4 strings that should…

Rhys
0
votes
1 answer
Finding closest approximate match between two sets of names
I have two sets of names and would like to find the closest match between them; if no "close enough" match is found, I would like to match the name to itself.
My current approach is to create a dataframe with all the possible combinations…

wingsoficarus116
0
votes
0 answers
Approximate name matching to merge two dataframes python
I am working with two dataframes (df1 and df2) and would like to merge df2 into df1 based on name matching, but the names do not match exactly between the two (for example: 'JS Smith' may be "J.S. Smith (Jr)") and the names in df1 are in…

wingsoficarus116
0
votes
1 answer
How to get all matched parts to regex pattern
I have to parse a String in 3 stages. Only the first stage works; in stages 2 and 3, matcher.groupCount() returns 0, which means it found nothing. I tested my regex in an online tester and it was just fine, but here it doesn't work. So the question is…

sereGkaluv
0
votes
1 answer
Programmatically figuring out if translated names are equivalent
I'm trying to see if two translated names are equivalent. Sometimes the translation will have the names ordered differently. For example:
>>> import difflib
>>> a = 'Yuk-shing Au'
>>> b = 'Au Yuk Sing'
>>> seq=difflib.SequenceMatcher(a=a.lower(),…

David542
-1
votes
1 answer
Compare two dataframe columns with binary data
I have two columns with binary data (1s and 0s) and I want to check the percent similarity between one column and the other. Obviously, as they are binary, it is important that the coincidence is based on the position of each cell, not in…

Aurepilous
-1
votes
1 answer
How to merge/ add columns to dataframes in pandas when the joining column has slight spelling differences?
So I have a data frame like this
Rank State/Union territory NSDP Per Capita (Nominal)(2019–20)[1][2] state_id
0 1 Goa 466585.0 30.0
1 2 Sikkim …

nasc
-1
votes
2 answers
How to match a text contained in one variable to another
So, let's say I have this line of code
x = 'My name is James Bond'
y = 'My name is James Bond and I am an MI-6 agent stationed in London, UK'
from difflib import SequenceMatcher as sm
sm(None, x, y)
Now, the ratio being returned is…

Pankaj Singh
-1
votes
1 answer
difflib.SequenceMatcher not returning unique ratio
I am trying to compare 2 street networks, and when I run this code it returns a ratio of .253529... I need it to compare each row to get a unique value so I can query out the streets that don't match. What can I do to get it to return a unique ratio…

Stephen Holt
-2
votes
1 answer
Best method in Python to find a longest string subset against a list of multiple options per character
I have a simple string, and a list of sets, where each set is a position with 2 possible characters, which looks something like:
"AGTCG"
[('A', 'T'), ('C', 'B'), ('G', 'T'), ('T', 'X'), ... ]
Where I want to find the longest match. In this example…

kedingt
-2
votes
1 answer
if and statement between two pandas dataframes
I have 2 datasets; using data from df1, I want to identify duplicate data in df2 using 4 conditions.
Conditions:
If a row of df1 'Name' column matches more than 80% with any row of 'Name' column in df2
(AND)
(df1['Class'] == df2['Class'] (OR)…

Vin Bolisetti
-4
votes
1 answer
Why python thread and process not working?
I have a big jsn list which contains a lot of string elements with possibly duplicate values.
I need to check each element for similarity and add the keys of duplicate list items to the dubs list, so that these items can be removed from the jsn list.
Because of the size of the jsn list…

redevil