Questions tagged [sequencematcher]

For questions pertaining to SequenceMatcher from the Python difflib module. This is a flexible class for comparing pairs of sequences of any type, so long as the sequence elements are hashable. difflib is part of the Python standard library.
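As a rough illustration of the description above, the sketch below compares a pair of strings and a pair of word lists. The sample inputs and variable names are made up for this example; only `SequenceMatcher`, `ratio()` and `get_opcodes()` come from difflib.

```python
from difflib import SequenceMatcher

# Strings are sequences of characters. The first argument (isjunk) is
# None, meaning no element is treated as junk.
text_matcher = SequenceMatcher(None, "abcd", "bcde")
# ratio() is 2*M/T: M matched elements, T total elements in both inputs.
text_ratio = text_matcher.ratio()

# Any sequence of hashable elements works, for example lists of words.
words_a = ["the", "quick", "brown", "fox"]
words_b = ["the", "slow", "brown", "fox"]
list_matcher = SequenceMatcher(None, words_a, words_b)
list_ratio = list_matcher.ratio()

# get_opcodes() describes how to turn words_a into words_b as
# (tag, i1, i2, j1, j2) tuples.
opcodes = list_matcher.get_opcodes()

print(text_ratio, list_ratio)
for tag, i1, i2, j1, j2 in opcodes:
    print(tag, words_a[i1:i2], words_b[j1:j2])
```

Both pairs share three of their four elements, so each ratio works out to 2*3/8. Note that `ratio()` can depend on argument order in some cases, so it may be worth checking both directions on real data.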


72 questions
0
votes
1 answer

python3, difflib SequenceMatcher

The following takes in two strings, compares them, and returns both their identical parts and their differences, separated by spaces (maintaining the length of the longest string). The commented area in the code contains the 4 strings that should…
Rhys
  • 4,926
  • 14
  • 41
  • 64
0
votes
1 answer

Finding closest approximate match between two sets of names

I have two sets of names and would like to find the closest match between them; if no "close enough" match is found, I would like to match the name to itself. My current approach is to create a dataframe with all the possible combinations…
wingsoficarus116
  • 429
  • 5
  • 17
0
votes
0 answers

Approximate name matching to merge two dataframes python

I am working with two dataframes (df1 and df2) and would like to merge df2 into df1 based on name matching, but the names do not match exactly between the two (for example, 'JS Smith' may be "J.S. Smith (Jr)"), and the names in df1 are in…
wingsoficarus116
  • 429
  • 5
  • 17
0
votes
1 answer

How to get all matched parts to regex pattern

I have to parse a String in 3 stages. Only the first stage works; in stages 2 and 3, matcher.groupCount() returns 0, which means it found nothing. I tested my regex in an online tester and it was just fine, but here it doesn't work. So the question is…
sereGkaluv
  • 31
  • 6
0
votes
1 answer

Programmatically figuring out if translated names are equivalent

I'm trying to see if two translated names are equivalent. Sometimes the translation will have the names ordered differently. For example:
>>> import difflib
>>> a = 'Yuk-shing Au'
>>> b = 'Au Yuk Sing'
>>> seq=difflib.SequenceMatcher(a=a.lower(),…
David542
  • 104,438
  • 178
  • 489
  • 842
-1
votes
1 answer

Compare two dataframe columns with binary data

I have two columns with binary data (1s and 0s), and I want to check the percent similarity between one column and the other. Obviously, as they are binary, it is important that the coincidence is based on the position of each cell, not in…
-1
votes
1 answer

How to merge/ add columns to dataframes in pandas when the joining column has slight spelling differences?

So I have a data frame like this:
   Rank  State/Union territory  NSDP Per Capita (Nominal)(2019–20)[1][2]  state_id
0     1  Goa                    466585.0                                  30.0
1     2  Sikkim …
nasc
  • 289
  • 3
  • 16
-1
votes
2 answers

How to match a text contained in one variable to another

So, let's say I have this code:
x = 'My name is James Bond'
y = 'My name is James Bond and I am an MI-6 agent stationed in London, UK'
from difflib import SequenceMatcher as sm
sm(None, x, y)
Now, the ratio being returned is…
Pankaj Singh
  • 526
  • 7
  • 21
-1
votes
1 answer

difflib.SequenceMatcher not returning unique ratio

I am trying to compare 2 street networks, and when I run this code it returns a ratio of .253529... I need it to compare each row to get a unique value so I can query out the streets that don't match. What can I do to get it to return a unique ratio…
-2
votes
1 answer

Best method in Python to find a longest string subset against a list of multiple options per character

I have a simple string, and a list of sets, where each set is a position with 2 possible characters, which looks something like: "AGTCG" [('A', 'T'), ('C', 'B'), ('G', 'T'), ('T', 'X'), ... ] Where I want to find the longest match. In this example…
kedingt
  • 13
  • 5
-2
votes
1 answer

if and statement between two pandas dataframes

I have 2 datasets, using data from df1 I want to identify duplicate data in df2 using 4 conditions. Conditions: If a row of df1 'Name' column matches more than 80% with any row of 'Name' column in df2 (AND) (df1['Class'] == df2['Class'] (OR)…
-4
votes
1 answer

Why python thread and process not working?

I have a big jsn list which contains a lot of string elements with possible duplicate values. I need to check each element for similarity and add duplicate list item keys in dubs list to remove these items from jsn list. Because of size of jsn list…