For questions pertaining to SequenceMatcher from the python difflib module. This is a flexible class for comparing pairs of sequences of any type, so long as the sequence elements are hashable. difflib is part of the python standard library.
Questions tagged [sequencematcher]
72 questions
0
votes
1 answer
Comparing strings within two columns in pandas with SequenceMatcher
I am trying to determine the similarity of two columns in a pandas dataframe:
Text1 All
Performance results achieved by the approaches submitted to this Challenge. The…
user12809368
0
votes
0 answers
Tagging sentences with custom defined dictionary
I am trying to tag sentences using a custom defined dictionary.
For example if I have two text files (1. My sentences, 2. My dictionary)
Sentences file:
I have abdominal pain and have difficulty breathing
Dictionary file:
Abdominal pain, difficulty…

Haris Jawed
- 15
- 4
0
votes
1 answer
Find Similar Elements in List using Python
I need to look for similar Items in a list using python. (e.g. 'Limits' is similar to 'Limit' or 'Download ICD file' is similar to 'Download ICD zip file')
I really want my results to be similar with chars, not with digits (e.g. 'Angle 1' is similar…

doublesobig
- 1
- 3
0
votes
1 answer
Python How to loop sequence match Dataframes through specific columns and extra the rows
I have been trying the last 2 weeks to solve this problem, and i am almost at the goal.
Case:
Overall depiction of what i am trying
I have 2 dataframes extracted from 2 different excel sheets for this example let us say 3x3 (DF1 and DF2)
I want to…

Shawn Atlas
- 25
- 1
- 7
0
votes
2 answers
How to iterate through 2 columns and match one by one
Lets say i have 2 excel files each containing a column of names and dates
Excel 1:
Name
0 Bla bla bla June 04 2018
1 Puppy Dog June 01 2017
2 Donald Duck February 24 2017
3 Bruno Venus April 24 2019
Excel 2:
…

Shawn Atlas
- 25
- 1
- 7
0
votes
1 answer
Fuzzy string matching using Difflib get_matching_blocks not detecting all substrings
I'm trying to find all occurrences of a word in paragraph and I want it to account for spelling mistakes as well. Code:
to_search="caterpillar"
search_here= "caterpillar are awesome animal catterpillar who like other humans but not other…
0
votes
0 answers
How to create a SequenceMatcher loop for 2 excel dataframes
Hej
I have currently 2 data rames from 2 different excel files
a=df_Web_Customer
b=df_Batchlog
Example
dfa = pd.DataFrame([[Casper May 16 2020], [Kasper Apr 1 2014], [Jonas Jan 15 2016]], columns=['Name'])
dfb = pd.DataFrame([[Casper May 16…

Shawn Atlas
- 25
- 1
- 7
0
votes
0 answers
Trying to match up two df's based on three common columns with none of them being identical
I have two df's
df1
date League teams
0 201902272215 brazil cup foz do iguacu fcceara ce
1 201902272300 colombia primera a deportes…

Joe
- 59
- 6
0
votes
0 answers
Speed up using numba in python using SequenceMatcher
Experiencing error when attempting to speed up with numba. Any other ways to speed up? Note "a" and "b" are pandas dataframe. I also have a gtx1070ti, any ways of utilising the gpu as well?
from difflib import SequenceMatcher
import time
z = []
x =…

Edward Liu
- 51
- 1
- 6
0
votes
0 answers
SequenceMatcher unable to distinguish between 'replace' and 'insert'
First example:
one = ['billy', 'sally', 'gd', 'kk', 'btb']
two = ['billy', 'sally', 'hh', 'kk', 'ff', 'btb']
opcodes1 = SequenceMatcher(None, one, two).get_opcodes()
opcodes2 = SequenceMatcher(None, two, one).get_opcodes()
correctly returns the…

Rhys
- 4,926
- 14
- 41
- 64
0
votes
0 answers
Best search algorithm to find 'similar' strings in excel spreadsheet
I am trying to figure out the most efficient way of finding similar values of a specific cell in a specified column(not all columns) in an excel .xlsx document. The code I have currently assumes all of the strings are unsorted. However the file I am…

Gary Frederick
- 38
- 4
0
votes
1 answer
difflib sequence matcher missing common substrings
In an attempt to find common substrings between two strings, SequenceMatcher does not return all expected common substrings.
s1 =…

rroutsong
- 45
- 5
0
votes
1 answer
get_matching_blocks of SequenceMatcher when match long string
in:
from difflib import SequenceMatcher
print('---------------------ksv in long…

Kevin liu
- 53
- 4
0
votes
1 answer
Sequence matcher in python based on priority sequence
I am trying to find closest match words from a list of stock name and I wan to put more priority to the word in front instead of word at back though the word at back may have more chars.
Eg.
"SG HOLDINGS" vs "S2 HOLDINGS"
sequence matcher will show…

Grace
- 1
0
votes
0 answers
The fastest way to compare items in a very large list in python
I've a very long list of tweets stored in a python list (more than 50k). I'm in the stage of comparing every item verses the rest to find the similarity between tweets by using difflib (to remove those who are 755 similar while just keeping one…