For questions pertaining to SequenceMatcher from the python difflib module. This is a flexible class for comparing pairs of sequences of any type, so long as the sequence elements are hashable. difflib is part of the python standard library.
Questions tagged [sequencematcher]
72 questions
2
votes
2 answers
Is there an equivalent to pythons's SequenceMatcher in SQL Server to join on columns that are similar?
In python there a nice built in function that lets me check the difference between the sequence of two strings. Example below:
from difflib import SequenceMatcher
def similar(a, b):
return SequenceMatcher(None, a,…

CandleWax
- 2,159
- 2
- 28
- 46
2
votes
1 answer
Python Comparing text files for similar or equal lines
I have 2 text files, my goal is to find the lines in file First.txt that are not in Second.txt and output said lines to a third text file Missing.txt, i have that done:
fn = "Missing.txt"
try:
fileOutPut = open(fn, 'w')
except IOError:
…

Fidycent
- 21
- 4
2
votes
1 answer
Working of methods set_seq1 and set_seq2 , difflib python
I have checked the docs of difflib and i'm confused on how difflib.SequenceMatcher.ratio() actually works. Consider this :
s = difflib.SequenceMatcher(None, "hey here" , "hey there").ratio()
print s
gives s = 0.9411764705882353
I wanted to know…

Hypothetical Ninja
- 3,920
- 13
- 49
- 75
1
vote
2 answers
SequenceMatcher: Recording no match just once?
I am using SequenceMatcher to find a set of words within a group of texts. The problem I am having is that I need to record when it does not find a match, but one time per text. If I try an if statement, it gives me a result each time the comparison…

Connie
- 13
- 2
1
vote
1 answer
Print Rows that are "Near Duplicates" in Pandas DataFrame
I'm working on a personal project that performs Web Scraping on multiple databases of research articles (thus far I have done PubMed and Scopus) and extracts the titles of the articles. I've actually managed to pull this off on my own without…

TrevorM
- 55
- 6
1
vote
3 answers
Best way to recognize same club names that are written in a different way
for x in range(len(fclub1)-1):
for y in range(x+1,len(fclub1)-1):
if SequenceMatcher(None,fclub1[x], fclub1[y]).ratio() > 0.4:
if SequenceMatcher(None,fclub2[x], fclub2[y]).ratio() > 0.4:
…

Nikola Filipovic
- 11
- 1
1
vote
0 answers
How to write the output from a defined function related to opcodes into a new column in pandas dataframe
I am trying to write the output from a defined function in a new column in pandas dataframe & export it to excel, however when I open the excel I see blank values in the derived column.
Example & the code used is given below.
Dataframe name =…

Erfan
- 29
- 5
1
vote
1 answer
Using difflib to compare a string with a row in a dataframe
I have a string
email = 'joe@gmail.com'
and a DF
df = DataFrame({ ‘id’: [1, 2, 3], 'email_address': [‘steve@gmail.com’, ‘joe@hotmail.com’, ‘bill@hotmail.com’ ]})
I want to add a column named 'score' and score each email_address against my email…

RiotF
- 71
- 5
1
vote
0 answers
fuzzy wuzzy token sort vs difflib Sequence matcher
I am trying to figure out the difference between the two.
I get the same results(similarity scores) using the two for the same strings.
Can somebody please explain the difference between the two using the formula for each of them?
Any idea if one…

Samit Saxena
- 99
- 1
- 9
1
vote
3 answers
How to detect sequences in a interleaved log file
I would like to match patterns from a given pattern library, returning the longest detected patterns.
However I only have the interleaved result of multiple parallel tasks in a log file, e.g. from multiple cores of a processor.
Is this a known…

Sheldon
- 574
- 2
- 4
- 13
1
vote
1 answer
Finding all similar values in pandas using SequenceMatcher Python
I'm trying to filter on a specific value in pandas in a column but also allow for typing mistakes. I thought using SequenceMatcher was a good solution but I don't know what the best way is to apply it within a DataFrame. Let's say the headers are…

Hestaron
- 190
- 1
- 8
1
vote
1 answer
Drop similar text rows of one column in Python
import pandas as pd
from difflib import SequenceMatcher
df = pd.DataFrame({"id":[9,12,13,14],
"text":["Error number 609 at line 10", "Error number 609 at line 22", "Error string 'foo' at line 11", "Error string 'bar' at line…

ah bon
- 9,293
- 12
- 65
- 148
1
vote
2 answers
Find match percentage between two strings also taking intro consideration the order of the words - Python
I am looking for a way to output the match percentage while between two strings (ex: names) while also taking into consideration they might be the same but with the words in a different order.
I tried using SequenceMatcher() but the results are…

calin.bule
- 95
- 1
- 15
1
vote
2 answers
Difflib sequencematcher with sentences
I have the following dataframe
Column1 Column2
tomato fruit tomatoes are not a fruit
potato la best potatoe are some sort of fruit
apple there are great benefits to appel
pear peer
and I would like to look up the…

PRIME
- 73
- 1
- 3
- 10
1
vote
0 answers
Sequence clustering in R
I'm trying to write a simple R sequence clustering/grouping/simplification solution. I'm rather a beginner, haven't used R for a while, so please forgive simple and stupid questions/solutions. Tasks are taken from SAP and they represent execution of…

user2433705
- 141
- 1
- 10