Questions tagged [sequencematcher]

For questions pertaining to SequenceMatcher from the python difflib module. This is a flexible class for comparing pairs of sequences of any type, so long as the sequence elements are hashable. difflib is part of the python standard library.

Documentation

72 questions
0
votes
1 answer

Comparing strings within two columns in pandas with SequenceMatcher

I am trying to determine the similarity of two columns in a pandas dataframe: Text1 All Performance results achieved by the approaches submitted to this Challenge. The…
user12809368
0
votes
0 answers

Tagging sentences with custom defined dictionary

I am trying to tag sentences using a custom defined dictionary. For example if I have two text files (1. My sentences, 2. My dictionary) Sentences file: I have abdominal pain and have difficulty breathing Dictionary file: Abdominal pain, difficulty…
0
votes
1 answer

Find Similar Elements in List using Python

I need to look for similar Items in a list using python. (e.g. 'Limits' is similar to 'Limit' or 'Download ICD file' is similar to 'Download ICD zip file') I really want my results to be similar with chars, not with digits (e.g. 'Angle 1' is similar…
0
votes
1 answer

Python How to loop sequence match Dataframes through specific columns and extra the rows

I have been trying the last 2 weeks to solve this problem, and i am almost at the goal. Case: Overall depiction of what i am trying I have 2 dataframes extracted from 2 different excel sheets for this example let us say 3x3 (DF1 and DF2) I want to…
Shawn Atlas
  • 25
  • 1
  • 7
0
votes
2 answers

How to iterate through 2 columns and match one by one

Lets say i have 2 excel files each containing a column of names and dates Excel 1: Name 0 Bla bla bla June 04 2018 1 Puppy Dog June 01 2017 2 Donald Duck February 24 2017 3 Bruno Venus April 24 2019 Excel 2: …
Shawn Atlas
  • 25
  • 1
  • 7
0
votes
1 answer

Fuzzy string matching using Difflib get_matching_blocks not detecting all substrings

I'm trying to find all occurrences of a word in paragraph and I want it to account for spelling mistakes as well. Code: to_search="caterpillar" search_here= "caterpillar are awesome animal catterpillar who like other humans but not other…
0
votes
0 answers

How to create a SequenceMatcher loop for 2 excel dataframes

Hej I have currently 2 data rames from 2 different excel files a=df_Web_Customer b=df_Batchlog Example dfa = pd.DataFrame([[Casper May 16 2020], [Kasper Apr 1 2014], [Jonas Jan 15 2016]], columns=['Name']) dfb = pd.DataFrame([[Casper May 16…
Shawn Atlas
  • 25
  • 1
  • 7
0
votes
0 answers

Trying to match up two df's based on three common columns with none of them being identical

I have two df's df1 date League teams 0 201902272215 brazil cup foz do iguacu fcceara ce 1 201902272300 colombia primera a deportes…
Joe
  • 59
  • 6
0
votes
0 answers

Speed up using numba in python using SequenceMatcher

Experiencing error when attempting to speed up with numba. Any other ways to speed up? Note "a" and "b" are pandas dataframe. I also have a gtx1070ti, any ways of utilising the gpu as well? from difflib import SequenceMatcher import time z = [] x =…
Edward Liu
  • 51
  • 1
  • 6
0
votes
0 answers

SequenceMatcher unable to distinguish between 'replace' and 'insert'

First example: one = ['billy', 'sally', 'gd', 'kk', 'btb'] two = ['billy', 'sally', 'hh', 'kk', 'ff', 'btb'] opcodes1 = SequenceMatcher(None, one, two).get_opcodes() opcodes2 = SequenceMatcher(None, two, one).get_opcodes() correctly returns the…
Rhys
  • 4,926
  • 14
  • 41
  • 64
0
votes
0 answers

Best search algorithm to find 'similar' strings in excel spreadsheet

I am trying to figure out the most efficient way of finding similar values of a specific cell in a specified column(not all columns) in an excel .xlsx document. The code I have currently assumes all of the strings are unsorted. However the file I am…
0
votes
1 answer

difflib sequence matcher missing common substrings

In an attempt to find common substrings between two strings, SequenceMatcher does not return all expected common substrings. s1 =…
rroutsong
  • 45
  • 5
0
votes
1 answer

get_matching_blocks of SequenceMatcher when match long string

in: from difflib import SequenceMatcher print('---------------------ksv in long…
Kevin liu
  • 53
  • 4
0
votes
1 answer

Sequence matcher in python based on priority sequence

I am trying to find closest match words from a list of stock name and I wan to put more priority to the word in front instead of word at back though the word at back may have more chars. Eg. "SG HOLDINGS" vs "S2 HOLDINGS" sequence matcher will show…
Grace
  • 1
0
votes
0 answers

The fastest way to compare items in a very large list in python

I've a very long list of tweets stored in a python list (more than 50k). I'm in the stage of comparing every item verses the rest to find the similarity between tweets by using difflib (to remove those who are 755 similar while just keeping one…