Questions tagged [sequencematcher]

For questions pertaining to SequenceMatcher from the Python difflib module. SequenceMatcher is a flexible class for comparing pairs of sequences of any type, as long as the sequence elements are hashable. difflib is part of the Python standard library.
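As a minimal sketch of the description above, the snippet below compares two lists of integers rather than strings, since any hashable elements work. The helper name `describe_diff` is hypothetical and exists only for this illustration; `SequenceMatcher`, `ratio()` and `get_opcodes()` are the standard difflib calls.

```python
from difflib import SequenceMatcher


def describe_diff(a, b):
    """Return (ratio, opcodes) for two sequences of hashable elements.

    Hypothetical helper for illustration; not part of difflib.
    """
    # The first argument is the optional `isjunk` callable;
    # None means no user-defined junk filter is applied.
    matcher = SequenceMatcher(None, a, b)
    return matcher.ratio(), matcher.get_opcodes()


# Works on any sequences of hashable items, not only strings.
ratio, opcodes = describe_diff([1, 2, 3, 4], [1, 2, 4])
print(f"similarity: {ratio:.3f}")
for tag, i1, i2, j1, j2 in opcodes:
    # Each opcode says how to turn a[i1:i2] into b[j1:j2].
    print(tag, i1, i2, j1, j2)
```

The ratio is computed as 2 * M / T, where M is the number of matched elements and T is the total number of elements in both sequences, so identical sequences score 1.0.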

Documentation

72 questions
1
vote
1 answer

How to delete invalid characters between multiple strings in python?

I'm working on a project involving OCR in Spanish. The camera captures different frames of a line of text. The line of text contains this: Este texto, es una prueba del dispositivo lector para no videntes. After some operations I get strings like…
Alex Ortega
  • 45
  • 11
1
vote
0 answers

Custom items for list alignment with SequenceMatcher

I am using SequenceMatcher to align two lists. Each list item is either a tuple or an integer. The requirement is that a tuple is considered equal to any integer it contains. For example: (1, 2, 3) == 1 #True (1, 2, 3) == 2 #True To do…
jalal
  • 83
  • 1
  • 7
0
votes
1 answer

How to return the best-matching value via SequenceMatcher

I have to match a product's category name returned from an API response against the product category names in a database. For example: api_category = "packing tape", category names from DB = ["packing material", "packaging equipment"] from difflib import…
0
votes
0 answers

Python SequenceMatcher (difflib) not providing correct results for delete tag

I'm using SequenceMatcher to compare the output of usernames from an API list and an LDAP group. The intent is to add, and separately, remove users. I've got the 'add' part working. I can't get the 'remove' part to give me the correct list of…
Dan
  • 97
  • 1
  • 7
0
votes
1 answer

How can I compare one column of a dataframe to multiple other columns using SequenceMatcher?

I have a dataframe with 6 columns, the first two are an id and a name column, the remaining 4 are potential matches for the name column. id name match1 match2 match3 match4 id name …
Sammyg
  • 1
  • 1
0
votes
0 answers

Difflib Sequence Matcher Algorithm

SequenceMatcher is a class available in the Python module named 'difflib'. It can be used for comparing pairs of input sequences. I'm writing a research paper for which I need the steps of the actual algorithm used by this class. According to the…
Hamza
  • 65
  • 5
0
votes
1 answer

How to perform sequence matcher on dataframe values in a row in Python?

New to Python, so kind of figuring things out. I have a dataframe from an excel spreadsheet. Something like this: MANUFACTURER MANUFACTURER PART NUMBER…
Jace
  • 27
  • 5
0
votes
0 answers

Difflib.SequenceMatcher not working in "IF" Statement?

I am executing code with SequenceMatcher (difflib library) nested in an "if" statement like this: ''' from difflib import SequenceMatcher string_one = 'He is right' string_two = 'He was right' print("It returns a ratio",…
0
votes
0 answers

SequenceMatcher ratio return

For example, a = 'OrangeApple' and b = 'AppleOrange', after running SequenceMatcher(None, a, b).ratio() the returned ratio (similarity score) is 0.54. If a = 'OrangeApple' and b = 'OrangeApple' the returned ratio is, as expected, 1. I somehow…
0
votes
0 answers

Compare two text columns to measure their similarity in a dataframe in python

I want to compare columns A with C and also B with C and measure each pair's similarity and then report the one that has a higher degree of similarity. df = pd.DataFrame([['JAMES LIKEN', 'LINDEN R. EVANS', 'LINDEN R. EVANS'], ['HENRY THEISEN',…
0
votes
1 answer

Is SequenceMatcher supported by Chaquopy?

Does Chaquopy support from difflib import SequenceMatcher, or does a package have to be installed with pip first, and if so, which pip package is needed to use SequenceMatcher?
0
votes
1 answer

How do I match with the best ratio of SequenceMatcher?

I use the SequenceMatcher ratio to match two dataframes by the best ratio. I want to first check whether the score between A and AA is good, then whether the score between B and BB is good, then whether the score between C and CC is good, and then I add the line …
0
votes
2 answers

Find common fragments in multiple strings using SequenceMatcher

I would like to find common string between: strings_list = ['PS1 123456 Test', 'PS1 758922 Test', 'PS1 978242 Test'] The following code returns only the first part "PS1 1", I would imagine the result is "PS1 Test". Could you help me, is it possible…
Elka
  • 3
  • 2
0
votes
1 answer

Similarity ratio from a list of excluded strings

In comparing the similarity of 2 strings, I want to exclude a list of strings, for example, ignore 'Texas', and 'US'. I tried to use the argument 'isjunk' in Difflib's SequenceMatcher: exclusion = ['Texas', 'US'] sr = SequenceMatcher(lambda x: x in…
Mark K
  • 8,767
  • 14
  • 58
  • 118
0
votes
0 answers

Comparing strings in Python with tools such as SequenceMatcher and textdistance and the difference in their algorithms

I am working with a dataframe which has 2 columns of city names which should be equal. But they are not due to administrative errors, spelling mistakes or name changes. I am trying to see when those city names are 'equal enough' to be assumed equal.…
Hestaron
  • 190
  • 1
  • 8