Questions tagged [difflib]

A Python module that provides tools for computing and working with differences between sequences, especially useful for comparing text. Includes functions that produce reports in several common difference formats.

A Python module that provides classes and functions for comparing sequences. It can be used, for example, to compare files, and it can produce difference information in various formats, including HTML, context diffs, and unified diffs.
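As a rough illustration of the description above, here is a minimal sketch of three commonly used entry points: `difflib.unified_diff`, `difflib.SequenceMatcher`, and `difflib.get_close_matches`. The sample lines, the file labels `old.txt` and `new.txt`, and the word list are made up for the example and do not come from any question on this page.

```python
import difflib

# Hypothetical sample data: two versions of a small text file, as lists of lines.
old = ["alpha\n", "beta\n", "gamma\n"]
new = ["alpha\n", "beta!\n", "gamma\n", "delta\n"]

# unified_diff compares sequences of lines (not raw strings) and yields diff lines.
diff_lines = list(
    difflib.unified_diff(old, new, fromfile="old.txt", tofile="new.txt")
)
print("".join(diff_lines), end="")

# SequenceMatcher.ratio() returns a similarity score between 0.0 and 1.0.
ratio = difflib.SequenceMatcher(None, "apple", "appel").ratio()
print(f"similarity: {ratio:.2f}")

# get_close_matches returns the best candidates from a list of possibilities.
closest = difflib.get_close_matches("appel", ["ape", "apple", "peach"], n=1)
print(closest)
```

Note that `unified_diff` expects sequences of lines; passing two plain strings makes it compare them character by character, which is a common source of confusing output. For HTML output, `difflib.HtmlDiff` is the class to look at.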

341 questions
7
votes
1 answer

Python Diff Two Multiline Strings Like GitHub

I want to achieve a diff output like GitHub's commit diff view. I tried this: import difflib first = """ def baz """ second = """ deff ba bar foo """ diff = '' for text in difflib.unified_diff(first, second): for prefix in ('---', '+++',…
user8108383
7
votes
2 answers

difflib.SequenceMatcher isjunk argument not considered?

In the python difflib library, is the SequenceMatcher class behaving unexpectedly, or am I misreading what the supposed behavior is? Why does the isjunk argument seem to not make any difference in this case? difflib.SequenceMatcher(None, "AA", "A…
bluelogic
  • 71
  • 3
7
votes
6 answers

Better fuzzy matching performance?

I'm currently using the get_close_matches method from difflib to iterate through a list of 15,000 strings to get the closest match against another list of approx. 15,000 strings: a=['blah','pie','apple'...] b=['jimbo','zomg','pie'...] for value…
7
votes
1 answer

Difflib's SequenceMatcher - Customized equality

I've been trying to create a nested or recursive effect with SequenceMatcher. The final goal is comparing two sequences, both may contain instances of different types. For example, the sequences could be: l1 = [1, "Foo", "Bar", 3] l2 = [1, "Fo",…
YaronK
  • 782
  • 1
  • 7
  • 14
7
votes
2 answers

Approximate string matching of author names - modules and strategies

I've created a small program that checks if authors are present in a database of authors. I haven't been able to find any specific modules for this problem, so I'm writing it from scratch using modules for approximate string matching. The database…
Misconstruction
  • 1,839
  • 4
  • 17
  • 23
7
votes
1 answer

Is it possible that the SequenceMatcher in Python's difflib could provide a more efficient way to calculate Levenshtein distance?

Here's the textbook example of the general algorithm to calculate Levenshtein distance (I've pulled it from Magnus Hetland's website): def levenshtein(a,b): "Calculates the Levenshtein distance between a and b." n, m = len(a), len(b) if n >…
damzam
  • 1,921
  • 15
  • 18
6
votes
1 answer

In python, produce HTML highlighting the differences of two simple strings

I need to highlight the differences between two simple strings with Python, enclosing the differing substrings in an HTML span element. So I'm looking for a simple way to implement the function illustrated by the following…
user1069609
  • 863
  • 5
  • 16
  • 30
6
votes
2 answers

SequenceMatcher - finding the two most similar elements of two or more lists of data

I was trying to compare a set of strings to an already defined set of strings. For example, you want to find the addressee of a letter whose text is digitized via OCR. There is an array of addresses, which has dictionaries as elements. Each…
valerius21
  • 423
  • 5
  • 14
6
votes
1 answer

Why does unified_diff method from the difflib library in Python leave out some characters?

I am trying to check for differences between lines. This is my code: from difflib import unified_diff s1 = ['a', 'b', 'c', 'd', 'e', 'f'] s2 = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'i', 'k', 'l', 'm', 'n'] for line in unified_diff(s1, s2): print…
Shivani
  • 105
  • 1
  • 9
6
votes
3 answers

Python's difflib SequenceMatcher speed up

I'm using difflib SequenceMatcher (ratio() method) to define similarity between text files. While difflib is relatively fast to compare a small set of text files e.g. 10 files of 70 kb on average comparing to each other (46 comparisons) takes about…
user734094
6
votes
0 answers

Ignoring whitespace in a python diff

Is there an elegant way to ignore whitespace in a diff in python (using difflib, or any other module)? Maybe I missed something, but I've scoured the documentation, and was unable to find any explicit support for this in difflib. My current solution…
Max Wallace
  • 3,609
  • 31
  • 42
5
votes
2 answers

Compare 2 large CSVs using python - output the differences

I am writing a program to compare all files and directories between two filepaths (basically the files' metadata, content, and internal directories should match). File content comparison is done row by row. Dimensions of the csv may or may not be the…
5
votes
1 answer

Using difflib.diff_bytes to compare two files in python

Let's say I want to compare file a and file b with the difflib.diff_bytes function, how would I do this? Thanks
goldfarb33
  • 65
  • 2
  • 6
5
votes
1 answer

match changes by words, not by characters

I'm using difflib's SequenceMatcher to get_opcodes() and then highlight the changes with CSS to create some kind of web diff. First, I set a min_delta so that I consider two strings different if only 3 or more characters in the whole string differ,…
user5164080
5
votes
2 answers

how to get multiple matches with difflib.SequenceMatcher?

I am using difflib to identify all the matches of a short string in a longer sequence. However it seems that when there are multiple matches, difflib only returns one: > sm = difflib.SequenceMatcher(None, a='ACT', b='ACTGACT') >…
dalloliogm
  • 8,718
  • 6
  • 45
  • 55