Questions tagged [difflib]

A Python module that provides tools for computing and working with differences between sequences, especially useful for comparing text. It includes functions that produce reports in several common difference formats.

A Python module that provides classes and functions for comparing sequences. It can be used, for example, to compare files, and it can produce difference information in various formats, including HTML and context and unified diffs.
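As a rough illustration of the description above, here is a minimal sketch of three commonly used entry points: `difflib.unified_diff`, `difflib.SequenceMatcher`, and `difflib.get_close_matches`. The sample lines, the file names `old.txt` and `new.txt`, and the word list are made up for the example and do not come from any of the questions listed here.

```python
import difflib

# Hypothetical sample data: two versions of a three-line text.
old = ["alpha\n", "beta\n", "gamma\n"]
new = ["alpha\n", "beta!\n", "gamma\n"]

# Unified diff, one of the report formats the module can produce.
# The fromfile/tofile labels are arbitrary names used only in the header.
diff_lines = list(
    difflib.unified_diff(old, new, fromfile="old.txt", tofile="new.txt")
)
print("".join(diff_lines))

# Similarity ratio between two sequences (here, two strings), in [0, 1].
ratio = difflib.SequenceMatcher(None, "apple", "appel").ratio()
print(ratio)

# Candidates whose similarity to the word meets the default cutoff (0.6).
matches = difflib.get_close_matches("appel", ["apple", "banana", "lapel"])
print(matches)
```

`SequenceMatcher` takes an optional `isjunk` callable as its first argument (`None` here) and an `autojunk` keyword argument, both of which come up in several of the questions below.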

341 questions
3
votes
1 answer

difflib output is very strange, adding extra whitespace on each character

I'm playing around with difflib in Python and I'm having some difficulty getting the output to look good. For some strange reason, difflib is adding a single whitespace before each character. For example, I have a file (textfile01.txt) that looks…
erichar7
  • 89
  • 6
3
votes
1 answer

TypeError: object of type 'float' has no len(), difflib

I have a function that makes use of the value of a key in a dictionary. The value is a list, and I iterate over that list to compare it with my sample string. for item in List1: #iterate over objects of list [l3,l4] = dict2[item] #this just assigns…
Hypothetical Ninja
  • 3,920
  • 13
  • 49
  • 75
3
votes
1 answer

Python unified diff with line numbers from both "files"

I'm trying to figure out a way to create unified diffs with line numbers only showing N lines of context. I have been unable to do this with difflib.unified_diff. I need to show changes in both files. The closest I can come is using diff on the…
Aaron Meier
  • 929
  • 9
  • 21
3
votes
1 answer

Python: Passing SequenceMatcher in difflib an "autojunk=False" flag yields error

I am trying to use the SequenceMatcher method in Python's difflib package to identify string similarity. I have experienced strange behavior with the method, though, and I believe my problem may be related to the package's "junk" filter, a problem…
duhaime
  • 25,611
  • 17
  • 169
  • 224
3
votes
2 answers

How to sort list of strings by best match (difflib ratio)

Let's say I'm building a rudimentary search engine of sorts. I have a list of strings as the search results, and I want to order the list of search results with the best matching results at the top. My current code looks like this (named parameters…
chyyran
  • 2,446
  • 2
  • 21
  • 35
3
votes
3 answers

Python Difflib - How to Get SDiff Sequences with "Change" Op

I am reading the documentation for Python's difflib. According to the docs, each Differ delta gives a sequence: Code Meaning '- ' line unique to sequence 1 '+ ' line unique to sequence 2 ' ' line common to both sequences '? ' line…
David Williams
  • 8,388
  • 23
  • 83
  • 171
3
votes
4 answers

Difflib.SequenceMatcher isjunk optional parameter query: how to ignore whitespaces, tabs, empty lines?

I am trying to use Difflib.SequenceMatcher to compute the similarities between two files. These two files are almost identical, except that one contains some extra whitespace and empty lines and the other doesn't. I am trying to…
Graviton
  • 81,782
  • 146
  • 424
  • 602
2
votes
1 answer

Merge two Dataframes on two columns with different length by closest match

I want to merge these example dataframes: How to get the closest matches in a new df? df1: name age department DJ Griffin 27 FD Harris Smith 33 RD df2: name age department D.J. Griffin III …
Giskard
  • 65
  • 1
  • 8
2
votes
2 answers

comparing two .txt, difflib module tells me that a line is unique ('-') when in fact it is present in both .txt

I need help with the difflib module. I'm using difflib (https://docs.python.org/3/library/difflib.html) to compare 2 txt files from URLs, line by line, and find duplicates and missing lines. difflib flags with a '-' each line that is unique to one of…
Sebastian
  • 27
  • 3
2
votes
1 answer

How to find and group similar terms in a dataframe in order to sum their values?

I have data like this:

| Term   | Value |
| ------ | ----- |
| Apple  | 100   |
| Appel  | 50    |
| Banana | 200   |
| Banan  | 25    |
| Orange | 140   |
| Pear   | 75    |
| Lapel  | 10    |

Currently, I am using the following…
2
votes
1 answer

Python difflib gives bad results

I'm using Python's difflib to calculate the diff between two plaintext English paragraphs. The paragraphs are very similar: one has an extra leading and ending sentence. There are also minor differences between the characters. Unfortunately, I'm…
Tim Lupo
  • 259
  • 2
  • 9
2
votes
1 answer

Azure pipeline ANSI colorcode support

I hope somebody can clarify this issue I am facing. I have an Azure pipeline whose job is to compare 2 files and find differences. It goes without saying that the pipeline works just fine and it does output the differences (I am using difflib). for…
Nayden Van
  • 1,133
  • 1
  • 23
  • 70
2
votes
0 answers

pyspark implementation of difflib.get_close_matches

Is there a PySpark equivalent of difflib.get_close_matches? My dataset is huge and I want to compare and get the closest match. I am not able to broadcast the compared dataset as it is not iterable.
Avinash
  • 127
  • 2
  • 13
2
votes
2 answers

Speeding up a comparison function for comparing sentences

I have a data frame that has a shape of (789174, 9). There is a column called resolution that contains a sentence that is less than 139 characters in length. I built a function to find sentences that have a similarity score of above 0.9 from the…
justanewb
  • 133
  • 4
  • 15
2
votes
2 answers

Extract words in a paragraph that are similar to words in list

I have the following string: "The boy went to twn and bought sausage and chicken. He then picked a tddy for his sister" List of words to be extracted: ["town","teddy","chicken","boy went"] NB: town and teddy are wrongly spelt in the given…
Herbert
  • 75
  • 8