Questions tagged [difflib]

A Python module that provides tools for computing and working with differences between sequences, especially useful for comparing text. It includes functions that produce reports in several common difference formats.

A Python module that provides classes and functions for comparing sequences. It can be used, for example, to compare files, and it can produce difference information in various formats, including HTML, context diffs, and unified diffs.
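As a minimal sketch of the unified diff format mentioned above, the snippet below compares two small lists of lines. The sample data and the file names `old.txt` and `new.txt` are made up for illustration.

```python
import difflib

# Hypothetical sample data: each item is one line, newline included.
old = ["alpha\n", "beta\n", "gamma\n"]
new = ["alpha\n", "BETA\n", "gamma\n"]

# unified_diff yields header lines, hunk markers, and +/- prefixed lines.
diff = list(difflib.unified_diff(old, new, fromfile="old.txt", tofile="new.txt"))
print("".join(diff), end="")
```

Lines removed from the first sequence are prefixed with `-`, lines added in the second with `+`, and unchanged context lines with a space.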

341 questions
0
votes
1 answer

How to get the edited result by appending similar lines from "ndiff"?

I want to update file1 from file2. I want to append the lines that are similar in both files. I got the comparison result using difflib.ndiff(). How do I append only the lines that are changed? import difflib file1='file1.txt' …
0
votes
1 answer

get_matching_blocks() is ignoring some blocks if it matches first with blocks that come later

Here's the Python code: import difflib x = "abxcd" y = "cdab" s = difflib.SequenceMatcher(None, x, y) for block in s.get_matching_blocks(): a=block[0:] if a[2]>0: m=a[0] n=a[0]+a[2] print (x[m:n]) It prints out only…
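A small sketch of why this happens, using the same two strings: `get_matching_blocks()` reports blocks that are monotonically increasing in both sequences, so once "ab" is matched, "cd" (later in `x` but earlier in `y`) cannot also be reported. The helper name `nonempty_blocks` is hypothetical.

```python
import difflib

x = "abxcd"
y = "cdab"

def nonempty_blocks(a, b):
    """Return the text of each non-empty matching block of a against b."""
    matcher = difflib.SequenceMatcher(None, a, b)
    # The last block is always a zero-length sentinel, so filter on size.
    return [a[m.a:m.a + m.size] for m in matcher.get_matching_blocks() if m.size > 0]

print(nonempty_blocks(x, y))  # ['ab']
print(nonempty_blocks(y, x))  # ['cd']
```

Swapping the arguments changes which block is found first, which is why the result depends on argument order. Finding every common substring regardless of position would need a different approach than matching blocks.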
0
votes
1 answer

Trying to compare the contents of two Excel files and save the difference with Python

I have two Excel files containing multiple lines of Excel data from a datalogger, and I need to compare the two files with 3 similar columns (anum, bnum, date, time) but with different column durations, and then save the difference into a third Excel…
0
votes
2 answers

Can difflib be used to make a plagiarism detection program?

I am trying to figure this out... Can the difflib.* library in Python be used to make some kind of plagiarism detection program? If so, how? Maybe someone could help me figure this out.
Wenger
0
votes
1 answer

difflib sequence matcher missing common substrings

In an attempt to find common substrings between two strings, SequenceMatcher does not return all expected common substrings. s1 =…
rroutsong
0
votes
2 answers

Merging dataframes

I have been struggling with this problem all day. I have two dataframes as follows: Dataframe 1 - Billboards Dataframe 2 I would like to merge Dataframe 2 with Dataframe 1 based on song to end up with a dataframe that has SongId, Song, Rank and…
joe borg
0
votes
1 answer

pandas csv output has [''] when adding cutoff argument

I successfully added a cutoff option to get_close_matches in Pandas. For some reason when I add cutoff=0.7, when it outputs to my CSV, it reads as ['Name']. When it did not have the cutoff argument, it just outputted the match with no ['']. Below is…
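One possible cause, offered as an assumption since the original code is truncated: `difflib.get_close_matches` always returns a list, so writing its return value directly produces a list representation such as `['Name']`. The sketch below unwraps the first match; `best_match` and the sample data are hypothetical.

```python
import difflib

choices = ["apple", "banana", "cherry"]  # hypothetical sample data

def best_match(word, possibilities, cutoff=0.7):
    """Return the single best match as a plain string, or None if nothing qualifies."""
    # get_close_matches returns a list, even when n=1.
    matches = difflib.get_close_matches(word, possibilities, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(best_match("appel", choices))  # apple
print(best_match("zzz", choices))    # None
```

With pandas, a helper like this could be passed to `Series.apply` so each cell holds a string rather than a list.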
0
votes
1 answer

Two closely matching files: get corresponding lines?

I'm in a situation where I'm programmatically generating LaTeX code, and I want my Synctex to point to the correct lines in the original file. The generation is basically doing template expansion, so the original files are nearly identical to the…
jmite
0
votes
0 answers

PyPDF2 difference resulting in 1 character per line

I'm trying to create a simple script that will show me the difference (similar to GitHub merging) by using difflib's HtmlDiff class. So far I've gotten my PDF files together and am able to print their contents in binary using PyPDF2 functions.…
Cflux
0
votes
0 answers

Python DiffLib: get_close_matches does not find a value with ratio above cutoff

I'm getting an inconsistent result using difflib.get_close_matches. I'm trying to find the best matches of a string ('Adeline,L. Marie') in a list (['L. Marie,Adeline','Allain,Martine', 'Ndiaye,Marie', 'AdelaiDe Mori,Maria']) import difflib …
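One thing worth checking in a case like this, as a hedged sketch rather than a diagnosis of the truncated code: `SequenceMatcher.ratio()` can depend on argument order, and CPython's `get_close_matches` scores each candidate as the first sequence and the search word as the second. The strings below are the pair used in the difflib documentation, not the names from the question.

```python
import difflib

word = "tide"
candidate = "diet"

# ratio() is not symmetric: the same pair scores differently when swapped.
forward = difflib.SequenceMatcher(None, word, candidate).ratio()
reverse = difflib.SequenceMatcher(None, candidate, word).ratio()
print(forward, reverse)  # 0.25 0.5

# get_close_matches scores (candidate, word), so it follows the second value.
print(difflib.get_close_matches(word, [candidate], cutoff=0.4))  # ['diet']
print(difflib.get_close_matches(candidate, [word], cutoff=0.4))  # []
```

A ratio computed by hand with the arguments in the other order can therefore land on the opposite side of the cutoff from what `get_close_matches` sees.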
0
votes
0 answers

The fastest way to compare items in a very large list in python

I have a very long list of tweets stored in a Python list (more than 50k). I'm at the stage of comparing every item against the rest to find the similarity between tweets using difflib (to remove those that are 75% similar while keeping just one…
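A rough sketch of one way to reduce the cost per comparison: `real_quick_ratio()` and `quick_ratio()` are cheap upper bounds on `ratio()`, so they can rule out most pairs before the expensive call. The function name `drop_near_duplicates`, the 0.75 threshold, and the sample strings are assumptions for illustration. The loop is still quadratic in the worst case.

```python
import difflib

def drop_near_duplicates(items, threshold=0.75):
    """Keep the first of each group of items whose similarity ratio meets the threshold."""
    kept = []
    for item in items:
        matcher = difflib.SequenceMatcher(None)
        # SequenceMatcher caches details about the second sequence, so set it once.
        matcher.set_seq2(item)
        duplicate = False
        for other in kept:
            matcher.set_seq1(other)
            # Cheap upper bounds first; ratio() only runs if both pass.
            if (matcher.real_quick_ratio() >= threshold
                    and matcher.quick_ratio() >= threshold
                    and matcher.ratio() >= threshold):
                duplicate = True
                break
        if not duplicate:
            kept.append(item)
    return kept

tweets = ["good morning world", "good morning world!", "completely different text"]
print(drop_near_duplicates(tweets))
```

For tens of thousands of items, grouping candidates first (for example by length or shared words) would likely matter more than speeding up each individual comparison.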
0
votes
1 answer

Multiple Spelling Results in a Dataframe 1

I have some data containing spelling errors. I'm correcting them and scoring how close the spelling is using the following code: import pandas as pd import difflib Li_A = ["potato", "tomato", "squash", "apple", "pear"] Q = {'one' :…
R. Cox
0
votes
1 answer

python3, difflib SequenceMatcher

The following takes in two strings, compares them, and returns both their identical parts and their differences, separated by spaces (maintaining the length of the longest string). The commented area in the code contains the 4 strings that should…
Rhys
0
votes
2 answers

difflib and removing lines even without + in front of them python

I'm relatively new to Python, and I am using difflib to compare two files to find all the lines that don't match. The first file is just one line, so it is essentially compared against all the lines of the second file. When using difflib,…
AT2679
0
votes
1 answer

Is there a reverse \n?

I am making a dictionary application using argparse in Python 3. I'm using difflib to find the closest matches to a given word. The result is a list, though, and its items have newline characters at the end, like: ['hello\n', 'hallo\n', 'hell\n'] And when I put a…
JBoy Advance