
I'm trying to create a simple script that shows me the differences between two PDFs (similar to GitHub's merge view) using difflib's HtmlDiff class.

So far I've gotten my PDF files together and am able to read them (opened in binary mode) and extract their text using PyPDF2 functions.

```python
import difflib
import os
import PyPDF2

# Work from the folder that holds both PDFs
os.chdir('.../MyPythonScripts/PDFtesterDifflib')

file1 = 'pdf1.pdf'
file2 = 'pdf2.pdf'

# Extract the text of the first page of the first PDF
file1RL = open(file1, 'rb')
pdfreader1 = PyPDF2.PdfFileReader(file1RL)
PageOBJ1 = pdfreader1.getPage(0)
textOBJ1 = PageOBJ1.extractText()
file1RL.close()

# Extract the text of the first page of the second PDF
file2RL = open(file2, 'rb')
pdfreader2 = PyPDF2.PdfFileReader(file2RL)
PageOBJ2 = pdfreader2.getPage(0)
textOBJ2 = PageOBJ2.extractText()
file2RL.close()

# Build a side-by-side HTML comparison of the two texts
difference = difflib.HtmlDiff().make_file(textOBJ1, textOBJ2, file1, file2)

# Save the report
diff_report = open('...MyPythonScripts/PDFtesterDifflib/diff_report.html', 'w')
diff_report.write(difference)
diff_report.close()
```
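A likely cause, offered as a sketch rather than a confirmed answer: `HtmlDiff.make_file(fromlines, tolines, fromdesc, todesc)` treats its first two arguments as sequences of lines, but `extractText()` returns a single `str`, and iterating a string yields individual characters. Each character would then become its own "line" in the report. Splitting the text with `str.splitlines()` before calling `make_file` should give one row per line. The sample strings and the `html_diff` helper below are hypothetical stand-ins for the extracted PDF text and the script's diff step.

```python
import difflib

# Hypothetical sample text standing in for each PDF's extracted page text
text1 = "1.apples\n2.oranges\n3.pears\n"
text2 = "1.apples\n2.oranges\n3.grapes\n"


def html_diff(a: str, b: str, name_a: str, name_b: str) -> str:
    """Return a side-by-side HTML diff that compares a and b line by line."""
    # make_file iterates over its first two arguments as lines; splitlines()
    # turns each string into a list of lines instead of letting difflib
    # walk the string one character at a time.
    return difflib.HtmlDiff().make_file(
        a.splitlines(), b.splitlines(), name_a, name_b
    )


# Raw strings: every character becomes its own row in the report
char_level = difflib.HtmlDiff().make_file(text1, text2, "pdf1.pdf", "pdf2.pdf")

# Split strings: every line of text becomes one row
line_level = html_diff(text1, text2, "pdf1.pdf", "pdf2.pdf")
```

In the script above, the equivalent change would be passing `textOBJ1.splitlines()` and `textOBJ2.splitlines()` to `make_file`. Whether the extracted text actually contains line breaks where you expect them depends on how PyPDF2 reads the particular PDF.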

The result is this: [screenshot of the generated HTML diff report]

How can I get my lines to read normally? It should read:
1. apples
2. oranges
3. --this line should differ--
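Assuming the extracted text is first split into lines with `str.splitlines()`, `difflib.ndiff` gives a quick plain-text preview of which lines differ before building the HTML report. This is a sketch with hypothetical sample text, not the HTML output itself.

```python
import difflib

# Hypothetical sample text standing in for each PDF's extracted page text
old_text = "1.apples\n2.oranges\n3.pears"
new_text = "1.apples\n2.oranges\n3.grapes"

# Compare whole lines rather than single characters
diff = list(difflib.ndiff(old_text.splitlines(), new_text.splitlines()))

# "  " prefixes unchanged lines, "- " and "+ " mark removed and added lines,
# and "? " lines (when present) are intraline hints
changed = [line for line in diff if line.startswith(("- ", "+ "))]

for line in diff:
    print(line.rstrip("\n"))
```

With this input, only the third line is reported as changed, while `1.apples` and `2.oranges` come through as unchanged lines.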

I am running Python 3.6 on a Mac.

Thanks in advance!

Cflux
