
I have an SBI bank statement PDF that has been tampered with (forged). Here is the link to the PDF.

This PDF was edited with the online editor www.ilovepdf.com. The edited part is the first entry in the 'Credit' column: the original entry was '2,412.00', and I changed it to '12.00'.

Is there any programmatic way, using Python or any other open-source technology, to identify the edited area of the PDF (i.e. the bounding box (BBOX) around the 12.00 credit entry in this PDF)?

Two things I already know:

  1. Metadata (the Info dictionary or XMP metadata) is not useful. The modification date in the metadata cannot tell whether the PDF was merely compressed or actually edited, because both operations update it. It also does not reveal where the edit was made.

  2. The PyMuPDF spans JSON is not useful either: the edited entry does not appear at the end of the spans, but in its proper position in the reading order of the PDF's text. Here is the spans JSON file generated with PyMuPDF.
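Span data like this can still feed simple "generic analysis" heuristics. A minimal sketch, assuming the nested dictionary layout that PyMuPDF's `page.get_text("dict")` returns (blocks, then lines, then spans, each span carrying `text`, `font`, `size` and `bbox`): it flags spans whose font/size combination is rare on the page and returns their bounding boxes. The function names and the `max_share` threshold are hypothetical, and the heuristic only catches edits whose styling differs from the surrounding text; it will miss an edit if the editing tool re-typesets every span consistently.

```python
from collections import Counter
from typing import Any


def iter_text_spans(page_dict: dict[str, Any]):
    """Yield every non-blank text span from a PyMuPDF-style page dictionary."""
    for block in page_dict.get("blocks", []):
        # Image blocks (type 1) have no "lines" entry, so skip them.
        if block.get("type", 0) != 0:
            continue
        for line in block.get("lines", []):
            for span in line.get("spans", []):
                if span.get("text", "").strip():
                    yield span


def flag_unusual_spans(page_dict: dict[str, Any], max_share: float = 0.05):
    """Return (text, bbox) for spans whose (font, rounded size) style is rare.

    max_share is a hypothetical threshold: a style used by at most this
    fraction of the page's spans is treated as suspicious.
    """
    spans = list(iter_text_spans(page_dict))
    if not spans:
        return []
    styles = Counter((s["font"], round(s["size"], 1)) for s in spans)
    total = len(spans)
    flagged = []
    for s in spans:
        style = (s["font"], round(s["size"], 1))
        if styles[style] / total <= max_share:
            flagged.append((s["text"], tuple(s["bbox"])))
    return flagged
```

With PyMuPDF installed, this could be called as `flag_unusual_spans(page.get_text("dict"))` for each `page` of a document opened with `fitz.open(...)`.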

Kindly let me know if anyone has an open-source solution to this problem.

Red
  • Perhaps there is a way to cryptographically sign the PDFs? For example, [iText seems to be available under AGPL and supports digital signatures](https://itextpdf.com/en/solutions/electronic-signatures-pdf) – xdhmoore Feb 23 '21 at 00:40
  • This seems relevant: https://stackoverflow.com/questions/935010/how-to-digitally-sign-a-pdfor-another-document-in-java – xdhmoore Feb 23 '21 at 00:47
  • As already answered in your previous question, it may be possible to identify primitive forgery by generic analysis, but you won't recognize well-made forgery that way. For real-world forgery detection, you and the document sources have to start signing and verifying signatures. – mkl Feb 23 '21 at 05:54
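The signing idea raised in these comments can be illustrated in a much simplified form. A minimal sketch, assuming the issuer and the verifier share a secret key: it uses Python's standard `hmac` module as a stand-in for a real PDF digital signature, which instead uses public-key certificates and is embedded in the PDF itself (as iText does). The function names and the key are hypothetical. It shows the key property: changing any byte of the file, such as '2,412.00' to '12.00', makes verification fail, although it says nothing about where the change was made.

```python
import hashlib
import hmac


def sign_document(pdf_bytes: bytes, key: bytes) -> str:
    """Issuer side: compute an HMAC-SHA256 tag over the exact file bytes."""
    return hmac.new(key, pdf_bytes, hashlib.sha256).hexdigest()


def verify_document(pdf_bytes: bytes, key: bytes, tag: str) -> bool:
    """Verifier side: recompute the tag and compare it in constant time."""
    expected = sign_document(pdf_bytes, key)
    return hmac.compare_digest(expected, tag)


if __name__ == "__main__":
    key = b"hypothetical-shared-secret"
    original = b"%PDF ... Credit 2,412.00 ..."
    tag = sign_document(original, key)
    print(verify_document(original, key, tag))
    print(verify_document(original.replace(b"2,412.00", b"12.00"), key, tag))
```

Run as a script, this prints `True` for the untouched bytes and `False` for the edited ones.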

1 Answer


iLovePDF changes the whole text of the document, not just the edited entry. You can even see this: open the original and the manipulated PDFs in two Acrobat Reader tabs and switch back and forth between them, and you'll see nearly all letters move slightly.
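When both the original and the edited file are available, that back-and-forth check can be approximated in code. A minimal sketch, assuming each page's words are given as tuples in the layout PyMuPDF's `page.get_text("words")` returns, `(x0, y0, x1, y1, word, block_no, line_no, word_no)`; the function name and the `tol` tolerance (in PDF points) are hypothetical. It aligns the two word sequences with `difflib` and reports changed words plus unchanged words whose box shifted by more than `tol`. On an iLovePDF-processed file, expect many "moved" entries alongside the real edit.

```python
import difflib
from typing import Sequence


def compare_words(original: Sequence[tuple], edited: Sequence[tuple], tol: float = 0.5):
    """Return (op, word, bbox) entries describing differences between two pages.

    Each word tuple is assumed to look like PyMuPDF's get_text("words") output:
    (x0, y0, x1, y1, word, block_no, line_no, word_no).
    """
    a = [w[4] for w in original]
    b = [w[4] for w in edited]
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            # Same text in both versions: check whether its box shifted.
            for wa, wb in zip(original[i1:i2], edited[j1:j2]):
                shift = max(abs(p - q) for p, q in zip(wa[:4], wb[:4]))
                if shift > tol:
                    changes.append(("moved", wb[4], tuple(wb[:4])))
        elif op == "delete":
            # Text only present in the original: report its original box.
            for w in original[i1:i2]:
                changes.append((op, w[4], tuple(w[:4])))
        else:
            # "replace" or "insert": report the box in the edited file.
            for w in edited[j1:j2]:
                changes.append((op, w[4], tuple(w[:4])))
    return changes
```

The "replace" entries give exactly the bounding boxes the question asks for, but only because the original file is used as the reference.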

Internally, iLovePDF also rewrote the PDF completely according to its own preferences, so the edit fits in perfectly.

Thus, no, you cannot recognize the manipulated text based on this document alone, because technically it is a completely different, completely new document.

mkl