1

I looked for what opening a file with fitz does to the file, but didn't find anything. The code is simple:

import fitz
doc = fitz.open('a.pdf')
doc.save('b.pdf')

What I don't understand is why this changes the PDF's size. With the file I tried, the size went from 829 KB to 854 KB.

I am not comfortable with this, because I would like to change one property of a large number of files, and I can't do that until I am sure it won't alter them in any way other than the property I want to change.

BTW, all I want is to set the internal title of each PDF to the displayed name of its file.

import fitz
doc = fitz.open(r'a.pdf')
doc.metadata['title']=None
doc.setMetadata(doc.metadata)
doc.save(r'b.pdf')

Can I assume I won't lose any information in this second example? And why does the size change when I just open and save the file in the first example?
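For reference, here is a minimal sketch of the stated goal (title equal to the file's displayed name). It assumes a recent PyMuPDF in which the method is spelled `set_metadata` (older releases used `setMetadata`, as in the snippet above). The helper names `title_from_path` and `set_title_from_filename` are hypothetical, and treating the file name without its extension as the "shown name" is an assumption.

```python
from pathlib import Path


def title_from_path(pdf_path):
    """Return the file name without directory or extension, e.g. 'a' for 'dir/a.pdf'."""
    return Path(pdf_path).stem


def set_title_from_filename(src, dst):
    """Save a copy of src at dst whose metadata title is the source file's name."""
    import fitz  # PyMuPDF; imported lazily so title_from_path works without it

    doc = fitz.open(src)
    meta = dict(doc.metadata or {})       # copy the current metadata
    meta["title"] = title_from_path(src)  # only the title entry is changed
    doc.set_metadata(meta)                # assumption: PyMuPDF >= 1.18 naming
    doc.save(dst)
    doc.close()
```

Calling `set_title_from_filename('a.pdf', 'b.pdf')` would write `b.pdf` with the title `a`; the other metadata entries are passed back unchanged.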

Vishal Singh
José Chamorro

2 Answers

0

For me, the following helped:

import fitz

doc = fitz.open(r'a.pdf')

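# clear the standard metadata dictionary stored in the PDF
# (assigning to doc.metadata would only change the Python attribute)
doc.set_metadata({})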

# to clear all xml metadata
doc.del_xml_metadata()

# garbage=4 removes unused objects and merges duplicates, including duplicate streams
doc.save(filename=r'b.pdf',
         garbage=4)

In my experience, the resulting file is usually more than 30% smaller.

garbage (int):

0 = none
1 = remove unused (unreferenced) objects.
2 = in addition to 1, compact the xref table.
3 = in addition to 2, merge duplicate objects.
4 = in addition to 3, check stream objects for duplication. This may be slow because such data are typically large.
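To see what each level does to a particular file, one option is to save it once per level and compare the sizes. The sketch below is only an illustration: `percent_change` and `compare_garbage_levels` are hypothetical helper names, and the outputs are written to a temporary directory so the original is never touched. For scale, the question's change from 829 KB to 854 KB is a growth of roughly 3%.

```python
import os
import tempfile


def percent_change(old_size, new_size):
    """Relative size change in percent; negative means the file got smaller."""
    return (new_size - old_size) * 100.0 / old_size


def compare_garbage_levels(src, levels=(0, 1, 2, 3, 4)):
    """Return {garbage_level: percent size change relative to src}."""
    import fitz  # PyMuPDF; imported lazily so percent_change works without it

    original = os.path.getsize(src)
    report = {}
    with tempfile.TemporaryDirectory() as tmp:
        for level in levels:
            out = os.path.join(tmp, f"garbage{level}.pdf")
            doc = fitz.open(src)          # reopen so every save starts from the same state
            doc.save(out, garbage=level)
            doc.close()
            report[level] = percent_change(original, os.path.getsize(out))
    return report
```

For example, `compare_garbage_levels('a.pdf')` returns one percentage per level, which makes it easy to check whether a given file shrinks or grows.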
Timur U
  • You're right! But in my examples there were a lot of files with junk metadata. That is why I included three pieces of code (the metadata dict, the XML metadata, and garbage) to show three ways of cleaning the file, plus a link to the garbage documentation. – Timur U Aug 14 '23 at 13:08
-1

You should check the metadata of the document. It may have information on modification date, saving date, etc., that could explain the increased size.
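One way to run that check is to read the metadata of both files and list the entries that differ. This is a minimal sketch, not a definitive method: `diff_metadata` and `read_metadata` are hypothetical helper names, and it only looks at the standard metadata dictionary exposed as `doc.metadata`, not at XML metadata.

```python
def diff_metadata(before, after):
    """Return {key: (old, new)} for every key whose value differs."""
    keys = set(before) | set(after)
    return {
        key: (before.get(key), after.get(key))
        for key in sorted(keys)
        if before.get(key) != after.get(key)
    }


def read_metadata(pdf_path):
    """Return a copy of the document's metadata dictionary."""
    import fitz  # PyMuPDF; imported lazily so diff_metadata works without it

    doc = fitz.open(pdf_path)
    meta = dict(doc.metadata or {})
    doc.close()
    return meta
```

`diff_metadata(read_metadata('a.pdf'), read_metadata('b.pdf'))` then shows which entries changed. An empty result would suggest the size difference comes from somewhere other than these metadata entries.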