I'm currently using the Docotic PDF library to write a compression program for a PDF file server that hosts large scanned documents. (The intention is to get the smallest black-and-white file that remains readable; the documents are mostly legal briefs.)

In testing, I've noticed that some files respond better to JPEG compression while others respond better to Group3Fax or Flate. Is it possible to analyze a file and make an intelligent decision about which algorithm will produce the smallest PDF, or do I actually have to compress each file with all three algorithms and choose the smallest, which incurs a lot of additional CPU overhead?

Any guidance is greatly appreciated. Thanks

bumble_bee_tuna
  • 1
    By black and white, do you mean mono (1bpp) or greyscale? Also, are the documents mainly text, or are there photographs? Are all the documents similar in appearance? – Ryan Aug 07 '18 at 06:28
  • @Ryan Legal documents, greyscale, 90% text – bumble_bee_tuna Mar 24 '21 at 01:10
  • 1
    Flate (zlib) is a decent general-purpose compression algorithm, but for images in a PDF file it should always be beaten by DCT (JPEG) and CCITTFax. If you go with JPEG/DCT, I would recommend the highest quality setting; otherwise you will get artifacts around the straight edges of text. I would suggest CCITTFax/Group3Fax compression as a good balance. If you really only care about the smallest size, then I guess JPEG/DCT with really low settings, but the text will look bad. – Ryan Mar 24 '21 at 21:21
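
One cheap way to act on the comments above, without compressing every file three times, might be to look at the pixel histogram of each scanned page first. The sketch below is only an illustration, not Docotic API: it assumes the page image has already been decoded into a flat sequence of 8-bit greyscale values (ideally from a small thumbnail), and the function names, the `low`/`high` cut-offs, and the 0.95 threshold are all hypothetical and would need tuning on real scans. Pages that are almost entirely near-black or near-white are treated as fax candidates (CCITT fax is a bilevel format, so the image would have to be converted to 1bpp first); everything else falls back to JPEG.

```python
def bilevel_fraction(pixels, low=64, high=192):
    """Return the fraction of 8-bit grey samples that are near black or near white.

    `low` and `high` are illustrative cut-offs, not values from any library.
    """
    if not pixels:
        raise ValueError("empty image")
    extreme = sum(1 for p in pixels if p <= low or p >= high)
    return extreme / len(pixels)


def suggest_codec(pixels, bilevel_threshold=0.95):
    """Suggest a codec label for one page image.

    Returns "fax" when the page is close to bilevel (typical for scanned
    text), otherwise "jpeg". The labels are hypothetical; map them to
    whatever options your PDF library actually exposes.
    """
    if bilevel_fraction(pixels) >= bilevel_threshold:
        return "fax"
    return "jpeg"


if __name__ == "__main__":
    # Synthetic stand-ins for decoded pages: a mostly white "text" page
    # and a page covering the full grey range evenly.
    text_page = [255] * 950 + [0] * 50
    photo_page = list(range(256)) * 4
    print(suggest_codec(text_page))
    print(suggest_codec(photo_page))
```

A heuristic like this only narrows the choice. For borderline pages it may still be worth compressing a small sample with each candidate and comparing the sizes.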

0 Answers