I have a folder where multiple clients upload multiple PDF files. Some of the files use embedded fonts, some don't.
I've been working on a service that optimizes (in terms of file size) the PDF files in this folder.
Each user may upload around 400 files, ranging from 80 KB to 10 MB each, and my task is to optimize all of them to the smallest possible file size with minimal quality loss.

The PDF Library is doing a great job with it. My only problem is that I can't remove all embedded fonts from all files, since some of the files might rely on these fonts, and the result would be a file that I can't use.

So my questions are:

  1. How can I detect which files use embedded fonts and which don't?
  2. When optimizing the files that use embedded fonts, how can I remove only the unused fonts?

What I want to achieve is to remove all embedded fonts from most of the files, but keep them in the files that actually need them. I understand that this depends on the fonts installed on my system (these files will stay on a single system, so portability is not that important to me), so I am trying to find a way to identify, before optimizing, which files will look OK without embedded fonts and which files need to keep them.

Zohar Peled
  • @mjwills Thanks for your suggestion. I guess it might be possible, but I was kinda hoping to avoid using multiple 3rd party classes for this. PDF Library is written by Adobe and already paid for so I was hoping I can get a solution based only on that... – Zohar Peled Aug 13 '17 at 13:32
  • @mjwills Sorry, but no. First, I'm looking for a way to remove unused embedded fonts, not to add embedded fonts (as the description in the comments of the link you provided suggests), and second, I don't speak cpp... – Zohar Peled Aug 13 '17 at 13:36
  • I don't think the first one is available, but the second one is (I can get a list of embedded fonts). However, that's not the issue. From that list I want to know whether there are embedded fonts that are not used in the document, so that I can remove them, or better yet, find out if it's safe to remove all embedded fonts. My goal is to minimize the file sizes, so if I can find out which files it's safe to remove all the embedded fonts from, it can have a very dramatic impact on my output files. – Zohar Peled Aug 13 '17 at 13:51

2 Answers

APDFL has a PDFontIsEmbedded() call. The DotNet interface's Font class has an Embedded property. Saving with the GarbageCollect SaveFlag should remove any unreferenced indirect objects, including fonts.

Note that a resource dictionary can be shared by multiple pages, so a font that one page does not use may still be used by another page that references the same resource dictionary.
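To illustrate the shared-dictionary caveat, here is a minimal, library-free sketch in Python. It is not APDFL code: the function names `fonts_selected` and `unused_fonts` are made up for this example, and it assumes you already have the decoded (uncompressed) content streams of every page that references the dictionary. It relies only on the fact that a page selects a font with the `Tf` operator (for example `/F1 12 Tf`). A real implementation would also have to handle form XObjects with their own resources and avoid matching text inside string literals.

```python
import re

# Matches the font-selection operator, e.g. "/F1 12 Tf" selects resource name F1.
TF_OPERATOR = re.compile(rb"/([^\s/\[\]()<>{}%]+)\s+[-+]?(?:\d+\.?\d*|\.\d+)\s+Tf\b")


def fonts_selected(content_stream: bytes) -> set[str]:
    """Return the font resource names selected by Tf operators in one stream."""
    return {m.group(1).decode("latin-1") for m in TF_OPERATOR.finditer(content_stream)}


def unused_fonts(resource_fonts: set[str], page_streams: list[bytes]) -> set[str]:
    """Return fonts listed in a shared resource dictionary that no given page selects.

    page_streams must cover every page that references the dictionary;
    otherwise a font used only by a missing page is wrongly reported as unused.
    """
    used: set[str] = set()
    for stream in page_streams:
        used |= fonts_selected(stream)
    return resource_fonts - used


# Hypothetical data: two pages sharing one resource dictionary with three fonts.
shared_fonts = {"F1", "F2", "F3"}
page_1 = b"BT /F1 12 Tf (Hello) Tj ET"
page_2 = b"BT /F2 9.5 Tf (World) Tj ET"
```

Looking at `page_1` alone would report both `F2` and `F3` as unused, while checking both pages leaves only `F3`, which is why the check has to span every page that shares the dictionary.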

Patrick Gallot

Versions 15 and up of the Adobe PDF Library have a service that will optimize PDF files for you.

The Optimizer has a function to subset all embedded fonts. This creates a subset of each font, limited to only the glyphs of that font that the document actually uses. The API is below.

void Datalogics::PDFL::PDFOptimizer::SetOption (OptimizerOption option, bool value)
void Datalogics::PDFL::PDFOptimizer::Optimize (Document document, string newPath)

This is the option that you need:

SubsetAllEmbeddedFonts 
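
Subsetting keeps every embedded font, so it does not by itself tell you which files could lose their embedded fonts entirely. One possible heuristic, sketched below in plain Python rather than with the PDF Library, is to compare the embedded font names of each file with the fonts installed on the machine. The function names and the inputs are assumptions made for this example: `embedded_font_names` stands for whatever list of embedded font names your library reports, and `installed_postscript_names` for a set of PostScript names you collect from the system. The only PDF fact it uses is that a subset font name carries a tag of six uppercase letters followed by `+` (for example `ABCDEF+Helvetica`).

```python
import re

# A subset font name starts with a six-uppercase-letter tag and a plus sign.
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")


def base_name(font_name: str) -> str:
    """Strip the subset tag, if any, from a PDF font name."""
    return SUBSET_TAG.sub("", font_name)


def can_strip_all_fonts(embedded_font_names, installed_postscript_names) -> bool:
    """Heuristic: True if every embedded font has a same-named installed font.

    A name match does not guarantee identical glyphs or metrics, so treat a
    True result as a candidate for visual checking, not as proof.
    """
    installed = {name.lower() for name in installed_postscript_names}
    return all(base_name(name).lower() in installed for name in embedded_font_names)


# Hypothetical inputs for illustration only.
installed = {"Arial-BoldMT", "TimesNewRomanPSMT"}
file_a_fonts = ["ABCDEF+Arial-BoldMT", "TimesNewRomanPSMT"]
file_b_fonts = ["XYZABC+CustomLogoFont"]
```

With these inputs, `file_a_fonts` would be a candidate for removing all embedded fonts, while `file_b_fonts` would stay on the subset-only path. Fonts with custom encodings can still render incorrectly after removal even when the names match.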
Vel Genov
  • Thanks, I'm already setting it to true, but in files that look OK even after removing all embedded fonts, this option does not have the same dramatic effect on the file size as the `RemoveAllEmbeddedFonts` option. – Zohar Peled Aug 14 '17 at 15:18
  • I've edited my question to explain a little better what I want to do. Please check. – Zohar Peled Aug 14 '17 at 15:20