
I'm looking for a way to remove all path objects from PDF file.

I suspect that this can probably be done with javascript in Adobe Acrobat, but would really appreciate a tip to do it with ghostscript or mupdf tools.

Anyhow any working solution is acceptable as correct answer

theta

1 Answer


To do this with Ghostscript you would have to modify the pdfwrite device. In fact you would probably have to do something similar for any PDF interpreter.

What do you consider a 'path' object? A shfill, for example? How about text? How about text using a type 3 font (which constructs paths)?

What about clip paths?

If you really want to pursue this I can tell you where to modify pdfwrite, provided you don't mind recompiling Ghostscript.

It's probably a dumb question, but why do you want to do this? Is it possible there might be another solution to your problem? If all you want to do is remove filled paths (or indeed stroked paths), one solution would be to run the file through ps2write to get PostScript, prepend code to redefine 'fill' and 'stroke' as no-ops, and then run the file back through pdfwrite to get a PDF.

[Added after reading comments]

PDF doesn't have a 'path' object, unlike XObject, which is a type of object. Paths are created by a series of operations such as 'newpath', 'moveto', 'curveto' and 'lineto'. Once you have built a path you then operate on it with 'fill' or 'stroke'. Note that PDF doesn't have a 'text' object type either.

This is why your approach doesn't work: you can't remove 'path objects' because there aren't any. The paths are created in the content stream. You can use a Form XObject to do something similar, but then the path construction is in the Form content stream; it still isn't a separate object.
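As a rough illustration of what "paths live in the content stream" means, here is a small Python sketch that walks an uncompressed content stream and labels the path operators it finds. In PDF content streams the operations named above are spelled as short operators (`m`, `l`, `c`, `re` for construction; `f`, `S` and friends for painting; `W`, `W*` for clipping). The function name `classify_operators` and the sample stream are made up for this example, and the whitespace split is a deliberate simplification: it ignores strings, inline images and compression, so treat it as a sketch rather than a parser.

```python
# Path-related operators as they appear in PDF content streams.
PATH_CONSTRUCTION = {"m", "l", "c", "v", "y", "h", "re"}
PATH_PAINTING = {"S", "s", "f", "F", "f*", "B", "B*", "b", "b*", "n"}
CLIPPING = {"W", "W*"}


def classify_operators(content_stream):
    """Label path operators in an uncompressed content stream (naive split)."""
    found = []
    for token in content_stream.split():
        if token in PATH_CONSTRUCTION:
            found.append((token, "construct"))
        elif token in PATH_PAINTING:
            found.append((token, "paint"))
        elif token in CLIPPING:
            found.append((token, "clip"))
    return found


# Hypothetical stream: a green filled rectangle, then a stroked line.
sample = "0 1 0 rg 10 10 200 50 re f 0 0 m 100 100 l S"
print(classify_operators(sample))
```

There is nothing here to delete as a single 'object': the rectangle only exists as the `re` operator followed by `f`, interleaved with everything else on the page.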

The same is true of PostScript; these are NOT object-oriented languages of any kind. You cannot 'detect vector object of type path' in either language because there are no objects. In practice anything which isn't an image is a vector object, and is constructed from a path (and with clipping, even some images might be considered as paths).

The piece of PostScript you have highlighted adds a rectangle to a path (paths need not be contiguous in either PDF or PostScript) and then fills it. Note that, as is usual practice in PostScript, these are not directly using the PostScript operators, but are executing procedures which use the operators. The procedures are defined in the program prologue.

By the way, it looks like you used the pswrite device here (can't be sure with such a small sample). If this is the case you really want to start with ps2write instead. Otherwise you are going to end up with an awful lot of things degenerating to tiny filled rectangles (pswrite does this with many image types).

I didn't suggest that you try to 'decrypt' the ps2write output (it isn't encrypted, it's compressed).

What I suggested was to create a PostScript file, redefine the 'stroke' and/or 'fill' operators so that they do nothing, and then run the resulting PostScript program back through Ghostscript using the pdfwrite device. This will produce a PDF file where all stroked and/or filled objects are ignored.
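A minimal sketch of the two Ghostscript invocations for that round trip, written as Python so the command lines can be inspected before anything is run. The helper names, the file names and the executable name `gs` are assumptions for illustration (on Windows the console executable has a different name, for example `gswin64c`); the switches themselves (`-dBATCH`, `-dNOPAUSE`, `-sDEVICE=`, `-sOutputFile=`) are standard Ghostscript options.

```python
import subprocess


def ghostscript_round_trip_commands(pdf_in, ps_out, ps_edited, pdf_out, gs="gs"):
    """Build the PDF -> PostScript and PostScript -> PDF command lines."""
    common = [gs, "-dBATCH", "-dNOPAUSE"]
    # Step 1: convert the PDF to PostScript with the ps2write device.
    to_ps = common + ["-sDEVICE=ps2write", f"-sOutputFile={ps_out}", pdf_in]
    # Step 2: convert the edited PostScript back to PDF with pdfwrite.
    to_pdf = common + ["-sDEVICE=pdfwrite", f"-sOutputFile={pdf_out}", ps_edited]
    return to_ps, to_pdf


def run(cmd):
    """Run one Ghostscript command, raising if it exits with an error."""
    subprocess.run(cmd, check=True)


to_ps, to_pdf = ghostscript_round_trip_commands(
    "sample.pdf", "sample.ps", "sample-edited.ps", "sample-out.pdf"
)
print(" ".join(to_ps))
print(" ".join(to_pdf))
```

Between the two steps, the PostScript file is edited to add the operator redefinitions; `run(to_ps)` and `run(to_pdf)` would execute the conversions if Ghostscript is installed.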

[final addition]

I picked up your sample file and examined it.

I presume the problem you are seeing is caused by the PDF file using a /Separation colour with an ICCBased alternate space and no device space tint transform (surely the renderer cannot simply be failing to fill a rectangle). In that case the current version of ps2write may solve your problem. It does not (currently; this is due to change) preserve /Separation colours and instead emits them as a device colour, RGB by default. So simply converting the files to PostScript and back to PDF may completely resolve your problem.

If you knew what the problem was, it would have been quicker to tell us; I could have given you that information and work-around in the first place.

Using ps2write I then created a PostScript version of the file (notice that the Separation colours are now RGB) and prefixed the PostScript program with two lines:

/fill {newpath} bind def
/stroke {newpath} bind def

Note that you must use an editor which preserves binary data. Running that PostScript program back through Ghostscript using the pdfwrite device then gave me a PDF file in which the green 'decoration' that I think you are having a problem with is gone.
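If no binary-safe editor is at hand, the prefixing step can be scripted instead. The following is a sketch, assuming the two definitions go at the very start of the file; the function name `prepend_no_op_paint` and the file names are invented for the example. Reading and writing in bytes mode is what keeps the binary data in the ps2write output untouched.

```python
from pathlib import Path

# The two redefinitions from the answer, as bytes with a trailing newline each.
NO_OP_PAINT = b"/fill {newpath} bind def\n/stroke {newpath} bind def\n"


def prepend_no_op_paint(ps_in, ps_out):
    """Copy ps_in to ps_out with the fill/stroke redefinitions prefixed.

    The file is handled as raw bytes, so no newline or encoding
    translation is applied to the original PostScript data.
    """
    data = Path(ps_in).read_bytes()
    Path(ps_out).write_bytes(NO_OP_PAINT + data)
```

For example, `prepend_no_op_paint("sample.ps", "sample-edited.ps")` writes a new file and leaves the original as it was, which makes it easy to compare the two resulting PDFs.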

So, there's a solution to your question, and a possibly better way to solve your problem as well.

KenS
  • Thanks Ken. I have couple of PDFs that cause problem on mobile renderer, because of these path objects. These are objects of type `path` as there are object of type `xobject`, `text`, etc. They are just some page decorations, and I hoped I can instruct ghostscript to iterate over every pdf object and if `path` object is detected then remove it. I'll try soon to convert sample file to postscript and use your tip or similar approach to handle this, then reply back. – theta Jan 19 '13 at 15:48
  • Ken, I convert sample page to ps and I found one part of vector path: [editor screenshot](http://i.imgur.com/bwsKmcW.png), but I can't see how I can detect vector object of type path. I also exported same page to ps2, but couldn't decrypt anything meaningful to me – theta Jan 19 '13 at 16:49
  • You may want to add some samples. Maybe those are broken path objects which simply would have to be repaired... – mkl Jan 19 '13 at 16:52
  • @mkl: if you think that can help in finding solution I uploaded uuencoded sample page: http://pastebin.com/raw.php?i=hXQHDN2V – theta Jan 19 '13 at 18:22
  • @theta which mobile renderers have problems here? I tried Adobe Reader and Polaris Office on Android, both made the PDF look ok. – mkl Jan 19 '13 at 23:22
  • @mkl, phone is rather old Symbian E61i, and I have 2 pdf readers on it that fail to deliver. I won't name the readers as I know and I have identified the problem. What I need and what this question asks and what would other potential user want to see by opening this question is - how can I remove all vector paths from a PDF file. Thanks – theta Jan 20 '13 at 00:59
  • @theta I would have preferred to be able to reproduce the problem so I could identify the path operations which fail. As KenS hinted, there are different types of path operations, and removing some of them (especially clipping paths) can completely change the appearance of a PDF. I'll have a closer look at your sample in the office tomorrow. – mkl Jan 20 '13 at 11:18
  • @theta Having had a better look at the PDF content now, I see additional non-trivial operators in your sample.pdf: Extended graphic states set nonzero overprint mode, and custom colour spaces are used, both ICCBased and Separation. Maybe those Symbian PDF renderers also have trouble with them? – mkl Jan 21 '13 at 08:50
  • Amazing... Thank you for detecting the root problem, and detailed explanation. Much appreciated. Thanks to @mkl too :) – theta Jan 21 '13 at 09:26