I am trying to extract the background image of a PDF page to an SVG (using the xpdf library). The problem is that the PDF contains additional images/graphics (presumably outside the crop box) that PDF readers do not render, but the corresponding SVG does contain them.

I tried setting the viewBox attribute of the SVG to correspond to the cropBox bounds of that PDF page, but the resulting SVG still displays some of the graphics objects that are not rendered in the PDF. I also tried adding a clip path to the SVG, a rectangular clipping region with bounds corresponding to the PDF cropBox, but this did not eliminate some of the additional graphics elements not seen in the PDF either.

Any idea what the problem could be? What is the right way to carry the PDF cropBox over to SVG? Btw, the SVGs generated in both cases mentioned above (viewBox and clipping region approaches) were fairly close in dimensions to the viewable area of the PDF page, and the additional elements were seen only close to the edges. Is it that cropBox dimensions obtained from the PDF should not be used directly in SVG?
-
Post a sample PDF file and the corresponding SVG and we'll take a look at it. – iPDFdev Aug 27 '13 at 06:13
-
@iPDFdev Thanks! Here is a link to a zip file containing a sample PDF and the corresponding SVG file, as well as the PNG files referred to by the <image> elements of the SVG. You may notice that the SVG has some additional area at its right and bottom where a few graphic elements (line art) are present, but these are not seen in the PDF: http://privatepaste.com/download/7510dd6ed8 – so2 Aug 27 '13 at 09:34
-
How did you convert the PDF file to SVG? I'm trying to find a correspondence between values in PDF and values in SVG and I cannot find any. – iPDFdev Aug 27 '13 at 11:16
-
@iPDFdev The difference in coordinates between PDF and SVG is because I translate the SVG paths so that the top-left corner of the SVG corresponds to (0, 0). The clip paths are translated (shifted) by the same delta values too. Btw, based on some recent debugging results, it seems that the problem may be that I am not using the CTM while creating the clip path corresponding to the cropBox, but I need to verify whether using the CTM fixes the problem. – so2 Aug 27 '13 at 12:26
-
In the svg file (svg tag) you have these values: width="2518.387" height="3265.966". How did you actually compute them? – iPDFdev Aug 27 '13 at 12:46
-
@iPDFdev I compute width and height based on the size of the intersecting portion of the originally extracted SVG and the cropBox rectangle. But I later found that the cropBox rectangle I used wasn't correct because I wasn't applying the CTM (at least that is what I _guess_ was the root cause of the discrepancy). Thanks a lot for the help and prompt responses! Will update or comment on this question if I need further help. – so2 Aug 27 '13 at 15:09
1 Answer
It turns out that the problem was that my code did not transform the PDF cropBox (as given by xpdf) to user coordinates using the CTM (also obtainable through xpdf). After applying the transformation, the resulting SVG matches the rendered portion of the PDF page.
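
For anyone hitting the same issue, here is a minimal sketch of that transformation in plain C++. It makes no xpdf calls: `Rect`, `Matrix`, `transformRect` and `svgViewBox` are hypothetical names made up for this illustration, and it assumes the CTM is available as the usual six-element PDF matrix `[a b c d e f]`. All four corners are transformed and the bounding box is taken, so the result should also hold up if the matrix contains a rotation or a y-axis flip.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdio>
#include <string>

// Axis-aligned rectangle given by two opposite corners (hypothetical helper type).
struct Rect {
    double x1, y1, x2, y2;
};

// Six-element PDF-style matrix [a b c d e f], applied as:
//   x' = a*x + c*y + e
//   y' = b*x + d*y + f
using Matrix = std::array<double, 6>;

// Map a single point through the matrix.
static void transformPoint(const Matrix &m, double x, double y,
                           double &outX, double &outY) {
    outX = m[0] * x + m[2] * y + m[4];
    outY = m[1] * x + m[3] * y + m[5];
}

// Transform all four corners of the crop box and return their bounding box,
// normalized so that (x1, y1) is the minimum corner.
Rect transformRect(const Matrix &m, const Rect &r) {
    const double xs[4] = {r.x1, r.x2, r.x2, r.x1};
    const double ys[4] = {r.y1, r.y1, r.y2, r.y2};

    double minX, minY;
    transformPoint(m, xs[0], ys[0], minX, minY);
    double maxX = minX, maxY = minY;

    for (int i = 1; i < 4; ++i) {
        double tx, ty;
        transformPoint(m, xs[i], ys[i], tx, ty);
        minX = std::min(minX, tx);
        maxX = std::max(maxX, tx);
        minY = std::min(minY, ty);
        maxY = std::max(maxY, ty);
    }
    return Rect{minX, minY, maxX, maxY};
}

// Format the transformed rectangle as an SVG viewBox value:
// "min-x min-y width height".
std::string svgViewBox(const Rect &r) {
    char buf[128];
    std::snprintf(buf, sizeof(buf), "%.3f %.3f %.3f %.3f",
                  r.x1, r.y1, r.x2 - r.x1, r.y2 - r.y1);
    return std::string(buf);
}
```

With made-up values, a crop box of (36, 36) to (576, 756) and a matrix of `{2, 0, 0, -2, 0, 1584}` (scale by 2 with a y flip) give a transformed rectangle from (72, 72) to (1152, 1512). The same rectangle can be used for the clip path, as long as it is in the same coordinate system as the SVG paths, including any translation applied to them.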

so2