Cropping a PDF document using iText returns undesired output

Question

I have to crop a PDF document using iText, but the resulting PDF does not match the rectangle whose coordinates I provided for the crop. I have uploaded the sample files at these links:

https://onedrive.live.com/redir?resid=445455D417418FDD%21123

onedrive.live.com/redir?resid=445455D417418FDD%21124

onedrive.live.com/redir?resid=445455D417418FDD%21125

onedrive.live.com/redir?resid=445455D417418FDD%21126

And I am using this code:

PdfReader reader = new PdfReader(docpath);
iTextSharp.text.Rectangle size = new iTextSharp.text.Rectangle(24, 144, 270, 348);
iTextSharp.text.Document document = new iTextSharp.text.Document(size);
string tempdocpath = docpath.Replace(".pdf", "_.pdf");
tempdocpath = tempdocpath.Replace(".PDF", "_.PDF");
PdfWriter writer = PdfWriter.GetInstance(document, new FileStream(tempdocpath, FileMode.Create, FileAccess.Write));
document.Open();
PdfContentByte cb = writer.DirectContent;
document.NewPage();

PdfImportedPage page = writer.GetImportedPage(reader, pageNumber);
cb.AddTemplate(page, 0, 0);
document.Close();
writer.Close();

score 1 · Answer 1 · answered Apr 12 '14 at 11:31

I don't understand your code sample, more specifically: I don't understand why you would crop pages using that code. Allow me to ignore your code, and to explain how pages can be cropped.

Take a look at the RotatePages example from my book. In the ManipulatePdf() method, I loop over the pages, I take the page dictionary, and I change the /Rotate key to rotate the page. That's not what you need, but the principle is similar.

You need to get the /MediaBox and /CropBox value from the page dictionary:

// pageDict is the page dictionary, e.g. obtained with reader.getPageN(pageNumber)
PdfArray mediabox = pageDict.getAsArray(PdfName.MEDIABOX);
PdfArray cropbox = pageDict.getAsArray(PdfName.CROPBOX);

In many cases, cropbox will be null, in which case you can safely ignore it and use the mediabox value instead.

The cropbox value (or if null, mediabox) is an array with 4 values. These values represent two coordinates: one for the lower-left corner of the page, the other one for the upper-right corner of the page. If you want to crop a page, you need to change these coordinates and either replace the existing cropbox value (if one already exists) or add a new cropbox value (if there is none).

pageDict.put(PdfName.CROPBOX, new PdfArray(new float[]{llx, lly, urx, ury}));

Where llx, lly are the x and y coordinate of the lower-left corner and urx, ury are the x and y coordinate of the upper-right corner.
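As a rough illustration of choosing those four values, the plain-Java sketch below clamps a requested rectangle to the page's existing box, so that the new crop box never extends past it. The clamping is only a precaution I am adding for illustration, not something the answer requires. The names `CropBoxSketch` and `clampToBox` are hypothetical and not part of iText, the page box values are assumed, and the box is assumed to be given in lower-left, upper-right order. The resulting array is what would be passed to `new PdfArray(...)` in the line above.

```java
import java.util.Arrays;

public class CropBoxSketch {

    /**
     * Clamps a requested rectangle {llx, lly, urx, ury} to an existing page box
     * given in the same order. Hypothetical helper, not part of iText.
     */
    public static float[] clampToBox(float[] requested, float[] pageBox) {
        float llx = Math.max(requested[0], pageBox[0]);
        float lly = Math.max(requested[1], pageBox[1]);
        float urx = Math.min(requested[2], pageBox[2]);
        float ury = Math.min(requested[3], pageBox[3]);
        if (llx >= urx || lly >= ury) {
            throw new IllegalArgumentException("Requested rectangle lies outside the page box");
        }
        return new float[] {llx, lly, urx, ury};
    }

    public static void main(String[] args) {
        float[] pageBox = {0f, 0f, 595f, 842f};      // assumed page box, illustrative only
        float[] requested = {24f, 144f, 270f, 348f}; // rectangle from the question
        System.out.println(Arrays.toString(clampToBox(requested, pageBox)));
    }
}
```

A rectangle that lies fully inside the page box comes back unchanged; one that overlaps an edge is trimmed to that edge.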

I am working on a publishing project and I need to crop the page because a magazine page contains multiple articles and we have to crop out each article separately. The code I posted above works perfectly for all other PDF pages, but not for the one I uploaded. What is wrong with this PDF? I request you to draw a rectangle on that PDF, get its coordinates, and put those coordinates into my code. Then please compare the output PDF with the rectangle you drew on the original PDF, and you will see my problem. — choudhary, Apr 12 '14 at 11:50
I'm sorry, you'll have to ask somebody else. I don't have the time for such a detailed look at your problem. — Bruno Lowagie, Apr 12 '14 at 11:52

score 1 · Answer 2 · Chris Haas · 2014-04-14T13:46:59.517

Bruno's method is the proper method for cropping (he's the creator of iText, he would know). But since you've already got a path you're trying to go down, I'll try to help you.

Instead of true cropping, you are creating a new document at a specific size and then adding the original page to it, shifted to fit your new "window". The end result is the same as cropping, I guess.

One of the overloads of PdfContentByte.AddTemplate() takes a transformation matrix. In your case you want a translation, which is expressed as [1, 0, 0, 1, tx, ty]; the last two elements are what you need to figure out. For this specific PDF document you can use:

cb.AddTemplate(page, 1, 0, 0, 1, -36, -36);
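To see what that matrix does, recall that a PDF transformation matrix [a, b, c, d, e, f] maps a point (x, y) to (a·x + c·y + e, b·x + d·y + f). The plain-Java sketch below applies the translation above to a single point. `MatrixSketch` and `apply` are hypothetical names, not iText API, and the sample point is chosen only for illustration.

```java
public class MatrixSketch {

    /** Applies a PDF-style matrix {a, b, c, d, e, f} to the point (x, y). */
    public static double[] apply(double[] m, double x, double y) {
        double newX = m[0] * x + m[2] * y + m[4];
        double newY = m[1] * x + m[3] * y + m[5];
        return new double[] {newX, newY};
    }

    public static void main(String[] args) {
        double[] translate = {1, 0, 0, 1, -36, -36}; // same values as the AddTemplate call
        double[] p = apply(translate, 60, 180);      // illustrative point on the imported page
        System.out.println(p[0] + ", " + p[1]);
    }
}
```

With these values the point (60, 180) ends up at (24, 144), which is the lower-left corner of the rectangle used in the question: every point is shifted 36 units left and 36 units down.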

EDIT

The magic number -34 that I originally posted should actually have been -36, sorry. That 36 represents the size of the document's various boxes (that Bruno was talking about), which shrink the document's viewable area by 36 from each side. Using the method you are trying to use, you'll need to inspect the imported document's crop box (maybe the bleed and trim boxes too?) and take that into account.
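One possible way to derive such an offset rather than hard-coding it is to negate the lower-left corner of the relevant box. The sketch below is plain Java and rests on several assumptions: that the rectangle was measured from the lower-left corner of the visible page area, that the box starts at (36, 36), and that negating is the right sign convention for this file. None of these values were read from the sample PDF, and `OffsetSketch` and `offsetFor` are hypothetical names, not iText API.

```java
public class OffsetSketch {

    /**
     * Returns {tx, ty} that moves a box's lower-left corner to the origin.
     * The box is {llx, lly, urx, ury}. Hypothetical helper, not iText API.
     */
    public static float[] offsetFor(float[] box) {
        return new float[] {-box[0], -box[1]};
    }

    public static void main(String[] args) {
        float[] assumedCropBox = {36f, 36f, 576f, 756f}; // assumed values, not read from the sample PDF
        float[] t = offsetFor(assumedCropBox);
        System.out.println("tx=" + t[0] + ", ty=" + t[1]);
    }
}
```

For a page whose box starts at the origin, both offsets come out as zero, which would be consistent with the original code working on other files without any shift.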

edited Apr 14 '14 at 13:46

answered Apr 12 '14 at 15:47


You don't even need the `1, 0, 0, 1`, you can use `cb.AddTemplate(page, -34, -34);` The main problem with this approach versus mine, is that it throws away all interactivity: if the original document contains links, they will be gone; if it contains bookmarks, they will be gone; and so on. I up-voted the answer because it's not a wrong answer, but I'm tempted to down-vote the question because the OP is too stubborn to switch to using `PdfStamper` ;-) – Bruno Lowagie Apr 13 '14 at 08:04
Dear Chris, I just want to know why I need to use this value (-34) for this document and not for others. Why is this value not the same for every PDF? – choudhary Apr 14 '14 at 04:58
Thanks BrunoLowagie, I was staring right at that but not thinking! @choudhary, I updated the above with an answer to your question. – Chris Haas Apr 14 '14 at 13:48
