
As part of a project I am working on, I receive PDF documents that contain forms as JPEG images placed on A4 pages. I have to extract those JPEGs from the PDFs. Later on, the JPEGs are used to build PDF documents again.

When I open these documents in any PDF viewer, they show no visible rotation at all. As in the screenshot below, they are displayed in portrait (vertical) format.

[Image: portrait-format document with no visible rotation]

But when I use this sample code to extract the images:

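    import java.util.Iterator;
    import java.util.List;
    import java.util.Map;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDPage;
    import org.apache.pdfbox.pdmodel.PDResources;
    import org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectImage;

    // PDFBox 1.8.x API
    PDDocument doc = PDDocument.load("/path/to/file");
    try {
        List pages = doc.getDocumentCatalog().getAllPages();
        Iterator iter = pages.iterator();
        int i = 0;
        while (iter.hasNext()) {
            PDPage page = (PDPage) iter.next();
            System.out.println("ROTATION = " + page.getRotation());
            PDResources resources = page.getResources();
            Map pageImages = resources.getXObjects();
            if (pageImages != null) {
                Iterator imageIter = pageImages.keySet().iterator();
                while (imageIter.hasNext()) {
                    String key = (String) imageIter.next();
                    Object xobject = pageImages.get(key);
                    // Check the type before casting: form XObjects are not images.
                    if (xobject instanceof PDXObjectImage) {
                        PDXObjectImage image = (PDXObjectImage) xobject;
                        // write2file() appends the suffix (e.g. ".jpg") itself.
                        image.write2file("/path/to/file" + i);
                        i++; // count only images that were actually written
                    }
                }
            }
        }
    } finally {
        doc.close();
    }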

all the extracted JPEGs are in landscape (horizontal) format. Furthermore, printing page.getRotation() tells me that the page rotation is set to 270°.


How is that possible? The rotation is 270, yet the PDF is displayed in portrait format (I am no expert on PDF). I even called page.setRotate(0) before extracting the JPEGs, but the images are still landscape. I read a thread explaining how to rotate images before drawing them on the PDF, but I need to rotate them before writing them to the filesystem. What is the best way to achieve that?

Unfortunately, I cannot attach any of the documents since they are confidential.
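A minimal sketch of the rotation step, using only the JDK (java.awt.image and javax.imageio, no PDFBox calls). It assumes the correction is a multiple of 90° and that the extracted image is already available as a BufferedImage; with PDFBox 1.8, PDXObjectImage.getRGBImage() should provide one. The class QuarterTurn and its method rotateClockwise are hypothetical names, not part of any library. For the page described above (/Rotate 270, image drawn unrotated), a clockwise turn of 270° should reproduce what the viewer shows.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

final class QuarterTurn {

    /**
     * Returns a copy of src rotated clockwise by the given number of degrees.
     * Only multiples of 90 are supported (hypothetical helper, not a PDFBox API).
     */
    static BufferedImage rotateClockwise(BufferedImage src, int degrees) {
        if (degrees % 90 != 0) {
            throw new IllegalArgumentException("Only multiples of 90 are supported: " + degrees);
        }
        int turns = Math.floorMod(degrees / 90, 4); // 0..3 clockwise quarter turns
        int w = src.getWidth();
        int h = src.getHeight();
        boolean swap = turns % 2 == 1; // 90 and 270 swap width and height
        // TYPE_INT_RGB has no alpha channel, so the result can be written as JPEG.
        BufferedImage dst = new BufferedImage(swap ? h : w, swap ? w : h,
                BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = src.getRGB(x, y);
                switch (turns) {
                    case 0: dst.setRGB(x, y, rgb); break;
                    case 1: dst.setRGB(h - 1 - y, x, rgb); break;         // 90 clockwise
                    case 2: dst.setRGB(w - 1 - x, h - 1 - y, rgb); break; // 180
                    default: dst.setRGB(y, w - 1 - x, rgb); break;        // 270 clockwise
                }
            }
        }
        return dst;
    }

    public static void main(String[] args) throws IOException {
        // Small stand-in for an extracted landscape scan (the real one is 3508 x 2480 px).
        BufferedImage landscape = new BufferedImage(40, 30, BufferedImage.TYPE_INT_RGB);
        BufferedImage portrait = rotateClockwise(landscape, 270);
        System.out.println(portrait.getWidth() + " x " + portrait.getHeight()); // 30 x 40
        File out = File.createTempFile("rotated", ".jpg");
        ImageIO.write(portrait, "jpg", out);
    }
}
```

Re-encoding as JPEG is lossy, so every rotate-and-save cycle costs some quality. The pixel-by-pixel copy is easy to verify but slow on full-page scans; an AffineTransformOp built from java.awt.geom.AffineTransform is the usual faster alternative, at the cost of getting the translation right.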

Asked by Ilir
  • *How is that possible* - any image from the resources can be arbitrarily rotated, scaled, and skewed wherever it is used on the page. – mkl Apr 16 '15 at 21:59
  • While I haven't fully understood your question (it's late here), individual images can be rotated regardless of the rotation of the page. (This is what I explained in the question where you probably upvoted my answer, thanks.) This is done in the PDF content stream with the help of the current transformation matrix. I recommend you have a look at the PrintImageLocations.java example. This will show you the angle. Combine that with the code you have to get the image as a BufferedImage, then use the angle to draw it into a new BufferedImage, and save that one as an image. – Tilman Hausherr Apr 16 '15 at 21:59
  • To explain this a bit more: it happens quite often that an image visible in a PDF has the "correct" angle, but that the raw image, when saved, looks rotated by 270° or 90°. So a tool can never know for sure which one is the "correct" look: the one from the raw image, or the one displayed in the PDF. – Tilman Hausherr Apr 16 '15 at 22:05
  • A possible explanation of the "why": mass scanning (with a feeder) is often done in landscape mode because it is about 30% faster (compare the lengths of the edges of a sheet of paper). So the original PDF-creating application may have taken the scanned image, embedded it into a PDF, and used a transformation matrix with a 90° or 270° angle to display it, so that it looks like a portrait document. – Tilman Hausherr Apr 16 '15 at 22:18
  • Thanks Tilman, your answers are bringing light into the darkness, but I am still somewhat confused. When I run PrintImageLocations.java (with added sysouts for page.getRotation() and the ANGLE), I get the following output:
        Processing page: 3
        PAGEROTATION = 270
        Found image [Im4]
        ANGLE = 0.0
        position = 0.0, 9.0
        size = 3508px, 2480px
        size = 841.92, 595.2
        size = 11.693333in, 8.266666in
        size = 297.01065mm, 209.97333mm
    So this tells me the page itself is rotated but not the image, or am I wrong? – Ilir Apr 17 '15 at 06:50
  • @Ilir Yes, that could be too. Everything could be rotated when displayed. By the way, you wrote "page.setRotate(0) before extracting the JPGs", but this has no effect on extraction. What you should do, if the PDFs come from the same source so that you know what to expect, is copy and rotate the BufferedImage as in http://stackoverflow.com/questions/26021150/java-rotate-image or http://stackoverflow.com/questions/2257141/problems-rotating-bufferedimage . If you don't know what to expect, you could decide based on the width vs. height ratio. – Tilman Hausherr Apr 17 '15 at 09:54
  • Thanks a lot Tilman. Right, the files will all come from the same source, so I also thought about checking the width vs. height ratio and rotating based on that, since I know the dimensions of the image. – Ilir Apr 17 '15 at 15:34
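The two approaches from the comments can be sketched as small helpers. This assumes the a and b entries of the image's current transformation matrix [a b c d e f] are available (PrintImageLocations works from that matrix; how you read it depends on your PDFBox version) and that the matrix is a pure rotation with positive scaling, with no skew or mirroring. The PDF specification defines /Rotate as a clockwise turn of the page, while a matrix angle θ = atan2(b, a) turns the image counterclockwise, so the raw image appears turned clockwise by /Rotate − θ. The class OrientationGuess, both methods, and the landscapeFix parameter (the quarter turn your single document source is assumed to need) are hypothetical.

```java
final class OrientationGuess {

    /**
     * Clockwise rotation (0, 90, 180 or 270) at which a viewer shows the raw image,
     * given the page's /Rotate value and the a, b entries of the image's CTM.
     * Assumes rotation plus positive scaling only (no skew, no mirroring).
     */
    static int displayedClockwiseRotation(int pageRotation, double a, double b) {
        // atan2(b, a) is the counterclockwise angle of the image in PDF user space.
        double ccw = Math.toDegrees(Math.atan2(b, a));
        int ccwQuarter = (int) Math.round(ccw / 90.0) * 90; // snap to a quarter turn
        // /Rotate turns the page clockwise, so a counterclockwise CTM angle subtracts.
        return Math.floorMod(pageRotation - ccwQuarter, 360);
    }

    /**
     * Width-vs-height heuristic for documents from one known source:
     * landscape images get the quarter turn that source is assumed to need.
     */
    static int rotationForPortrait(int width, int height, int landscapeFix) {
        return width > height ? Math.floorMod(landscapeFix, 360) : 0;
    }

    public static void main(String[] args) {
        // Values from the comments: PAGEROTATION = 270, ANGLE = 0.0, width 841.92 in user space.
        System.out.println(displayedClockwiseRotation(270, 841.92, 0.0)); // 270
        // A 3508 x 2480 px scan, assuming this source always needs 270 degrees clockwise.
        System.out.println(rotationForPortrait(3508, 2480, 270)); // 270
    }
}
```

The matrix-based helper is the more robust of the two because it reflects how the viewer actually draws the image; the width/height heuristic only works when, as here, every document comes from the same source and is rotated the same way.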
