I've spent some time on this issue, looking through similar questions and documentation, and cannot shake this problem with one specific PDF. Some images in this PDF have a DeviceCMYK color space and they don't output correctly. I've inspected the PDF in iText RUPS in the hope of finding something useful. I suspected a transparency issue, but I cannot find any reference to a second bitmap that would be the mask layer. That said, I'm unfamiliar with how this PDF was produced, as it was given to me by a colleague to test with.

I've tested with a PDF containing a CMYK JPEG found online, with an ICCBased color space, and it works fine when read and extracted to file or compressed. Something is evidently being missed in our PDF image extraction process; whether it's a masking layer or an ICC profile, I'm not sure. My debugging efforts have not yielded much helpful information, so I was hoping someone with experience of this issue could point me in the right direction.

Note: I'm using the TwelveMonkeys ImageIO plugin to support CMYK JPEG images.

The first image below is a screen grab of the image as it appears in the PDF. The second is the output when extracted using iText 5.

EDIT: Updated the second (dark) image to be the JPEG produced after extracting from the PDF, not a PNG file. I have also added a screenshot of the PDF inspector for the page containing the example image.

[Image: screenshot of expected image]

[Image: image with dark output]

[Image: screenshot]

camohreally
  • How do you extract the images exactly? – mkl Apr 09 '18 at 07:35
  • I'm pretty sure this is due to the fact that CMYK JPEG streams in PDF *can* be stored as plain CMYK values, while standalone CMYK JPEG files by convention always store inverse CMYK values. You can see that your image is mostly black, while the parts that should actually be black/dark have color. The problem here is that the setting is stored in the PDF control structure, outside the JPEG stream. There's no (standard) way to carry this information into a standalone JPEG file. So if you just extract the JPEG stream from the PDF, this will happen. You need to convert the image while extracting. – Harald K Apr 09 '18 at 08:09
  • Related: https://graphicdesign.stackexchange.com/questions/12894/cmyk-jpegs-extracted-from-pdf-appear-inverted – Harald K Apr 09 '18 at 08:36
  • Sorry, I didn't realise I had uploaded one converted to PNG rather than the JPEG; I have updated it now. If you take a look at the PDF I hyperlinked and inspect the dictionaries (using RUPS or something similar), I cannot see a Decode array anywhere, unfortunately, so I can't test that related question's suggestion. – camohreally Apr 09 '18 at 23:13
  • @haraldK I'll also add that I turned on the Debug property for the JPEGImageReader plugin and receive this output when it reads the above image: "Read metadata in 0 ms" "Reading using raster and extra conversion" "ICC color profile: null". – camohreally Apr 09 '18 at 23:50
  • @mkl The images are extracted using iText 5 in Java. The PDF content stream is read into a PdfImageObject, the image bytes are decoded using PdfReader and then ImageIO reads and writes the JPEG image. – camohreally Apr 10 '18 at 00:14
  • I can confirm that the problem is indeed inverted CMYK values. If I just comment out the inversion in the `JPEGImageReader` (there's no API for that at the moment), the image is read just fine. You can achieve the same effect by using the `ImageReader.readRaster` method and constructing an image from that raster, using a `ColorModel` based on a generic CMYK profile. – Harald K Apr 11 '18 at 07:50
  • @haraldK Interesting, thank you for figuring that out. I assumed that because there was no Decode array (as discussed in the related question you linked), no inversion would have occurred. – camohreally Apr 11 '18 at 22:40
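
For anyone landing here, a minimal sketch of the approach suggested in the comments above: read the undecoded samples with `ImageReader.readRaster` and build the RGB image yourself, choosing the sample polarity explicitly. This is not the TwelveMonkeys implementation. The class name `CmykRasterSketch`, both method names, and the `invertedSamples` flag are hypothetical, and a naive CMYK-to-RGB formula is swapped in for the profile-based `ColorModel` mentioned in the comment, since no CMYK ICC profile ships with the JDK. It also assumes 8-bit samples and does not handle YCCK-encoded streams.

```java
import java.awt.image.BufferedImage;
import java.awt.image.Raster;
import java.io.File;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

public class CmykRasterSketch {

    /** Reads the raw sample values of the first image, skipping the reader's color conversion. */
    static Raster readRawRaster(File jpeg) throws IOException {
        try (ImageInputStream in = ImageIO.createImageInputStream(jpeg)) {
            if (in == null) {
                throw new IOException("Cannot open " + jpeg);
            }
            Iterator<ImageReader> readers = ImageIO.getImageReaders(in);
            while (readers.hasNext()) {
                ImageReader reader = readers.next();
                if (!reader.canReadRaster()) {
                    continue; // this reader cannot hand out raw samples
                }
                try {
                    reader.setInput(in);
                    return reader.readRaster(0, null);
                } finally {
                    reader.dispose();
                }
            }
            throw new IOException("No raster-capable reader for " + jpeg);
        }
    }

    /**
     * Naive, profile-free conversion of 8-bit CMYK samples to RGB (an assumption, not color-managed).
     * invertedSamples = true means 255 stands for "no ink"; false means 0 stands for "no ink".
     */
    static BufferedImage cmykToRgb(Raster cmyk, boolean invertedSamples) {
        if (cmyk.getNumBands() < 4) {
            throw new IllegalArgumentException("Expected at least 4 bands, got " + cmyk.getNumBands());
        }
        int w = cmyk.getWidth();
        int h = cmyk.getHeight();
        BufferedImage rgb = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        int[] px = new int[cmyk.getNumBands()];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                cmyk.getPixel(cmyk.getMinX() + x, cmyk.getMinY() + y, px);
                // Normalize to "0 = no ink" before applying the formula.
                int c = invertedSamples ? 255 - px[0] : px[0];
                int m = invertedSamples ? 255 - px[1] : px[1];
                int ye = invertedSamples ? 255 - px[2] : px[2];
                int k = invertedSamples ? 255 - px[3] : px[3];
                int r = (255 - c) * (255 - k) / 255;
                int g = (255 - m) * (255 - k) / 255;
                int b = (255 - ye) * (255 - k) / 255;
                rgb.setRGB(x, y, (r << 16) | (g << 8) | b);
            }
        }
        return rgb;
    }
}
```

Usage would look something like `ImageIO.write(CmykRasterSketch.cmykToRgb(CmykRasterSketch.readRawRaster(new File("extracted.jpg")), false), "png", new File("out.png"))`, where both file names are placeholders. Which value of `invertedSamples` is correct depends on how the stream was written; going by the comments, the images in this PDF need the opposite of the standalone-JPEG convention, so try both and compare against the PDF rendering.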

0 Answers