I am using the following code to extract images from a PDF that is in PDF/A-1a format, but I am not getting any images.
    List<PDPage> list = document.getDocumentCatalog().getAllPages();
    String fileName = oldFile.getName().replace(".pdf", "_cover");
    int totalImages = 1;
    for (PDPage page : list) {
        PDResources pdResources = page.findResources();
        Map pageImages = pdResources.getImages();
        if (pageImages != null) {
            InputStream xmlInputStream = null;
            Iterator imageIter = pageImages.keySet().iterator();
            while (imageIter.hasNext()) {
                String key = (String) imageIter.next();
                PDXObjectImage pdxObjectImage = (PDXObjectImage) pageImages.get(key);
                System.out.println(convertStreamToString(xmlInputStream));
                System.out.println(pdxObjectImage.hashCode());
                System.out.println(pdxObjectImage.getColorSpace().getJavaColorSpace().isCS_sRGB());
                pdxObjectImage.write2file(destinationDir + fileName + "_" + totalImages);
                totalImages++;
                break;
            }
        }
    }
I can extract images from normal PDFs with the code above, but not from PDF/A-1a files. It seems that the following line

    PDResources pdResources = page.findResources();

is not returning any images. I have also tried page.getResources(), but I still get no images. I have even tried iText, and it does not give me any images either.
If I try to convert a page of the PDF to an image using the following code

    BufferedImage bufferedImage = page.convertToImage();
    File outputfile = new File(destinationDir + "image1.JPEG");
    ImageIO.write(bufferedImage, "JPEG", outputfile);

the resulting image seems to have no metadata associated with it, so I still cannot tell the DPI of the original images or whether they are color or grayscale.
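To make clear what I mean by DPI and color vs. grayscale, here is a rough sketch using only the JDK, not PDFBox. The class and method names (ImageInfoSketch, effectiveDpi, isGrayscale) are made up for illustration. It assumes I already know an image's width in pixels and the width it occupies on the page in points (1/72 inch), which is exactly the information I cannot get out of the PDF/A file at the moment.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper, JDK only; not part of PDFBox or iText.
class ImageInfoSketch {

    // Default PDF user space has 72 units (points) per inch.
    static final double POINTS_PER_INCH = 72.0;

    // Effective resolution of an image along one axis:
    // pixels divided by the rendered size in inches.
    static double effectiveDpi(int pixels, double renderedPoints) {
        if (renderedPoints <= 0) {
            throw new IllegalArgumentException("renderedPoints must be positive");
        }
        return pixels * POINTS_PER_INCH / renderedPoints;
    }

    // Treats an image as grayscale when every pixel has equal
    // red, green and blue components.
    static boolean isGrayscale(BufferedImage image) {
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                int rgb = image.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                if (r != g || g != b) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

For example, an image 600 pixels wide that is drawn 144 points (2 inches) wide would come out at 300 DPI. Running isGrayscale on a rendered page only describes the rendering, not the color space of the embedded images, which is why I would like to get at the images themselves.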
I am currently using PDFBox for this. I have already spent two days searching on Google, but I have not found any code or documentation for doing this.
How can I do this in Java?
Is it possible to get the DPI, or to tell whether the PDF is color or black and white, without extracting the images?