I am using PDFBox to extract images and text from this PDF. I have the following code for extracting the text:
PDFTextStripper p = new PDFTextStripper();
String thistext = p.getText(document);
This extracts the text properly. However, when I try to extract images from the same PDF using the ExtractImages class, the images produced are whole pages of the PDF, not the individual images. Is that because the PDF might be a scanned copy? If so, how is the text being extracted?