Is there any tool that can render a PDF document to an image with only part of its content? For example, only the text but no images or vector graphics, or only the images and vector graphics but no text.
Does it need to be ghostscript or are you also prepared to do a little Java programming? – mkl Oct 06 '14 at 07:02
Any suggestions are welcome. – Yu Liang Oct 06 '14 at 07:08
The Apache Java library PDFBox contains code for rendering PDF pages (which is much improved in the current 2.0.0 development snapshot compared to the current 1.8.x releases). This code essentially calls the `PageDrawer` class. You can fairly simply tweak that class to only draw your choice of stuff. – mkl Oct 06 '14 at 07:25
2 Answers
The "traditional" way to do this would be to preprocess the PDF file, so that only the elements you want remain, and then rasterise the remaining file.
To give you an example, I've implemented PDF to iPad workflows where callas pdfToolbox (disclosure: I'm connected to this company) was used to split a PDF file into a text file and an "anything but text" file. Afterwards the "anything but text" file was rasterised and the two files were reassembled.
So regardless of the tool you want to use, I would see how that tool can preprocess the file to remove useless elements or how it can split off a file which is what you want. Then use the normal rasterisation capabilities of the tool.
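To make the preprocessing idea concrete, here is a minimal sketch of what "splitting off the text" means at the content-stream level. It is not how pdfToolbox or any other product works internally; the class and method names (`ContentStreamSplitter`, `tokenize`, `filter`) are hypothetical. It assumes you already have a decoded (uncompressed) page content stream as a string, and it relies only on the fact that PDF text objects are bracketed by the `BT` and `ET` operators. It deliberately ignores inline images (`BI` ... `EI`), form XObjects, and the graphics state set up outside text objects, all of which a real tool has to handle.

```java
import java.util.ArrayList;
import java.util.List;

final class ContentStreamSplitter {

    /** Splits a decoded content stream into tokens, keeping (...) string literals whole. */
    static List<String> tokenize(String stream) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < stream.length()) {
            char c = stream.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            int start = i;
            if (c == '(') {
                // Literal string: honour nested parentheses and backslash escapes.
                int depth = 0;
                do {
                    char d = stream.charAt(i);
                    if (d == '\\') {
                        i++; // skip the escaped character as well
                    } else if (d == '(') {
                        depth++;
                    } else if (d == ')') {
                        depth--;
                    }
                    i++;
                } while (depth > 0 && i < stream.length());
                i = Math.min(i, stream.length());
            } else {
                // Any other token ends at whitespace or at the start of a string.
                while (i < stream.length()
                        && !Character.isWhitespace(stream.charAt(i))
                        && stream.charAt(i) != '(') {
                    i++;
                }
            }
            tokens.add(stream.substring(start, i));
        }
        return tokens;
    }

    /**
     * Keeps only the BT ... ET text objects (keepText = true),
     * or everything except them (keepText = false).
     */
    static String filter(String stream, boolean keepText) {
        StringBuilder out = new StringBuilder();
        boolean inText = false;
        for (String token : tokenize(stream)) {
            if (token.equals("BT")) {
                inText = true;
            }
            if (inText == keepText) {
                out.append(token).append(' ');
            }
            if (token.equals("ET")) {
                inText = false;
            }
        }
        return out.toString().trim();
    }
}
```

Calling `ContentStreamSplitter.filter(stream, false)` gives the "anything but text" stream to rasterise, and `filter(stream, true)` gives the text-only counterpart. In practice the text-only half usually also needs the surrounding `q`/`Q`/`cm` operators to keep its position and colour, which is one reason a dedicated tool is the safer route.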

With the Debenu Quick PDF Library you can do the extraction in two ways:
1. PDF2Image: just text, without images

    DPL.LoadFromFile("my_file.pdf", "");
    int image_count = DPL.FindImages(); // number of embedded images
    for (int i = 1; i <= image_count; i++) // image indexes start at 1
    {
        DPL.ClearImage(DPL.GetImageID(i)); // look up the image ID, then clear that image
    }
    DPL.RenderPageToFile(72, 1, 0, "just_text.bmp"); // render page 1 at 72 DPI to a BMP, now without the images

Here is the list of the functions: http://www.debenu.com/docs/pdf_library_reference/ImageHandling.php
2. PDF2Image: just text, redrawn onto a new blank document

    DPL.LoadFromFile("my_file.pdf", "");
    var page_text = DPL.GetPageText(3); // returns a CSV string with the coordinates of the text
    // create a new blank document here
    // XPos is the horizontal position of the text - get it from the CSV string
    // YPos is the vertical position of the text - get it from the CSV string
    // your_text is the text to draw - get it from the CSV string
    DPL.DrawText(XPos, YPos, your_text); // repeat for each entry in the CSV string
    DPL.RenderPageToFile(72, 1, 0, "just_text.bmp"); // render page 1 at 72 DPI to a BMP containing only the redrawn text
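The "get it from the CSV string" step is left open above, so here is a rough sketch of how that parsing could look. It is plain Java with no Debenu calls, and the names (`PageTextCsv`, `Word`, `splitCsvLine`, `parse`) are hypothetical. The column layout is an assumption that should be checked against the `GetPageText` reference: the sketch expects each line to hold font name, colour, size, four corner points (x1, y1 ... x4, y4), and finally the text, and it takes the first corner as the drawing position.

```java
import java.util.ArrayList;
import java.util.List;

final class PageTextCsv {

    /** One extracted word: a position plus the text to draw there. */
    static final class Word {
        final double x;
        final double y;
        final String text;

        Word(double x, double y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    /** Splits one CSV line, honouring double-quoted fields and "" escapes. */
    static List<String> splitCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean quoted = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                if (quoted && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // escaped quote inside a quoted field
                    i++;
                } else {
                    quoted = !quoted;
                }
            } else if (c == ',' && !quoted) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    /**
     * Parses the whole CSV string.
     * ASSUMED layout per line: font, colour, size, x1, y1, x2, y2, x3, y3, x4, y4, text.
     */
    static List<Word> parse(String csv) {
        List<Word> words = new ArrayList<>();
        for (String line : csv.split("\\R")) {
            if (line.trim().isEmpty()) {
                continue;
            }
            List<String> f = splitCsvLine(line);
            if (f.size() < 12) {
                continue; // line does not match the assumed layout
            }
            double x = Double.parseDouble(f.get(3).trim());
            double y = Double.parseDouble(f.get(4).trim());
            words.add(new Word(x, y, f.get(11)));
        }
        return words;
    }
}
```

Each resulting `Word` then supplies the `XPos`, `YPos`, and `your_text` values for one `DrawText` call.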
