Poppler is a GPL'd PDF rendering library based on the xpdf-3.0 code base.
Questions tagged [poppler]
290 questions
8
votes
6 answers
How to display a pdf that has been downloaded in python
I have grabbed a pdf from the web using for example
import requests
pdf = requests.get("http://www.scala-lang.org/docu/files/ScalaByExample.pdf")
I would like to modify this code to display it
from gi.repository import Poppler, Gtk
def…

marshall
8
votes
5 answers
Remove / Delete all images from a PDF using Ghostscript or ImageMagick
I want to remove all the images from a PDF, leaving only the text and fonts, using whatever command-line tool makes this possible.
I tried using -dGraphicsAlphaBits=1 in a Ghostscript command, but the images are still present, only rendered as large pixel blocks.

hussainb
7
votes
1 answer
converting pdf to image but after zooming in
This link shows how PDFs can be converted to images. Is there a way to zoom my PDFs before converting them to images? In my project, I am converting PDFs to PNGs and then using the Python-tesseract library to extract text. I noticed that if I zoom pdfs and…
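One common way to get the effect of zooming before rasterizing is to render at a higher resolution. The following is a minimal sketch, assuming Poppler's `pdftoppm` command-line tool is installed and taking a zoom of 1.0 to mean 72 DPI (one pixel per PDF point). The helper names `zoom_to_dpi`, `build_pdftoppm_command`, and `render_pages` are hypothetical and not part of any library.

```python
import subprocess

PDF_POINTS_PER_INCH = 72  # default PDF user-space unit is 1/72 inch


def zoom_to_dpi(zoom):
    """Convert a zoom factor (1.0 = one pixel per PDF point) to a DPI value."""
    if zoom <= 0:
        raise ValueError("zoom must be positive")
    return round(PDF_POINTS_PER_INCH * zoom)


def build_pdftoppm_command(pdf_path, output_prefix, zoom=1.0):
    """Build a pdftoppm command line that renders PNG pages at the given zoom."""
    return [
        "pdftoppm",
        "-r", str(zoom_to_dpi(zoom)),  # render resolution in DPI
        "-png",                        # write PNG files instead of PPM
        pdf_path,
        output_prefix,
    ]


def render_pages(pdf_path, output_prefix, zoom=1.0):
    """Run pdftoppm; assumes the Poppler utilities are available on PATH."""
    subprocess.run(build_pdftoppm_command(pdf_path, output_prefix, zoom), check=True)
```

For OCR, a higher zoom (for example `render_pages("input.pdf", "page", zoom=4)`, with placeholder file names) produces larger glyphs in the PNGs, which may help text recognition at the cost of bigger files.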

user2543622
6
votes
2 answers
Installing Poppler on cygwin
I just downloaded Poppler 0.16.5 and I am clueless about how to install this package on Cygwin. Can anyone tell me what the proper command is to install Poppler?

fogy
6
votes
1 answer
'pdfseparate': Format output file name as page number with leading zeroes
pdfseparate requires %d in the PDF-page-pattern, which is replaced by the page number.
$ pdfseparate CFL_1115_ISSUU.pdf cfl-%d.pdf works. It names the separated output files cfl-1.pdf, cfl-2.pdf, ..., cfl-10.pdf, etc.
Now I need to add…
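If the goal is zero-padded page numbers in the output names, one possible workaround is to rename the files after pdfseparate has run. The sketch below assumes the `cfl-N.pdf` naming from the excerpt; the helpers `padded_name` and `rename_with_padding` are hypothetical names introduced for illustration. Depending on the Poppler version, the pattern itself may also accept a printf-style variant of %d, which is worth checking in the local man page first.

```python
import re
from pathlib import Path


def padded_name(filename, width=3):
    """Zero-pad the run of digits that ends the file stem, e.g. cfl-1.pdf."""
    path = Path(filename)
    match = re.search(r"(\d+)$", path.stem)
    if match is None:
        return filename  # no trailing page number, leave the name alone
    number = match.group(1).zfill(width)
    return path.stem[:match.start()] + number + path.suffix


def rename_with_padding(directory, pattern="cfl-*.pdf", width=3):
    """Rename every file matching the pattern to its zero-padded form."""
    for path in sorted(Path(directory).glob(pattern)):
        target = path.with_name(padded_name(path.name, width))
        if target != path:
            path.rename(target)
```

With `width=3`, `cfl-1.pdf` becomes `cfl-001.pdf` and `cfl-10.pdf` becomes `cfl-010.pdf`, so the files sort in page order.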

Amit Patel
6
votes
0 answers
pdfminer/poppler - how to set encoding
I have a file, e.g. http://www.agfl.cs.ru.nl/papers/manual28.pdf
(it's in English)
Pdfminer and Poppler show the same result for most parsed pages, like:
¾º¿  ÒÙ Öݸ ¾¼¼ Ⱥ ¾º ÂÙÒ ¸ ¾¼¼ ź Ë ÙØØ Ö¸ Ǻ Ë
It seems they can't read custom font encodings.…

night-crawler
6
votes
1 answer
Extracting text from highlighted annotations in a PDF file
Since yesterday I have been trying to extract the text from some highlighted annotations in a PDF, using python-poppler-qt4.
According to this documentation, it looks like I have to get the text using the Page.text() method, passing a Rectangle argument from…

tortov
5
votes
3 answers
How can I build libpoppler from source?
I just downloaded Poppler to a Linux system, and I want to incorporate it into my app to parse PDF files.
(My goal is to convert pdf file to plain text.)
How can I do this?

zxi
5
votes
3 answers
Poppler programming
Poppler is a classic example of something undocumented that you wish were documented. This question is language-agnostic; I am just asking about the general idea.
In short, how do you make a PDF viewer control with poppler?
From what I…

foobar
5
votes
2 answers
How to config Font substitution in poppler
When converting a PDF page to an image, if a font is not embedded in the input PDF, a default substitute font (usually Arial) is used. However, I want to change the default font.
There is a description here, but it gives too little information. I don't know how to…

yuma4012
5
votes
1 answer
pdf2HtmlEX - Text on html is different than the source pdf
I am using pdf2htmlEX to convert PDF files to HTML. I also extract the text from the file afterwards.
The Problem:
I encountered a file where the text in the converted HTML is…

Montoya
5
votes
1 answer
using Poppler Qt4 c++
I need a PDF viewer library to use in my app.
I am using C++ and Qt.
I downloaded Poppler
and the code example for the Poppler Qt4 interface library,
but I do not know how to configure the library to work in my code.
I am using Qt Creator on Windows…

nemo
5
votes
0 answers
pdftotext get font information (font-family, style, size)
I'm using "pdftotext -bbox file.pdf" to convert a pdf file into HTML.
Here's a sample line from the output:
foo
Is there a way to get font information for every word…

James Kroning
5
votes
2 answers
dllimport /dllexport and static libraries compilation under visual c++
I desperately need your help.
I'm trying to statically compile the Poppler library (specifically for Qt4) on Windows with the Visual C++ 2008 compiler. To achieve this, I needed to compile a bunch of other libraries as dependencies for Poppler…

Marco Antonio
5
votes
0 answers
How to Remove Mask or Corrupted Image from PDF?
I am working on a Ruby on Rails application to extract text and images from PDF files. During image extraction, a few of the images get corrupted.
Is there any way to identify those corrupted images after extraction? Does anyone know why they get corrupted?
I am…

sam