Questions tagged [poppler]

Poppler is a GPL'd PDF rendering library based on the xpdf-3.0 code base.

290 questions
8
votes
6 answers

How to display a pdf that has been downloaded in python

I have grabbed a PDF from the web using, for example: import requests; pdf = requests.get("http://www.scala-lang.org/docu/files/ScalaByExample.pdf"). I would like to modify this code to display it: from gi.repository import Poppler, Gtk def…
marshall
  • 2,443
  • 7
  • 25
  • 45
8
votes
5 answers

Remove / Delete all images from a PDF using Ghostscript or ImageMagick

I want to remove all the images in a PDF, leaving only the text and fonts, using any command-line tool available. I tried -dGraphicsAlphaBits=1 in a Ghostscript command, but the images are still present, just heavily pixelated.
hussainb
  • 1,218
  • 2
  • 15
  • 33
7
votes
1 answer

converting pdf to image but after zooming in

This link shows how PDFs can be converted to images. Is there a way to zoom my PDFs before converting them to images? In my project, I am converting PDFs to PNGs and then using the Python-tesseract library to extract text. I noticed that if I zoom PDFs and…
user2543622
  • 5,760
  • 25
  • 91
  • 159
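
For the zoom question above, one option is to render at a higher resolution instead of zooming: pdftoppm accepts -r to set the DPI (its default is 150) and -png to write PNG files. A minimal Python sketch, assuming pdftoppm is installed and on PATH; the helper names build_pdftoppm_command and render_pdf and the file names are hypothetical, not taken from the question:

```python
import subprocess


def build_pdftoppm_command(pdf_path, output_prefix, dpi=300):
    """Build the pdftoppm argument list for rendering PNGs at the given DPI."""
    # -r sets both the X and Y resolution; -png selects PNG output.
    return ["pdftoppm", "-r", str(dpi), "-png", pdf_path, output_prefix]


def render_pdf(pdf_path, output_prefix, dpi=300):
    """Run pdftoppm; raises CalledProcessError if the tool exits with an error."""
    subprocess.run(build_pdftoppm_command(pdf_path, output_prefix, dpi), check=True)


# Example (hypothetical file names): render_pdf("input.pdf", "page", dpi=300)
```

The 300 DPI default here is only a starting point to experiment with; the value that works best for OCR depends on the document.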
6
votes
2 answers

Installing Poppler on cygwin

I just downloaded Poppler 0.16.5 and I am clueless about how to install this package on Cygwin. Can anyone tell me what the proper command is to install Poppler?
fogy
  • 61
  • 1
  • 2
6
votes
1 answer

'pdfseparate': Format output file name as page number with leading zeroes

pdfseparate requires a PDF-page-pattern containing %d, which is replaced by the page number. The command $ pdfseparate CFL_1115_ISSUU.pdf cfl-%d.pdf works: it names the separated output files cfl-1.pdf, cfl-2.pdf, ..., cfl-10.pdf, etc. Now I need to add…
Amit Patel
  • 15,609
  • 18
  • 68
  • 106
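
For the pdfseparate naming question above, one workaround that does not depend on how pdfseparate handles its pattern is to rename the output files afterwards. A minimal Python sketch; the helper name zero_pad_page_number and the width of 3 are assumptions for illustration:

```python
import re


def zero_pad_page_number(filename, width=3):
    """Zero-pad the digits that sit directly before a trailing '.pdf'."""
    match = re.search(r"(\d+)(?=\.pdf$)", filename)
    if match is None:
        # No page number found; leave the name unchanged.
        return filename
    padded = match.group(1).zfill(width)
    return filename[:match.start()] + padded + filename[match.end():]


names = ["cfl-1.pdf", "cfl-2.pdf", "cfl-10.pdf"]
renamed = [zero_pad_page_number(name) for name in names]
print(renamed)
```

The new names could then be applied with os.rename or pathlib.Path.rename, so that the files sort correctly in a plain alphabetical listing.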
6
votes
0 answers

pdfminer/poppler - how to set encoding

I have a file, i.e. http://www.agfl.cs.ru.nl/papers/manual28.pdf (it's in English). Pdfminer and poppler show the same result for most parsed pages, like: ¾º¿  ÒÙ Öݸ ¾¼¼ Ⱥ ¾º ÂÙÒ ¸ ¾¼¼ ź Ë ÙØØ Ö¸ Ǻ Ë It seems they can't read custom font encodings.…
night-crawler
  • 1,409
  • 1
  • 26
  • 39
6
votes
1 answer

Extracting text from highlighted annotations in a PDF file

Since yesterday I have been trying to extract the text from some highlighted annotations in a PDF, using python-poppler-qt4. According to this documentation, it looks like I have to get the text using the Page.text() method, passing a Rectangle argument from…
tortov
  • 91
  • 2
  • 7
5
votes
3 answers

How can I build libpoppler from source?

I just downloaded poppler to a Linux system, and I want to incorporate it into my app to parse PDF files. (My goal is to convert PDF files to plain text.) How can I do this?
zxi
  • 195
  • 4
  • 17
5
votes
3 answers

Poppler programming

Poppler is a classic example of something without documentation that you wish were documented. This question is language-agnostic; I am just asking about the general idea. In short, how do you make a PDF viewer control with poppler? From what I…
foobar
  • 779
  • 3
  • 10
  • 18
5
votes
2 answers

How to config Font substitution in poppler

When converting a PDF page to an image, if a font is not embedded in the input PDF, a default substitute font (usually Arial) is used. However, I want to change the default font. There is a description here, but it gives too little information. I don't know how to…
yuma4012
  • 61
  • 2
5
votes
1 answer

pdf2HtmlEX - Text on html is different than the source pdf

I am using pdf2htmlEX to convert PDF files to HTML. I also extract the text from the file afterwards. The problem: I encountered a file where the text in the converted HTML is…
Montoya
  • 2,819
  • 3
  • 37
  • 65
5
votes
1 answer

using Poppler Qt4 c++

I need a PDF viewer library to use in my app; I am using C++ and Qt. I downloaded Poppler and the code example "The Poppler Qt4 interface library", but I do not know how to configure the library to work in my code. I am using Qt Creator on Windows…
nemo
  • 51
  • 1
  • 2
5
votes
0 answers

pdftotext get font information (font-family, style, size)

I'm using "pdftotext -bbox file.pdf" to convert a pdf file into HTML. Here's a sample line from the output: foo Is there a way to get font information for every word…
5
votes
2 answers

dllimport /dllexport and static libraries compilation under visual c++

I desperately need your help. I'm trying to statically compile the poppler library (specifically for Qt4) on Windows with the Visual C++ 2008 compiler. To achieve this, I needed to compile a bunch of other libraries as dependencies for poppler…
5
votes
0 answers

How to Remove Mask or Corrupted Image from PDF?

I am working on a Ruby on Rails application to extract text and images from PDF files. While extracting images, a few of them get corrupted. Is there any way to identify those corrupted images after extraction? Does anyone know why they get corrupted? I am…
sam
  • 372
  • 2
  • 12