Questions tagged [xpdf]

Xpdf is an open-source PDF viewer for the X Window System and Motif. Xpdf runs on practically any Unix-like operating system. Xpdf can decode LZW and read encrypted PDFs.

More details at http://en.wikipedia.org/wiki/Xpdf

71 questions
1
vote
1 answer

Specific version of pdftotext binary (old version of poppler-utils is not same version)?

Been digging for ages and struggling to find the answer. Have version 0.39 of a single binary pdftotext on our OSX dev systems (installed using brew install poppler). We cannot find other versions; brew search poppler only lists a single one. We are…
Ben
  • 1,292
  • 1
  • 13
  • 21
1
vote
2 answers

Installing pdftotext on Windows (for use with R, 'tm' package)

I am having trouble using R, 'tm' package, to read in .pdf files. Specifically, I try to run the following code: library(tm) filename = "myfile.pdf" tmp1 <- readPDF(PdftotextOptions="-layout") doc <-…
SuperUser01
  • 199
  • 1
  • 13
1
vote
1 answer

Using readPDF in R (tm package)

I'm a beginner at R and having a bit of trouble using the tm package. I need to extract specific data from page 55 through 300 of this and thought that R might be a good way to do so. (If anyone has a better idea, please let me know!) I did some…
JDY
  • 167
  • 2
  • 10
1
vote
1 answer

'pdftotext' errors encountered on Windows 7 -- same PDFs processed correctly under Linux

I have an old Linux version (0.12.4) of pdftotext that runs without problems, but I would like to run it on a Windows 7 machine. I downloaded the Windows installer for what appears to be the latest version, xpdf-2.03-bin.exe from…
LFleming
  • 21
  • 3
1
vote
2 answers

Using AJAX and PHP to output PDF

The way my web app is supposed to work is that the user fills out a form and then the AJAX sends the form data to a PHP file that generates a PDF (using xpdf). Then the generated PDF should be available for download on the HTML page with the AJAX.…
Ben Davidow
  • 1,175
  • 5
  • 23
  • 51
1
vote
1 answer

Discrepancy between PDF cropbox and SVG created out of a PDF page

I am trying to extract the background image of a PDF page to an SVG (using xpdf library). The problem I am facing is that the PDF contains additional images/graphics (presumably outside the cropbox) that are not rendered by PDF readers, but the…
so2
  • 322
  • 2
  • 13
1
vote
1 answer

How to identify and extract vector graphics from PDF using xpdf library?

Does anyone have sample code demonstrating how to extract vector graphics objects (such as those representing charts and flow diagrams) from a PDF using the xpdf library? There doesn't seem to be any documentation available on the Web for the xpdf library…
so1
  • 58
  • 1
  • 10
1
vote
0 answers

parsing pdf content stream to understand paragraph boundary

Is there a way to parse the PDF content stream and identify paragraph boundaries? I read ISO 32000-1:2008 but could not understand whether the PDF content stream contains any operator that tells display software to start or end a paragraph. Can…
rivu
  • 2,004
  • 2
  • 29
  • 45
1
vote
1 answer

Why can text be extracted from scanned documents, but not images?

I asked a similar question before on Stack Overflow. I wanted to ask another related question, so I am rephrasing the original question. I was using PDFBox to extract images and text from a PDF, available on SkyDrive and Scribd. I had…
rivu
  • 2,004
  • 2
  • 29
  • 45
0
votes
0 answers

I can't get PDF document file path with PHP-XPDF

I have a Wordpress site installed on a VPS with Debian 11. One of the functionalities is reading uploaded PDF documents using the XPDF library and PHP wrapper PHP-XPDF: https://github.com/alchemy-fr/PHP-XPDF, which uses XPDFReader:…
weezle
  • 79
  • 1
  • 6
  • 15
0
votes
0 answers

pdftops to convert PDF to EPS in basic mode?

I'm using pdftops in a script to convert PDF to EPS. However, it looks like the output is not the "basic" EPS format, and I can't open it with Photopea. I'm getting this error: The command that I'm executing is this one: pdftops -eps -level2sep file.pdf…
Aral Roca
  • 5,442
  • 8
  • 47
  • 78
0
votes
2 answers

Converting pdf to text

I need to create a C# or C++ (MFC) application that converts PDF files to txt. I need not only to convert, but also to remove headers, footers, some garbage characters on the left margin, etc. Thus the application should allow the user to set page margins to…
dpreznik
  • 247
  • 5
  • 18
0
votes
1 answer

Make xpdf Pdf2Txt function thread-safe

I have tried to use the xpdf source code in an MFC application to convert PDF to text. The code sample is taken from their site (or repository): int Pdf2Txt(std::string PdfFile, std::string TxtFile) const { GString* ownerPW, *userPW; …
Flaviu_
  • 1,285
  • 17
  • 33
0
votes
1 answer

Convert all text's color in PDF to black while ensuring text is selectable

Looking for ways to change the color of all text in a PDF to black using an open-source command-line tool (or package) while ensuring that text is rendered as text. Thanks to some answers on SO, found a command to convert the PDF to grayscale. gs -o…
qwertynik
  • 118
  • 2
  • 10
0
votes
1 answer

path should be string, bytes or os.PathLike, not InMemoryUploadedFile

In Django I get the file uploaded by the user with input_pdf = request.FILES['pdf'] and I want to extract the file text with the pdftextract library with pdf = XPdf(input_pdf), but it gives an error: TypeError: _getfullpathname: path should be string, bytes…
Meysam
  • 105
  • 1
  • 1
  • 6
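
A possible workaround for the InMemoryUploadedFile question, offered only as a minimal sketch: the error message suggests the library wants a filesystem path, so one option is to copy the upload to a temporary file and pass that path instead. The helper name save_upload_to_temp and the FakeUpload stand-in are hypothetical and exist only so the example runs without Django; the only Django behaviour assumed is that the uploaded file exposes chunks(). Whether XPdf accepts the resulting path is an assumption based on the error text, not something the excerpt confirms.

```python
import os
import tempfile


def save_upload_to_temp(uploaded, suffix=".pdf"):
    """Copy an uploaded file object to disk and return the new path.

    `uploaded` is assumed to expose a `chunks()` iterator of bytes, as
    Django's uploaded file objects do. The caller must delete the file.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(fd, "wb") as out:
        for chunk in uploaded.chunks():
            out.write(chunk)
    return path


class FakeUpload:
    """Hypothetical stand-in for request.FILES['pdf'] (no Django needed)."""

    def __init__(self, data, chunk_size=4):
        self._data = data
        self._chunk_size = chunk_size

    def chunks(self):
        # Yield the payload in small pieces, like a real upload would.
        for i in range(0, len(self._data), self._chunk_size):
            yield self._data[i:i + self._chunk_size]


if __name__ == "__main__":
    path = save_upload_to_temp(FakeUpload(b"%PDF-1.4 demo"))
    try:
        # In the question's code, this path would be passed in place of
        # input_pdf, e.g. XPdf(path) (assumed to accept a path string).
        print(path)
    finally:
        os.remove(path)
```

In a real view, request.FILES['pdf'] would take the place of FakeUpload, and the temporary file should be removed once the text has been extracted.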