Questions tagged [xpdf]

Xpdf is an open-source PDF viewer for the X Window System and Motif. Xpdf runs on practically any Unix-like operating system. Xpdf can decode LZW and read encrypted PDFs.

More details at http://en.wikipedia.org/wiki/Xpdf

71 questions
2
votes
2 answers

Getting the RIGHT word count of a PDF file

The response in this topic helped me understand why sometimes my PDF fails to find a word and why I keep getting different word counts when using different PDF word count programs. I decided to use xpdf. I converted it to text and added the -layout…
2
votes
1 answer

How can I use pdftools in R to convert a large batch of PDF files to TXT files?

I'm trying to extract ~600 PDF files filled with tables to text format so I can do some data exploration. It looks like pdftools is my best bet to get the job done but the help files are brief. The closest tutorial I found uses xpdf. Is there a way…
adarvishian
2
votes
1 answer

PDFtoTEXT not converting UTF-8 encoded text completely, especially the accented characters

I am working on a project which requires converting PDF to text. The PDF contains Hindi fonts (Mangal to be specific) along with English. 100% of the English is getting converted into text. The conversion of the Hindi part is around 95%. Remaining 5% Hindi…
Dian
2
votes
1 answer

Batch file to convert all pdf to text (with xpdf)

I would like to run a batch conversion in a folder full of PDF files. I am using Xpdf, and this is the command for a single file: c:\Test\pdftotext -layout firstpdftoconvert.pdf firstpdfconverted.txt Could somebody please help me do it…
Ismo
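One way to repeat the single-file command above over a whole folder is sketched below, in Python rather than a batch file. It only relies on the `pdftotext -layout input.pdf output.txt` form quoted in the question. The function names `build_commands` and `convert_all` are hypothetical, and the sketch assumes `pdftotext` is on the PATH (otherwise pass the full path, e.g. `c:\Test\pdftotext`, as `exe`).

```python
from pathlib import Path
import subprocess


def build_commands(folder, exe="pdftotext"):
    """Return one pdftotext command (as an argument list) per PDF in folder.

    Each output file sits next to its PDF, with the suffix changed to .txt.
    """
    commands = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        txt = pdf.with_suffix(".txt")
        commands.append([exe, "-layout", str(pdf), str(txt)])
    return commands


def convert_all(folder, exe="pdftotext"):
    """Run the commands one by one; stop with an error if a conversion fails."""
    for cmd in build_commands(folder, exe):
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(cmd, check=True)
```

Calling `convert_all(r"c:\Test")` would then attempt every `*.pdf` in that folder. Passing the arguments as a list avoids quoting problems with file names that contain spaces.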
2
votes
1 answer

Xpdf pid missing in Awesome

I'm trying to do some magic with Awesome, but I keep running into a missing Xpdf pid. When I have any other window I'm testing, the client.pid field is OK, and contains the window's pid. However with Xpdf this field is always set to 0. Is there any…
Szymon Lipiński
2
votes
1 answer

How to identify which clip paths apply to a path or fill in PDF vector graphics?

I am trying to extract vector graphics from a PDF file and create corresponding SVG files. I am using SVGOutputDev (https://github.com/immateriel/pdf2svg/blob/master/SVGOutputDev.cc) with xpdf library for this purpose. Now SVGOutputDev hasn't…
so2
2
votes
1 answer

pdftotext on Centos 6 64bit?

I have a HostGator VPS server, and want to be able to run pdftotext, part of xpdf (http://www.foolabs.com/xpdf/download.html). After testing this out on my Mac, it worked fine, so I went on to install it on my VPS server. I followed the installation…
Charlie
2
votes
3 answers

shell_exec() not executing pdftotext command

I installed the required library and it's working in the terminal but not in my PHP file. My code is: $mypdf = shell_exec('/usr/local/bin/pdftotext test.pdf test.txt'); echo $mypdf; If I execute this command /usr/local/bin/pdftotext test.pdf test.txt…
user1360768
1
vote
3 answers

How to extract text from PDFs using xpdf?

I have many PDFs in a folder. I want to extract the text from these PDFs using xpdf. For example: example1.pdf extracts to example1.txt, example2.pdf extracts to example2.txt, etc. Here is my code:
bruine
1
vote
1 answer

xpdf (pdftotext) with language pack call from different directory

I am experimenting with xpdf (pdftotext) on a macOS Terminal. I use one language package (Japanese). Everything works fine if I call the executable like this (from the lib directory): lib kelly$ ./p2t -enc UTF-8 jp.pdf and my data structure…
Kelly o'Brian
1
vote
0 answers

How to generate pdftotext same as pdf generated by xpdf in Laravel?

I am using the spatie library for Laravel to convert PDF to text. I am using Xpdf. This is my code for the pdftotext conversion. $text1 = (new…
amit sutar
1
vote
1 answer

pdftoppm converts only the first page of a PDF

I need to convert a PDF to PGM, and when I run the (example) command pdftoppm -f 5 -l 10 -gray input.pdf > output.pgm I am getting the first page of the PDF as output. This is even though I am clearly specifying the first page as page 5. I am not…
Kalpit
1
vote
0 answers

xpdf returns error (status 127) on Windows with R

I have downloaded the xpdf files and stored them in a directory on my C: disk. Moreover, I have added the path to the environment paths. However, I still get an error message:
user8270077
1
vote
1 answer

How can I check if a PDF page is an image (scanned) using PDFBox or Xpdf?

PDFBox problem with extracting images. Hi, how can I check whether a PDF page is an image and extract it with the PDFBox library? There is a method to get images, but if the PDF page is an image it does not return it. Could someone help me solve this problem? Xpdf…
dmitri
1
vote
2 answers

How to differentiate between "text" PDFs and "image" PDFs in PHP?

I've recently set up a Linux server to be able to convert text-based PDFs to text by using the pdftotext command that's part of Xpdf as well as to convert image-based PDFs to text by using a combination of the gs (Ghostscript) and tesseract…
HartleySan