Questions tagged [pdftotext]

Pdftotext converts Portable Document Format (PDF) files to plain text.

pdftotext is a command-line utility for converting PDF files to plain text files, i.e. extracting raw text from PDF-encapsulated files.

pdftotext is freely available and included by default with many Linux distributions, and is also available for Windows as part of the Xpdf Windows port. Poppler, which is derived from Xpdf, also includes an implementation of pdftotext, which is included in the poppler-utils package on most major Linux distributions.

However, there are other CLI-based PDF text extraction tools with a similar or identical name. While they work, for the most part, in the same way, they may give different results. So, only use this tag for CLI-based pdftotext tools and variants, and make sure to state your specific version and environment.

Do not use this tag if you use a different extraction tool, e.g. a GUI-based PDF-to-text converter, an online PDF-to-text converter, or another (commercial) tool.
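
As a rough illustration of the usage described above, here is a minimal Python sketch that wraps the pdftotext command line. The helper names (`build_pdftotext_command`, `extract_text`, `tool_version`) are hypothetical and not part of any pdftotext distribution. The sketch assumes a `pdftotext` binary is on the PATH and uses only options documented for both the Xpdf and Poppler variants (`-layout`, `-f`, `-l`, `-enc`, `-v`, and `-` as the output file for stdout). Output can still differ between variants, which is why reporting the version is useful.

```python
import shutil
import subprocess


def build_pdftotext_command(pdf_path, txt_path="-", layout=False,
                            first_page=None, last_page=None):
    """Build the argument list for a pdftotext call (hypothetical helper)."""
    cmd = ["pdftotext", "-enc", "UTF-8"]  # request UTF-8 output explicitly
    if layout:
        cmd.append("-layout")  # try to preserve the physical page layout
    if first_page is not None:
        cmd += ["-f", str(first_page)]  # first page to convert
    if last_page is not None:
        cmd += ["-l", str(last_page)]  # last page to convert
    cmd += [pdf_path, txt_path]  # "-" as the text file means stdout
    return cmd


def extract_text(pdf_path, **options):
    """Run pdftotext and return the extracted text as a string."""
    if shutil.which("pdftotext") is None:
        raise FileNotFoundError("pdftotext was not found on PATH")
    cmd = build_pdftotext_command(pdf_path, **options)
    result = subprocess.run(cmd, capture_output=True, check=True)
    return result.stdout.decode("utf-8", errors="replace")


def tool_version():
    """Return the version banner (assumed to be printed by `pdftotext -v`)."""
    result = subprocess.run(["pdftotext", "-v"],
                            capture_output=True, text=True)
    # The banner is usually written to stderr; fall back to stdout.
    return (result.stderr or result.stdout).strip()
```

Calling `tool_version()` and pasting its output into a question is one way to satisfy the request to state the specific version and environment.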

367 questions
-1
votes
1 answer

itextpdf inserts a space between 7 and dot after extracting text

This image describes my problem: http://185.49.12.119/~pogdan/7spacedot/7spacedot.jpg Input file: http://185.49.12.119/~pogdan/7spacedot/monitor_2016_99.pdf Output file: http://185.49.12.119/~pogdan/7spacedot/monitor_2016_99.txt all set files with…
pogdan
-1
votes
3 answers

Can't read pdf file

I'm trying to build an application that can read PDF files. I use this guide: http://www.codeproject.com/Articles/14170/Extract-Text-from-PDF-in-C-100-NET but do not understand what it means by "file" is the entire url from your computer. Because…
-1
votes
1 answer

Is there any library, usable with PHP, to help extract text from a rectangular region of a PDF?

I am looking for a (preferably free) library that can extract PDF text from a rectangular region specified by left, top, width and height parameters. It should be usable with PHP on a Linux system. Could you please suggest…
Raouf Athar
-2
votes
1 answer

How to convert Web PDF to Text

I want to convert web PDFs such as https://archives.nseindia.com/corporate/ICRA_26012022091856_BSER3026012022.pdf and many more into text without saving them to my PC, because thousands of such announcements come up daily. Hence wanted to convert…
-2
votes
1 answer

Windows Batch Script to Rename PDF Files with Its First Line (Loop Possible)?

Is it possible to loop this to rename all the PDFs in a Folder using this code? I am not that great with Windows Batch Scripting at least in terms of Loops and Variable Setups. @echo off pdftotext "XYZ.pdf" rem set /p title=< "XYZ.txt": set /p…
-3
votes
2 answers

How to extract data from a particular area in a PDF file

See this PDF. I want this data from it: "91815380284", "BeneficiaryName"=>"Kavita", "Gender" => "Female", "IDVerified" => "Aadhaar # XXXXXXXX3661", "BeneficiaryReferenceID" => "34684952644017", …