
I am trying to extract data from an invoice, and I found that invoice2data can do that job. I installed it with pip install invoice2data.

from invoice2data import extract_data

This import works.

result = extract_data('sample.pdf')

When I run the line above, I get:

OSError: pdftotext not installed. Can be downloaded from https://poppler.freedesktop.org/

When I tried pip install pdftotext, it reported that Microsoft Visual C++ 14.0 is required. I installed it using the Build Tools, but the same error appeared again. So I downloaded the files from https://pypi.org/project/pdftotext/ and pasted the extracted files into my anaconda/Lib/site-packages directory. Now pip install pdftotext reports "Requirement already satisfied: pdftotext in c:\users\vicky\anaconda3\lib\site-packages (2.1.2)", but when I try to extract data from the PDF I still get the same error saying that pdftotext is not installed.

How can I overcome this error, or is there another package that will satisfy my requirement?

Thanks in advance.

vicky
  • I understand the error message to mean that you are not missing the libraries (they are the ones giving the error), but that the libraries need to interface with a binary that is separately installed. Have you tried the link? – oligofren Aug 17 '19 at 10:05
  • @oligofren. Which link should I try? – vicky Aug 17 '19 at 12:50
  • The link from the error message: https://poppler.freedesktop.org. It contains some software that I believe will interface with the library. The page describes how other APIs interface with it. – oligofren Aug 19 '19 at 20:15

2 Answers

2

Install poppler-utils before pdftotext

sudo apt-get install poppler-utils
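
A quick way to confirm whether the command-line tool is visible to Python is to look it up on PATH. The sketch below is only a diagnostic under assumptions: it assumes that invoice2data's default reader runs a pdftotext executable (which is what the error message and its Poppler link suggest), and find_pdftotext is a hypothetical helper name, not part of invoice2data.

```python
import shutil


def find_pdftotext(command="pdftotext"):
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(command)


if __name__ == "__main__":
    location = find_pdftotext()
    if location is None:
        # Assumption: the OSError in the question is raised in this situation.
        print("pdftotext binary not found on PATH; install Poppler first")
    else:
        print("pdftotext found at", location)
```

If this reports that the binary is missing even though pip says the pdftotext package is installed, the Python package and the Poppler command-line tool are probably being confused: pip only installs the former.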
Ankit Kumar Rajpoot

Some simple steps that worked for me:

1. Download and install Visual Studio with the C++ Build Tools, as Microsoft Visual C++ is required: https://visualstudio.microsoft.com/downloads/

2. Download the latest Poppler binaries for Windows: https://blog.alivate.com.au/poppler-windows/index.html

3. Extract the archive and copy the 'poppler' folder, which is inside the 'include' folder.

4. Paste this 'poppler' folder inside the 'Anaconda3/include/' folder.

5. Then run 'pip install pdftotext'

YOU ARE DONE!!!
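
These steps build the Python pdftotext package. If extract_data still raises the same OSError afterwards, one possibility is that the pdftotext executable from the Poppler 'bin' folder is not on PATH. A minimal sketch of adding it for the current Python process follows; the C:\poppler\bin location and the add_to_path helper are hypothetical, so substitute the folder where the archive was actually extracted.

```python
import os
import shutil


def add_to_path(directory):
    """Prepend directory to this process's PATH unless it is already listed."""
    entries = os.environ.get("PATH", "").split(os.pathsep)
    if directory not in entries:
        kept = [entry for entry in entries if entry]
        os.environ["PATH"] = os.pathsep.join([directory] + kept)
    return os.environ["PATH"]


# Hypothetical location; replace with the real 'bin' folder of the Poppler download.
POPPLER_BIN = r"C:\poppler\bin"

if __name__ == "__main__":
    add_to_path(POPPLER_BIN)
    # Prints None if the executable is still not found in any PATH entry.
    print("pdftotext resolves to:", shutil.which("pdftotext"))
```

The change only lasts for the running process, so it would need to run before extract_data is called; adding the folder to the system PATH in the Windows environment settings is the permanent alternative.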