Can't import pdftotext in Python on my Mac M1

Question

I can't import pdftotext on my new Mac M1. The steps I took are:

1. Install Python 3.10
2. Install command line developer tools
3. pip3 install pdftotext from terminal
4. Open IDLE, type import pdftotext
I get this error:

Traceback (most recent call last):
  File "<pyshell#9>", line 1, in <module>
    import pdftotext
ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pdftotext.cpython-310-darwin.so, 0x0002): symbol not found in flat namespace '_ZN7poppler24set_debug_error_functionEPFvRKNSt3__112basic_stringIcNS0_11char_traitsIcEENS0_9allocatorIcEEEEPvES9'
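
For reading the error: the quoted symbol is a mangled C++ name. The sketch below is a deliberately simplified reader for the length-prefixed parts of such a name. It is not a full demangler, and `nested_name_parts` is a hypothetical helper, not part of any library. It only shows which function the loader could not find.

```python
import re

# Symbol copied verbatim from the ImportError above.
SYMBOL = (
    "_ZN7poppler24set_debug_error_functionEPFvRKNSt3__112basic_string"
    "IcNS0_11char_traitsIcEENS0_9allocatorIcEEEEPvES9"
)


def nested_name_parts(mangled):
    """Return the leading length-prefixed identifiers of a mangled nested name.

    Simplified on purpose: handles only the plain '_ZN<len><name>...' form
    and stops at the first character that is not a length prefix.
    """
    body = mangled.lstrip("_")
    if not body.startswith("ZN"):
        return []
    pos, parts = 2, []
    while True:
        match = re.match(r"\d+", body[pos:])
        if not match:
            break
        length = int(match.group())
        start = pos + match.end()
        parts.append(body[start:start + length])
        pos = start + length
    return parts


print("::".join(nested_name_parts(SYMBOL)))
```

For the symbol in the traceback this prints `poppler::set_debug_error_function`, so the compiled pdftotext extension expects that function from the poppler C++ library and the loader could not resolve it at import time.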

I have already spent a few hours searching for this error message.

Any suggestions?

PS: I have tried several other pdf -> text packages, but they don't read the full pdf. For some weird reason, the pdfs I need to read are really complex and many packages don't read them fully. pdftotext does. So what I need is help to make this pdftotext work.

My guess is that this is a problem with the native code portion of the library. Have you checked the site for `pdftotext` to see whether it states that the library should work on Apple silicon? You might want to find a forum specific to the package and post this question there. — CryptoFool, Mar 06 '22 at 18:01
Thanks for the suggestion: I have just posted a new issue on the package site https://pypi.org/project/pdftotext/ — Antonio, Mar 06 '22 at 18:36
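
Following the native-code guess above, here is a minimal sketch for collecting the details usually worth including in such a report: the interpreter version, the CPU architecture Python reports, and where the compiled extension lives. `describe_environment` is a hypothetical helper name. It uses only the standard library and does not import pdftotext itself, so it should run even while the import fails.

```python
import importlib.util
import platform
import sys


def describe_environment(module_name="pdftotext"):
    """Gather interpreter and extension details without importing the module."""
    # find_spec locates a top-level module on disk without executing it.
    spec = importlib.util.find_spec(module_name)
    return {
        "python_version": platform.python_version(),
        "machine": platform.machine(),
        "executable": sys.executable,
        # None means the module is not installed for this interpreter.
        "extension_path": spec.origin if spec else None,
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

On Apple silicon, `machine` is expected to read `arm64` for a native interpreter. A different value would suggest the interpreter and the libraries it loads were built for different architectures, which is one possible cause to rule out.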

score -1 · Answer 1 · answered Mar 06 '22 at 16:05

I don't think pdftotext is a good library. Use PyPDF2, it's better. Here is an example:

import PyPDF2

# Open the PDF in binary read mode ('rb').
with open('1.pdf', 'rb') as pdffileobj:
    # Create a reader for the file object.
    # Older PyPDF2 releases spelled these PdfFileReader, numPages, getPage
    # and extractText; those names were removed in PyPDF2 3.0.
    pdfreader = PyPDF2.PdfReader(pdffileobj)

    # Page indexing starts at 0, so valid indexes run from 0 to
    # len(pdfreader.pages) - 1. Collect the text of every page.
    text = [page.extract_text() or '' for page in pdfreader.pages]

# Save the extracted text to a .txt file.
# Put r before the path so the backslashes are kept as-is.
# Replace this path with the folder where you want the output file.
with open(r"C:\Users\SIDDHI\AppData\Local\Programs\Python\Python38\1.txt", "a", encoding="utf-8") as file1:
    file1.write('\n'.join(text))

Thanks a lot. However, PyPDF2 does not read all the text in the PDF. It misses a lot of text. That is why I chose pdftotext. It would help if you know how to make pdftotext work on a Mac M1. — Antonio, Mar 06 '22 at 17:45
Recommending a tool or library is out of scope for SO. This also doesn't answer the question that was asked. — CryptoFool, Mar 06 '22 at 17:56
