I am trying to extract data as HTML elements in Python using pdfminer

Question

I am trying to extract data as HTML from a PDF using pdfminer. I was able to extract plain text from the same PDF, but I now get an error when extracting the data as HTML. I need the HTML output so that I can filter the data further and categorize it in a CSV file. This is the script:

from io import StringIO  
from pdfminer.layout import LAParams  
from pdfminer.high_level import extract_text_to_fp  

output_string = StringIO  

with open('mini.pdf','rb') as fn:  
    extract_text_to_fp(fn, output_string, laparams=LAParams(), output_type='html', codec=None)

And this is the error I am getting (posted as a linked image).

What versions of pdfminer and Python are you running? I can't find the 'extract_text_to_fp' function that you import in the current distribution. To verify, run "pip show pdfminer". — Bastien Harkins, Aug 21 '20 at 14:16
Help us help you: edit your question to include the error as text instead of a linked image. — Bastien Harkins, Aug 21 '20 at 14:20
I was able to solve the error: I used a file object instead of the output string, and that helped. — Rajat Nagarkar, Aug 21 '20 at 17:51
I am using pdfminer.six, and I was able to import it using just pdfminer. — Rajat Nagarkar, Aug 21 '20 at 17:52

score 2 · Answer 1 · answered Apr 17 '21 at 03:01

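Add parentheses to StringIO, like this: output_string = StringIO(). Without them, output_string refers to the StringIO class itself; the parentheses call the constructor and create a buffer object, and with that change the code should work.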
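
As a minimal sketch of the difference, assuming pdfminer.six is the installed package: pdf_to_html below wraps the call from the question so that it accepts any text-mode file-like object, and emit is a hypothetical stand-in for the converter that only calls write() on its argument. Both names are made up for illustration. Since the original error was posted only as an image, the TypeError shown here is one plausible way the missing parentheses can fail, not necessarily the exact traceback.

```python
from io import StringIO


def pdf_to_html(pdf_path, out):
    """Write the HTML rendering of pdf_path into out, a text-mode file-like object."""
    # Imported lazily so the rest of this sketch runs without pdfminer.six installed.
    from pdfminer.high_level import extract_text_to_fp
    from pdfminer.layout import LAParams

    with open(pdf_path, "rb") as fn:
        # codec=None keeps the output as str, which is what a text buffer expects.
        extract_text_to_fp(fn, out, laparams=LAParams(),
                           output_type="html", codec=None)


def emit(out):
    """Hypothetical stand-in for the converter: it only calls out.write()."""
    out.write("<span>hello</span>")


# The bug: StringIO without parentheses is the class, not a buffer.
try:
    emit(StringIO)
    class_error = None
except TypeError as exc:
    class_error = exc

# The fix: StringIO() creates an instance that can be written to and read back.
buffer = StringIO()
emit(buffer)
html = buffer.getvalue()
```

With pdfminer.six installed, pdf_to_html('mini.pdf', buffer) would fill the buffer for later filtering with buffer.getvalue(). Passing a file opened in text mode, for example open('mini.html', 'w'), should write the HTML straight to disk, which matches the workaround described in the comments.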

Mna
