0

I have a PDF file which contains text boxes, radio buttons, check boxes, etc. How do I extract all the data from the PDF using Python? When I try using pdfminer or PyPDF2, I am not able to scrape the data in the text boxes.

Refer to the attached image.


For example: when I use pdfminer, I am able to scrape the label "1) Program:" but not the value filled in for it (i.e. "EPIC_AFCS_AB139_7APD").

mkl

1 Answer

0

First of all, you need to open the PDF file in binary mode ("rb"), because Python reads PDF files as bytes.

Let's assume the opened file object is named "f". If you call f.read(10), the output is a bytes object, so it is printed with a "b'" prefix (for a well-formed PDF it begins with b'%PDF-). You then need to decode it into ASCII or Unicode text.
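The steps above could look roughly like the sketch below. It uses only the standard library. The file name "form.pdf" and the helper names read_pdf_header and header_as_text are hypothetical, chosen for illustration. latin-1 is assumed for decoding only because it maps every byte value and so never raises an error.

```python
from pathlib import Path


def read_pdf_header(path, size=10):
    """Return the first `size` bytes of the file, read in binary mode."""
    # "rb" = read binary: Python returns bytes and applies no text decoding.
    with open(path, "rb") as f:
        return f.read(size)


def header_as_text(raw):
    """Decode raw bytes to str (latin-1 is an assumption; it never raises)."""
    return raw.decode("latin-1")


if __name__ == "__main__":
    pdf_path = Path("form.pdf")  # hypothetical file name, not from the question
    if pdf_path.exists():
        raw = read_pdf_header(pdf_path)
        print(raw)                  # a bytes object, shown with a b'...' prefix
        print(header_as_text(raw))  # the same bytes as a str
```

This only covers the raw-bytes step. Much of a PDF is usually stored in compressed streams, so decoding the whole file this way will not by itself yield the form values.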

Casca