
This is an example of the PDF from which I want to extract data.

[Image: sample of the PDF]

This is what I have tried:

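import PyPDF2 as pypdf

with open('w10.pdf', 'rb') as pdfobject:
    pdf = pypdf.PdfFileReader(pdfobject)  # create the reader that the loop uses
    # range(1, n - 1) skips the first and the last page of the file
    for i in range(1, pdf.getNumPages() - 1):
        print(pdf.getPage(i).extractText())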

My result:

city:Mohali

population 2022

st:456

ID No. 8511

H Code IPRTAHNRE

I Code #5477/7985

Name Sanjeev

Father's Name Ashwani

House No. 121

Age 22

Sex Male

ID No. 2500

H Code IPWERNER

I Code *2/5464

Name Asdff

Father's Name sgdgd

House No. 154

Age 23

Sex Male

ID No. 4564

H Code IPRAFHNR

I Code #4577/789

Name Avadhesh

Father's Name Ajor NAth

House No. 12

Age 24

Sex Male

Expected result:

[Image: the expected result]
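A minimal sketch of the post-processing step, turning the extracted text into one record per person. It assumes the text comes out as one "label value" pair per line, as in the result above, and that every record starts with an "ID No." line; parse_records and LABELS are hypothetical names, and real extractText() output may break lines differently, in which case the splitting has to be adapted.

```python
import re

# Field labels as they appear in the extracted text (assumption: each
# "label value" pair sits on its own line).
LABELS = ["ID No.", "H Code", "I Code", "Father's Name", "Name",
          "House No.", "Age", "Sex"]

# Longest labels first; the lookahead requires whitespace or end of line
# after the label, so e.g. "Age" does not match a line starting "Agent".
_LABEL_RE = re.compile(
    r"^("
    + "|".join(re.escape(label) for label in sorted(LABELS, key=len, reverse=True))
    + r")(?=\s|$)\s*(.*)$"
)


def parse_records(text):
    """Split extracted page text into one dict per person.

    A new record starts at every "ID No." line. Lines that match no label
    (e.g. the "city:Mohali" page header) are skipped, as are labelled
    lines that appear before the first "ID No.".
    """
    records = []
    current = None
    for raw in text.splitlines():
        match = _LABEL_RE.match(raw.strip())
        if not match:
            continue
        label, value = match.group(1), match.group(2).strip()
        if label == "ID No.":
            current = {}
            records.append(current)
        if current is not None:
            current[label] = value
    return records
```

Each dict maps a label to its value, so a record can later be written as one spreadsheet row with the labels as column headers.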

  • What's the error? What's the actual vs expected result? And what's the question? – Jeremy Thompson Jul 06 '22 at 06:33
  • As a starter: https://stackoverflow.com/questions/72618062/pypdf-does-not-read-the-pdf-text-line-by-line. Basically it's up to you to further process and organize the text you got from the pdf. – Anonymous Jul 06 '22 at 09:36
  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community Jul 06 '22 at 11:29
  • Please focus on one problem: is it the data extraction from pdf? Is it writing to excel? – Martin Thoma Jul 06 '22 at 21:15
  • Also, the code you've shared is pretty broken. It defines pdfobject and uses pdf – Martin Thoma Jul 06 '22 at 21:16
  • @MartinThoma We have to extract the data from the PDF and write it to Excel. – Avadhesh Kumar Jul 08 '22 at 05:17
  • I understand that. Please focus on one problem. The fact that you need to use "and" is a clear sign that there are two problems in one question. – Martin Thoma Jul 08 '22 at 07:08
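For the second half of the task, writing the records to a spreadsheet, a minimal sketch using only the standard library is shown here. It swaps in a CSV file, which Excel opens directly, instead of a native .xlsx file. The input is assumed to be a list of dicts keyed by field label (one per person), and write_records_csv and COLUMNS are hypothetical names whose column order is assumed from the fields in the result above.

```python
import csv

# Column order for the spreadsheet (assumed from the fields in the PDF).
COLUMNS = ["ID No.", "H Code", "I Code", "Name", "Father's Name",
           "House No.", "Age", "Sex"]


def write_records_csv(records, path, columns=COLUMNS):
    """Write one row per record dict; missing fields become empty cells.

    "utf-8-sig" writes a byte-order mark so Excel detects UTF-8 text,
    and extrasaction="ignore" drops any keys not listed in columns.
    """
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=columns, restval="",
                                extrasaction="ignore")
        writer.writeheader()
        writer.writerows(records)
```

If a real .xlsx file is required, pandas.DataFrame(records).to_excel("out.xlsx", index=False) is an alternative, but it needs pandas plus an Excel writer engine such as openpyxl installed.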
