# importing required modules
import PyPDF2
import pandas as pd

# creating a pdf file object 
pdfFileObj = open(path, 'rb') 

# creating a pdf reader object 
pdfReader = PyPDF2.PdfFileReader(pdfFileObj) 

# printing number of pages in pdf file 
print(pdfReader.numPages) 

# creating a page object 
pageObj = pdfReader.getPage(0) 

# extracting text from page 
print(pageObj.extractText()) 
  
df = pd.DataFrame(pdfFileObj)
print (df)
df.to_csv('output.csv')

I converted a PDF file to CSV using Anaconda Python 3, but the resulting CSV file is not readable. How can I make the CSV output readable?

It_is_Chris
Jawahar
_But the converted csv file is not in a readable form._ What does that mean, specifically? Please provide a [mcve], as well as the current and expected output. – AMC Oct 30 '20 at 23:52

1 Answer


I tested your method and couldn't find a way to correct the CSV output. I usually do it this way:

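import csv

# extract_text_by_page is assumed to come from a local helper module
# (miner_text_generator.py, not shown here) that yields each page's text.
from miner_text_generator import extract_text_by_page


def export_as_csv(pdf_path, csv_path):
    # newline='' stops the csv module from writing blank rows on Windows
    with open(csv_path, 'w', newline='', encoding='utf-8') as csv_file:
        writer = csv.writer(csv_file)
        for page in extract_text_by_page(pdf_path):
            # keep only the first 100 characters of each page
            text = page[0:100]
            # one CSV row per page, one cell per word
            words = text.split()
            writer.writerow(words)


if __name__ == '__main__':
    pdf_path = '<your path to the file>.pdf'
    csv_path = '<path to the output>.csv'
    export_as_csv(pdf_path, csv_path)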
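
The unreadable CSV in the question most likely comes from `pd.DataFrame(pdfFileObj)`, which builds the frame from the raw binary file object rather than from the extracted text. Whichever PDF library does the extraction, the CSV step only needs the page text as plain strings. Here is a minimal sketch of that step using only the standard library. The names `write_pages_to_csv` and `sample_pages` are made up for illustration, and the sample strings stand in for whatever your extraction call returns.

```python
import csv
import io


def write_pages_to_csv(pages, csv_file):
    """Write one CSV row per non-empty line of extracted page text.

    `pages` is any iterable of strings, one string per PDF page.
    """
    writer = csv.writer(csv_file)
    writer.writerow(["page", "line", "text"])  # header row
    for page_number, page_text in enumerate(pages, start=1):
        line_number = 0
        for raw_line in page_text.splitlines():
            line = raw_line.strip()
            if not line:
                continue  # skip blank lines left over from the page layout
            line_number += 1
            writer.writerow([page_number, line_number, line])


# Stand-in text; in practice these strings come from the PDF library.
sample_pages = ["Name Amount\nAlice 10\n\nBob 20", "Total 30"]
buffer = io.StringIO()
write_pages_to_csv(sample_pages, buffer)
print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, pass a file opened with `open(csv_path, 'w', newline='', encoding='utf-8')`. Splitting each line into separate columns depends on how the PDF lays out its data, so that part is left out here.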