PDF parsing using PyPDF2

Asked Feb 15 '16 at 04:43

Active Dec 20 '22 at 17:48

Viewed 1,424 times

While parsing a PDF file with PyPDF2, hyphenated words such as mm-dd-yy are split across new lines in the extracted text.

This is my code:

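import PyPDF2

def getPDFContent(path):
    # Open in binary mode; file() only exists in Python 2, so use open()
    with open(path, "rb") as f:
        pdf = PyPDF2.PdfFileReader(f)
        # Extract the text of the first page only
        content = pdf.getPage(0).extractText() + "\n"
    return content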

How can I overcome this and print them on the same line?
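To make the desired result concrete, here is a minimal sketch of the kind of post-processing being asked about. It does not change how PyPDF2 extracts text. It only assumes that the extracted string contains line breaks directly before or after a hyphen (for example "mm\n-\ndd\n-\nyy"); the exact output may differ per PDF. The helper name join_hyphenated is hypothetical and not part of PyPDF2.

```python
import re

# A hyphen sitting between two word characters, with optional
# whitespace (possibly including newlines) on either side.
_HYPHEN_GAP = re.compile(r"(?<=\w)(\s*)-(\s*)(?=\w)")

def join_hyphenated(text):
    """Rejoin words that were split across lines at a hyphen.

    Assumption: the line break is an extraction artifact, not a real
    line break in the document.
    """
    def _fix(match):
        before, after = match.group(1), match.group(2)
        # Only collapse the gap when a newline is actually involved,
        # so ordinary "a - b" spacing on one line is left untouched.
        if "\n" in before or "\n" in after:
            return "-"
        return match.group(0)

    return _HYPHEN_GAP.sub(_fix, text)
```

It could be applied to the return value of getPDFContent(path). Note that it would also keep the hyphen in words that were merely wrapped at the end of a line, so it may need adjusting for running prose.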

edited Dec 20 '22 at 17:48

Martin Thoma

asked Feb 15 '16 at 04:43

sri vignes

Check this http://stackoverflow.com/questions/11017379/pypdf-ignores-newlines-in-pdf-file – Kenly Feb 15 '16 at 07:07
It was not solved yet – sri vignes Feb 16 '16 at 06:29
Can anyone help me solve this issue? – sri vignes Feb 18 '16 at 13:38

0 Answers