While parsing a PDF file with PyPDF2, hyphenated words such as mm-dd-yy are extracted with each part on its own line:
mm
-
dd
-
yy
This is my code:
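import PyPDF2

def getPDFContent(path):
    # Open in binary mode; open() works on Python 2 and 3, unlike file()
    with open(path, "rb") as f:
        pdf = PyPDF2.PdfFileReader(f)
        # Extract the text of the first page only
        content = pdf.getPage(0).extractText() + "\n"
    return content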
How can I overcome this and print them on the same line?
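One possible workaround is to post-process the extracted string instead of changing how PyPDF2 extracts it. The sketch below assumes the hyphen always ends up alone on its own line between the two fragments, as in the output shown above. join_hyphenated is a hypothetical helper name, not part of PyPDF2, and it would also merge lines around a hyphen that genuinely stands alone (a list bullet, for example).

```python
import re

# Matches a hyphen that sits alone on its own line, together with the
# line breaks (and any spaces or tabs) on either side of it.
# Assumption: the extracted text looks like "mm\n-\ndd\n-\nyy".
_LONE_HYPHEN = re.compile(r"[ \t]*\n[ \t]*-[ \t]*\n[ \t]*")

def join_hyphenated(text):
    """Rejoin fragments that were split around a lone hyphen line."""
    return _LONE_HYPHEN.sub("-", text)

if __name__ == "__main__":
    sample = "mm\n-\ndd\n-\nyy\n"
    print(join_hyphenated(sample))
```

It could be applied to the result of the function above, for example join_hyphenated(getPDFContent(path)).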