I want pdfplumber to extract the text from an arbitrary PDF supplied by the user. The problem is that pdfplumber also extracts the header text (the title) from each page. How can I make pdfplumber skip the page headers (titles) and the page numbers (or the whole footer, if possible)?
Here is my code:
    import pdfplumber

    all_text = ""
    with pdfplumber.open(file) as pdf:
        for pdf_page in pdf.pages:
            # Fall back to "" so a page without text does not add the string "None"
            page_text = pdf_page.extract_text() or ""
            all_text = all_text + "\n" + page_text
    print(all_text)
where file is the PDF document.
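One direction that might work is cropping each page before extracting, so that the top and bottom bands are never read. The sketch below assumes the headers and footers sit in fixed-height bands on every page, which will not hold for every PDF. The helper names `body_bbox` and `extract_body_text` and the 50-point band heights are placeholders made up for illustration; only `page.bbox`, `page.crop()` and `extract_text()` come from pdfplumber.

```python
def body_bbox(page_bbox, header_height=50, footer_height=50):
    """Return the (x0, top, x1, bottom) box left after trimming both bands.

    pdfplumber measures `top` and `bottom` downward from the top of the page,
    in PDF points (1/72 inch). The default heights are guesses, not library values.
    """
    x0, top, x1, bottom = page_bbox
    new_top = top + header_height
    new_bottom = bottom - footer_height
    if new_top >= new_bottom:
        raise ValueError("header and footer bands cover the whole page")
    return (x0, new_top, x1, new_bottom)


def extract_body_text(path, header_height=50, footer_height=50):
    """Extract text from every page, ignoring the assumed header/footer bands."""
    import pdfplumber  # imported here so body_bbox can be used on its own

    parts = []
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            # Crop to the body region, then extract text from that region only
            body = page.crop(body_bbox(page.bbox, header_height, footer_height))
            parts.append(body.extract_text() or "")
    return "\n".join(parts)
```

Calling `extract_body_text(file)` would then stand in for the loop above. The band heights need tuning per document, for example by checking the `top` and `bottom` values of the entries returned by `page.extract_words()` on a sample page. If the headers vary in height from page to page, a fixed crop like this will either cut body text or leave header text in.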