Code
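import pdfplumber

ecdata = ""
with pdfplumber.open("XYZ Transcript.pdf") as pdf:
    for i, page_obj in enumerate(pdf.pages, start=1):
        print("Page No.: ", i)
        # Keep only text fully inside the box (x0=70, top=50, x1=page width, bottom=790)
        page = page_obj.within_bbox((70, 50, page_obj.width, 790))
        ecpagedata = page.extract_text() or ""  # guard against pages with no text
        ecdata += ecpagedata + "\n"  # keep a line break between pages
        print(ecpagedata)  # reuse the extracted text instead of extracting twice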
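If the headings and subheadings in the PDF are set in a larger or bold font (an assumption; check `page_obj.chars` for the actual `size` and `fontname` values), pdfplumber's `Page.filter()` can drop them before text extraction. A minimal sketch of such a predicate is below; the `is_body_char` name and the 11 pt `BODY_MAX_SIZE` threshold are hypothetical.

```python
# Hypothetical threshold: body text is assumed to be at most 11 pt.
# Inspect page_obj.chars to find the real sizes and font names in your PDF.
BODY_MAX_SIZE = 11.0


def is_body_char(obj: dict) -> bool:
    """Predicate for pdfplumber's Page.filter(): keep only body-text characters."""
    if obj.get("object_type") != "char":
        return True  # keep lines, rects, images, etc. unchanged
    is_large = obj.get("size", 0) > BODY_MAX_SIZE
    # Embedded font names often look like "ABCDEF+Arial-BoldMT".
    is_bold = "bold" in obj.get("fontname", "").lower()
    return not (is_large or is_bold)
```

Inside the loop, `page.filter(is_body_char).extract_text()` would then replace `page.extract_text()`. This approach keys on typography rather than wording, so bullet points set in the body font will still come through.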
The required output should contain only the complete sentences from the file, without the unwanted bullets, headings, and subheadings. The expected output looks like this:
Good day, and thank you for standing by. Welcome to the XYZ Second Quarter 2099 Earnings Conference Call. At this time, all participants are in a listen-only mode. After the speakers' presentation, there will be a question-and-answer session. (Operator Instructions) Please be advised that today's conference is being recorded.
I would now like to hand the conference over to your speaker today, Alpha, Vice President of Investor Relations. Please go ahead.
Thank you, operator. Good afternoon and welcome to XYZ’s second quarter 2022 earnings call. I'm joined today by Bravo, XYZ’s Founder and CEO; and Charlie, our CFO. Full details of our results and additional management commentary are available in our shareholder letter, which can be found on our Investor Relations website at website.com/investor. Our comments and responses to your questions on this call reflect management's views as of today only and we disclaim any obligation to update this information. On this call, we'll make forward-looking statements which are predictions, projections, or other.
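When fonts do not distinguish headings, the extracted text can instead be post-processed with plain heuristics. The sketch below assumes that bullet lines start with a marker such as `•`, `-` or `1.`, that headings are short lines (at most 8 words, a hypothetical threshold) without closing punctuation, and that a complete sentence ends in `.`, `!` or `?`. All names here are illustrative and not part of pdfplumber.

```python
import re

# Heuristic thresholds and patterns (hypothetical; tune them for your PDF).
HEADING_MAX_WORDS = 8
# A line starting with a bullet glyph or "1." / "2)" style numbering.
BULLET_RE = re.compile(r"^(?:[•▪◦*-]|\d+[.)])\s+")
# Sentence-final punctuation, optionally followed by closing quotes/brackets.
SENTENCE_END_RE = re.compile(r"[.!?][\"'”’)]*$")


def is_bullet(line: str) -> bool:
    return bool(BULLET_RE.match(line))


def is_heading(line: str) -> bool:
    # A short line that does not end like a sentence is treated as a (sub)heading.
    return len(line.split()) <= HEADING_MAX_WORDS and not SENTENCE_END_RE.search(line)


def keep_complete_sentences(text: str) -> list[str]:
    """Drop bullet and heading lines, then return only complete sentences."""
    kept = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or is_bullet(line) or is_heading(line):
            continue
        kept.append(line)
    # Re-join wrapped lines so sentences spanning a line break are whole again.
    joined = " ".join(kept)
    # Split after ., ! or ? followed by whitespace ("website.com" stays intact).
    candidates = re.split(r"(?<=[.!?])\s+", joined)
    # Discard trailing fragments, e.g. a sentence cut off at a page break.
    return [s for s in candidates if SENTENCE_END_RE.search(s)]
```

With the code above, `print(" ".join(keep_complete_sentences(ecdata)))` prints only the retained sentences. Known limitations: the continuation line of a bullet that wraps onto a second line will leak into the output, and abbreviations such as "Inc." are treated as sentence boundaries.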
I am attaching an image of the PDF file here.
The source image file is my own creation and does not directly or indirectly represent any entity, real or fictitious, whatsoever.