option 1:
(Thanks to @KJ's comment.) I ended up using some rough heuristics to decide whether a page contains a graph.
If a page has more than MIN_RECTS rectangles, I assume it contains a graph (the columns of a chart are perceived as rectangles). Likewise, if it has more than MIN_CURVES curves, I assume there's a graph (for me 0 worked, but the right value depends on whether you have non-trivial shapes in the header or footer). It's not perfect, but it works most of the time.
Example code: using both functions and then calling extract_text() gives pretty good results for me.
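import pdfplumber
MIN_RECTS = 10  # placeholder value, tune for your documents
MIN_CURVES = 0  # worked for me; raise it if headers/footers contain shapes
page = pdfplumber.open("file.pdf").pages[0]

def contains_graphs(page):
    return len(page.rects) > MIN_RECTS or len(page.curves) > MIN_CURVES

def only_chars_from_page_filter(page):
    return page.filter(lambda obj: obj["object_type"] == "char")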
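One possible way to combine the two functions over a whole file is sketched below. This is only an illustration: the helper names extract_page_texts and extract_pdf are made up here, MIN_RECTS = 10 is an arbitrary placeholder, and the choice to return a graph flag together with the text of the char-only view is an assumption about how you might want to use the result.

```python
MIN_RECTS = 10  # arbitrary placeholder, tune for your documents
MIN_CURVES = 0  # the value that worked in the answer above


def contains_graphs(page):
    return len(page.rects) > MIN_RECTS or len(page.curves) > MIN_CURVES


def only_chars_from_page_filter(page):
    return page.filter(lambda obj: obj["object_type"] == "char")


def extract_page_texts(pages):
    """Return one (has_graph, text) tuple per page (hypothetical helper)."""
    results = []
    for page in pages:
        has_graph = contains_graphs(page)
        # Keep only character objects, then extract the text from that view.
        text = only_chars_from_page_filter(page).extract_text() or ""
        results.append((has_graph, text))
    return results


def extract_pdf(path):
    """Open a PDF with pdfplumber and run the heuristic on every page."""
    import pdfplumber  # imported lazily so the helpers above work without it

    with pdfplumber.open(path) as pdf:
        return extract_page_texts(pdf.pages)
```

Calling extract_pdf("file.pdf") then gives you, per page, a flag you can use to skip or post-process pages that probably contain a graph.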
option 2:
Following @G5W's comment, it is also possible to convert the PDF to an MS Word file: use pywin32 to open the PDF in Word and save it as a Word document, then extract only the text with python-docx, for example.
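A rough sketch of option 2, assuming Windows with MS Word installed (a version that can open PDFs) plus the pywin32 and python-docx packages. The function names are made up for illustration, and Word may still show a conversion prompt depending on its settings, so treat this as a starting point rather than a tested recipe.

```python
from pathlib import Path

WD_FORMAT_DOCX = 16  # Word's wdFormatDocumentDefault constant


def docx_path_for(pdf_path):
    """Absolute .docx path next to the PDF (Word's COM API expects absolute paths)."""
    return Path(pdf_path).resolve().with_suffix(".docx")


def pdf_to_docx(pdf_path):
    """Let Word open the PDF and save it as .docx (hypothetical helper)."""
    import win32com.client  # pywin32; only available on Windows

    out_path = docx_path_for(pdf_path)
    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        doc = word.Documents.Open(
            str(Path(pdf_path).resolve()),
            ConfirmConversions=False,
            ReadOnly=True,
        )
        doc.SaveAs2(str(out_path), FileFormat=WD_FORMAT_DOCX)
        doc.Close(False)  # close without saving changes to the source
    finally:
        word.Quit()
    return out_path


def docx_text(docx_path):
    """Join the text of all body paragraphs; text inside tables is not included."""
    from docx import Document  # python-docx

    return "\n".join(p.text for p in Document(str(docx_path)).paragraphs)
```

Usage would be something like docx_text(pdf_to_docx("file.pdf")). Whether graphs end up as images (and so drop out of the text) depends on how Word converts the particular PDF.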