How can I detect the different "blocks" of text extracted from a PDF so I can split them into paragraphs? Could I use their position on the page to do this?
PyMuPDF puts only a single newline character between blocks, and also a single newline after each line inside a block, so I can't tell a new block apart from a new line.
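One way to use position is to work from each text box's coordinates instead of the flattened string. Below is a minimal sketch, assuming a single-column layout and boxes given as `(x0, y0, x1, y1, text)` tuples, which matches the first five fields of the tuples PyMuPDF's `page.get_text("blocks")` returns. The function name `group_into_paragraphs` and the `gap_factor` threshold (a fraction of the median box height) are hypothetical choices for illustration, not part of PyMuPDF. A new paragraph starts whenever the vertical gap between consecutive boxes exceeds that threshold.

```python
from typing import List, Sequence, Tuple

# (x0, y0, x1, y1, text): assumed to be the first five fields of a
# PyMuPDF "blocks" tuple, e.g. b[:5] for b in page.get_text("blocks").
Box = Tuple[float, float, float, float, str]


def group_into_paragraphs(boxes: Sequence[Box], gap_factor: float = 0.5) -> List[str]:
    """Group boxes into paragraphs by vertical gap (hypothetical heuristic).

    A new paragraph begins when the distance between the bottom of the
    previous box and the top of the next one is larger than gap_factor
    times the median box height.
    """
    if not boxes:
        return []

    # Reading order for a single-column page: top-to-bottom, then left-to-right.
    ordered = sorted(boxes, key=lambda b: (b[1], b[0]))

    # Median box height serves as a rough "line height" for the threshold.
    heights = sorted(b[3] - b[1] for b in ordered)
    threshold = gap_factor * heights[len(heights) // 2]

    # Collapse internal whitespace, including the newlines PyMuPDF inserts
    # between lines of the same block.
    paragraphs: List[List[str]] = [[" ".join(ordered[0][4].split())]]
    prev_bottom = ordered[0][3]

    for _x0, y0, _x1, y1, text in ordered[1:]:
        if y0 - prev_bottom > threshold:
            paragraphs.append([])  # large vertical gap: start a new paragraph
        paragraphs[-1].append(" ".join(text.split()))
        prev_bottom = max(prev_bottom, y1)

    return [" ".join(p) for p in paragraphs]


if __name__ == "__main__":
    boxes = [
        (72, 100, 500, 112, "First line of paragraph one"),
        (72, 114, 480, 126, "second line of paragraph one."),
        (72, 150, 500, 162, "Paragraph two starts here."),
    ]
    print("\n\n".join(group_into_paragraphs(boxes)))
```

With PyMuPDF, the input could come from `page.get_text("blocks")`, keeping only tuples whose seventh field (the block type) is `0`, which marks text blocks (`1` marks images). If the blocks PyMuPDF reports already line up with your paragraphs, simply joining each block's text with a blank line may be enough. The gap heuristic helps when one paragraph is split across several blocks. A fixed `gap_factor` is a guess that will likely need tuning per document, and multi-column pages would need to be split into columns (for example by `x0`) before sorting.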