
I have the coordinates of a region on a PDF page, but I can't find a method in PyMuPDF that fetches a block of text based on coordinates. Is there a method in PyMuPDF that can do this? I'm open to other libraries, though I have already tried PDFQuery, which did not work properly.

Explanation: I want to read a block of text within given coordinates using PyMuPDF. For example, if I have the coordinates x0, y0, x1, y1, I should be able to get the text inside that rectangle in the same layout as in the PDF.

For example, if I run

print(page.get_textbox(fitz.Rect([40.91999816894531, 274.94500732421875, 349.88214111328125, 364.9531555175781])))

it gives me a string in which each word of the block is on its own line. Is there a way to get the block in the same format as in the PDF, with the original lines preserved?

Ben the Coder
m9m9m
  • Are you referring to pdfminer or to pymupdf? **PyMuPDF definitely is able** to deliver all coordinates of all text - down to each single character if needed. – Jorj McKie Mar 15 '23 at 11:56
  • With PyMuPDF, you can extract text of the whole page, or from any sub-rectangle you want. All this is also documented in detail, so please be more specific about what your problem is. – Jorj McKie Mar 15 '23 at 11:58
  • I have added more details in the question @JorjMcKie – m9m9m Mar 15 '23 at 13:32
  • Is pdftotext a Python library? @KJ – m9m9m Mar 15 '23 at 18:29
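
Following up on the comment that PyMuPDF can extract text from any sub-rectangle, here is a minimal sketch of one way to get line-preserving text from a rectangle. It assumes the word tuples returned by `page.get_text("words", clip=...)`, which have the form `(x0, y0, x1, y1, word, block_no, line_no, word_no)`. The helper names `words_to_text` and `text_in_rect` and the `y_tolerance` value are made up for illustration, not part of PyMuPDF, and the tolerance would likely need tuning for a given document.

```python
def words_to_text(words, y_tolerance=3.0):
    """Rebuild text lines from PyMuPDF-style word tuples.

    Each tuple is (x0, y0, x1, y1, word, block_no, line_no, word_no).
    Words whose bottom edge (y1) lies within y_tolerance of the first
    word of the current line are treated as belonging to that line.
    """
    lines = []  # each entry: [reference_y1, [word tuples on that line]]
    for w in sorted(words, key=lambda w: (w[3], w[0])):
        if lines and abs(w[3] - lines[-1][0]) <= y_tolerance:
            lines[-1][1].append(w)
        else:
            lines.append([w[3], [w]])
    # Within each line, order words left to right and join with spaces.
    return "\n".join(
        " ".join(w[4] for w in sorted(line_words, key=lambda w: w[0]))
        for _, line_words in lines
    )


def text_in_rect(pdf_path, page_number, rect):
    """Hypothetical helper: text inside rect = (x0, y0, x1, y1) on one page."""
    import fitz  # PyMuPDF, imported here so words_to_text works without it

    with fitz.open(pdf_path) as doc:
        page = doc[page_number]
        # clip restricts extraction to words inside the rectangle
        words = page.get_text("words", clip=fitz.Rect(rect))
    return words_to_text(words)
```

With the rectangle from the question, this could be called as `text_in_rect("input.pdf", 0, (40.92, 274.945, 349.882, 364.953))`, where the file name and page number are placeholders. Grouping by vertical position is a heuristic: it keeps words of the same visual line together but does not reproduce exact horizontal spacing or multi-column layouts.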
