Questions tagged [pdfminersix]

5 questions
1 vote, 1 answer

Reading PDFs in fully asynchronous mode in Python

I'm really struggling to read my PDF files asynchronously. I tried using aiofiles, which is open source on GitHub. I want to extract the text from the PDFs, and I want to do it with pdfminer because pypdf does not render math (Greek letters) or double…
Quentin
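aiofiles only makes the file I/O asynchronous. pdfminer.six itself parses synchronously, so one way to keep an event loop responsive is to hand each extraction to an executor with `loop.run_in_executor` and await them together. This is a minimal sketch, assuming `pdfminer.high_level.extract_text(path)` is the extraction you want; `extract_texts` and `_pdfminer_extract_text` are hypothetical helper names, not part of pdfminer.six.

```python
import asyncio
from concurrent.futures import Executor
from typing import Callable, Iterable, List, Optional


def _pdfminer_extract_text() -> Callable[[str], str]:
    # Imported lazily so the helper can be exercised without pdfminer.six installed.
    from pdfminer.high_level import extract_text
    return extract_text


async def extract_texts(
    paths: Iterable[str],
    extractor: Optional[Callable[[str], str]] = None,
    executor: Optional[Executor] = None,
) -> List[str]:
    """Extract text from several PDFs without blocking the event loop.

    The blocking extractor runs in `executor` (the loop's default thread
    pool when it is None); all calls are awaited together.
    """
    extract = extractor if extractor is not None else _pdfminer_extract_text()
    loop = asyncio.get_running_loop()
    futures = [loop.run_in_executor(executor, extract, path) for path in paths]
    # gather returns results in the same order as `paths`.
    return list(await asyncio.gather(*futures))
```

Typical use would be `texts = asyncio.run(extract_texts(["a.pdf", "b.pdf"]))`. Because PDF parsing is CPU-bound, threads may not give much speed-up under the GIL; passing a `concurrent.futures.ProcessPoolExecutor` as `executor` is an alternative worth trying (on spawn-based platforms that code needs an `if __name__ == "__main__":` guard).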
0 votes, 0 answers

pdfminer LAParams not causing multiple LTChar objects to group into an LTTextLine

I'm using pdfminer.six. According to this (page 8), I should be able to modify char_margin and line_overlap in an LAParams object so that a run of adjacent LTChar objects is grouped into LTTextLine objects. Unfortunately, it…
Peyton Hanel
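A quick way to see whether LAParams changes have any effect is to count, per page, how many LTChar objects ended up inside LTTextLine objects versus how many exist in total. One thing worth checking: LAParams has an `all_texts` flag, and with its default (False) text inside LTFigure objects is not layout-analyzed, so those characters stay as loose LTChar. Also, if you drive `PDFPageAggregator` yourself, passing `laparams=None` skips layout analysis entirely (`extract_pages` applies a default LAParams). This is a sketch under those assumptions; `iter_layout` and `grouping_report` are hypothetical helpers.

```python
from typing import Any, Iterator, Tuple, Type


def iter_layout(obj: Any, kind: Type) -> Iterator[Any]:
    """Depth-first walk of a pdfminer layout tree, yielding objects of type `kind`."""
    if isinstance(obj, kind):
        yield obj
    # Containers (LTPage, LTTextBox, LTTextLine, LTFigure) are iterable;
    # leaves such as LTChar and LTAnno are not, so iter() raises TypeError.
    try:
        children = iter(obj)
    except TypeError:
        return
    for child in children:
        yield from iter_layout(child, kind)


def grouping_report(pdf_file: str, **laparams_kwargs: Any) -> Iterator[Tuple[int, int, int]]:
    """Yield (text_lines, chars_in_lines, all_chars) per page for the given LAParams."""
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LAParams, LTChar, LTTextLine

    laparams = LAParams(**laparams_kwargs)
    for page in extract_pages(pdf_file, laparams=laparams):
        lines = list(iter_layout(page, LTTextLine))
        grouped = sum(1 for line in lines for _ in iter_layout(line, LTChar))
        total = sum(1 for _ in iter_layout(page, LTChar))
        yield len(lines), grouped, total
```

For example, comparing `grouping_report("doc.pdf", char_margin=4.0, line_overlap=0.3)` with the same call plus `all_texts=True` shows whether the ungrouped characters were sitting inside figures (when `chars_in_lines < all_chars`).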
0 votes, 1 answer

Filter PDF text by font with pdfminer

So I am using pdfminer.six to extract text set in a specific font, but I currently have the following problem:

from pdfminer.high_level import extract_pages
from pdfminer.layout import LTTextContainer, LTChar

def extract_text_by_font(pdf_file):
    …
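One common approach is to walk each LTTextLine and group consecutive characters by their `fontname` attribute, which LTChar objects carry. The LTAnno objects that layout analysis inserts (spaces, newlines) have no font, so the sketch below attaches them to the current run. Embedded font names often carry a subset prefix such as `ABCDEF+`, which is why matching is done by substring. This is not the asker's truncated `extract_text_by_font`; `font_runs` and `text_in_font` are hypothetical helpers.

```python
from typing import Iterable, List, Tuple


def font_runs(items: Iterable) -> List[Tuple[str, str]]:
    """Group consecutive characters into (fontname, text) runs.

    `items` are the children of a pdfminer LTTextLine: LTChar objects have a
    `fontname`; LTAnno objects do not and are appended to the current run.
    """
    runs: List[Tuple[str, str]] = []
    for item in items:
        font = getattr(item, "fontname", None)
        text = item.get_text()
        if font is None:
            if runs:
                runs[-1] = (runs[-1][0], runs[-1][1] + text)
            continue
        if runs and runs[-1][0] == font:
            runs[-1] = (font, runs[-1][1] + text)
        else:
            runs.append((font, text))
    return runs


def text_in_font(pdf_file: str, font_substring: str) -> List[str]:
    """Collect text runs whose font name contains `font_substring`."""
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTTextContainer, LTTextLine

    matches: List[str] = []
    for page in extract_pages(pdf_file):
        for element in page:
            if not isinstance(element, LTTextContainer):
                continue
            for line in element:
                if not isinstance(line, LTTextLine):
                    continue
                for font, text in font_runs(line):
                    if font_substring in font:
                        matches.append(text)
    return matches
```

For instance, `text_in_font("doc.pdf", "Bold")` would return the bold runs; the returned strings may include trailing spaces or newlines contributed by LTAnno.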
0 votes, 1 answer

pdfminer mixes the order of lines

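I'm extracting PDFs using pdfminer.six. I have some text in a PDF, and after parsing it my result is as below: Nr 48. Promująco na rozwój chorób alergicznych i wystąpienie objawów alergii działa zwiększenie aktywności/ilości:… (roughly: "No. 48. The development of allergic diseases and the onset of allergy symptoms are promoted by an increase in the activity/amount of: …")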
mik.ro
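Line order in pdfminer.six depends on how layout analysis orders text boxes, which LAParams' `boxes_flow` controls (None disables the advanced ordering). If tuning that does not help, one option is to re-sort the lines yourself by bounding box. pdfminer uses PDF coordinates with the origin at the bottom-left, so a larger `y1` means higher on the page. This sketch assumes a single-column page (on multi-column pages it would interleave columns) and a 2-point row tolerance; `reading_order`, `lines_in_reading_order`, and `y_tolerance` are hypothetical names.

```python
from typing import Any, Iterator, List, Sequence


def reading_order(items: Sequence[Any], y_tolerance: float = 2.0) -> List[Any]:
    """Sort layout objects top-to-bottom, then left-to-right within a row.

    Objects whose top edge (y1) lies within `y_tolerance` of the first object
    in the current row are treated as part of that row.
    """
    rows: List[List[Any]] = []
    for obj in sorted(items, key=lambda o: -o.y1):
        if rows and abs(rows[-1][0].y1 - obj.y1) <= y_tolerance:
            rows[-1].append(obj)
        else:
            rows.append([obj])
    return [obj for row in rows for obj in sorted(row, key=lambda o: o.x0)]


def lines_in_reading_order(pdf_file: str) -> Iterator[List[str]]:
    """Yield each page's text lines re-sorted by position (assumes one column)."""
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTTextBox

    for page in extract_pages(pdf_file):
        # Iterating an LTTextBox yields its LTTextLine children.
        lines = [line for box in page if isinstance(box, LTTextBox) for line in box]
        yield [line.get_text() for line in reading_order(lines)]
```

The row banding compares each object with the first one in its row rather than using a tolerance inside a sort key, because "within tolerance" is not transitive and would make a plain sort inconsistent.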