Questions tagged [pdfminersix]
5 questions
1 vote, 1 answer
Reading pdf in fully asynchronous mode in python
I'm really struggling to read my PDF files asynchronously. I tried using aiofiles, which is open source on GitHub. I want to extract the text from PDFs, and I want to do it with pdfminer because pypdf is not rendering math (Greek letters) or double…

Quentin (reputation 45; badges 1, 7)
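For the asynchronous-reading question above, one possible approach (a minimal sketch, not an accepted answer from the thread) is to leave the parser synchronous and push each call onto a worker thread with `asyncio.to_thread`, so the event loop stays responsive. The names `extract_many` and `_pdfminer_extract` are hypothetical; `extract_text` is the high-level helper from pdfminer.six, imported lazily here so the module loads even where the library is not installed.

```python
import asyncio
from typing import Callable, Iterable, List


def _pdfminer_extract(path: str) -> str:
    # Lazy import: requires pdfminer.six to be installed when actually called.
    from pdfminer.high_level import extract_text
    return extract_text(path)


async def extract_many(
    paths: Iterable[str],
    extractor: Callable[[str], str] = _pdfminer_extract,
) -> List[str]:
    """Run a blocking extractor for each path in worker threads.

    Results come back in the same order as the input paths.
    """
    tasks = [asyncio.to_thread(extractor, path) for path in paths]
    return list(await asyncio.gather(*tasks))


# Example call (hypothetical file names):
# texts = asyncio.run(extract_many(["a.pdf", "b.pdf"]))
```

Parsing a PDF is mostly CPU-bound, so threads mainly keep the loop unblocked rather than speed up the parsing itself; a process pool may be worth trying if throughput matters.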
0 votes, 0 answers
pdfminer laparams not causing multiple LTChar to group into LTTextLine
I'm using pdfminer.six.
According to this on page 8, I should be able to modify char_margin and line_overlap in a LAParams object so that adjacent LTChar objects are grouped into LTTextLine objects. Unfortunately, it…

Peyton Hanel (reputation 374; badges 1, 3, 13)
0 votes, 1 answer
Filter pdf text by font with pdfminer
I am using pdfminer.six to extract text in a specific font, but I currently have the following problem:
from pdfminer.high_level import extract_pages
from pdfminer.layout import LTTextContainer, LTChar
def extract_text_by_font(pdf_file):
…

Thanh Long Phan (reputation 1; badges 1)
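As an illustration of the font-filtering idea in the question above (a sketch under stated assumptions, not the asker's elided code), the layout tree can be walked recursively, keeping only characters whose `fontname` contains a given substring. In pdfminer.six, `LTChar` objects carry a `fontname` attribute and a `get_text()` method, while pages, text boxes, and lines are iterable containers. The helper names `iter_chars_with_font` and `text_in_font` are hypothetical. Substring matching is used because embedded fonts often carry a subset prefix such as `ABCDEF+`.

```python
from typing import Iterator


def iter_chars_with_font(element, font_substring: str) -> Iterator[str]:
    """Yield the text of each character-like object whose fontname matches."""
    fontname = getattr(element, "fontname", None)
    if fontname is not None:
        # Character-like leaf (e.g. LTChar): keep it only if the font matches.
        if font_substring in fontname:
            yield element.get_text()
        return
    try:
        children = iter(element)  # pages, text boxes, and lines are iterable
    except TypeError:
        return  # non-iterable leaf without a font (e.g. LTAnno): skip it
    for child in children:
        yield from iter_chars_with_font(child, font_substring)


def text_in_font(pdf_path: str, font_substring: str) -> str:
    # Lazy import: requires pdfminer.six to be installed when actually called.
    from pdfminer.high_level import extract_pages
    return "".join(
        ch
        for page in extract_pages(pdf_path)
        for ch in iter_chars_with_font(page, font_substring)
    )
```

Whitespace inserted by the layout analysis (`LTAnno`) has no font, so this sketch drops it; a fuller version might keep such separators when they follow a matching character.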
0 votes, 1 answer
pdfminer mixes order of lines
I'm extracting text from a PDF using pdfminer.six.
I have the following text:
(image of the source text)
After parsing it, my result is as below:
Nr 48. Promująco na rozwój chorób alergicznych i wystąpienie objawów alergii
działa zwiększenie aktywności/ilości:…

mik.ro (reputation 4,381; badges 2, 18, 23)