Highest Voted 'pdfplumber' Questions

0

votes

0 answers

Pdfplumber cannot recognize table

How can the result of image.reset().debug_tablefinder() be converted into tables that pdfplumber can recognize?

python dataframe pdfplumber

asked Sep 02 '21 at 06:08

Anita

1

0

votes

1 answer

How to convert every page of pdf to a pdf object using python

I want to turn each page of a PDF file into a new PDF object. I am following the code snippet at https://stackoverflow.com/a/490203/13291630, but it creates a new file, whereas I just want to create a PDF object without…

pdf-generation pypdf pdfminer pdfplumber

asked Aug 30 '21 at 09:15

Mr Anonymous

75
10

0

votes

1 answer

get the table by passing table header in pdf using python

I have a PDF with multiple tables in it. I need to pass a table header and get the respective table. For example: I will pass the table name as "daily historical stock prices & volumes", and it must return that table.

python pdf nlp pypdf pdfplumber

asked Aug 24 '21 at 11:52

End user

77
3

0

votes

2 answers

How to extract table details into rows and columns using pdfplumber

I am using pdfplumber to extract tables from a PDF. But the table in question does not have visible vertical lines separating its content, so the extracted data ends up as 3 rows and one huge column. I would like the above table to come out as 13 rows. import…

python pandas dataframe pdfplumber

asked Aug 21 '21 at 11:55

walter_anderson

19
1
8

0

votes

0 answers

Why does pdfplumber yield no data?

I usually use pdfplumber to scrape data and text from PDFs, and 99.99% of the time everything is fine. Today, though, I encountered a case where I can open the PDF file (using pdfplumber.open) but cannot extract any text / word / table. I know…

python screen-scraping pdfplumber

asked Jul 19 '21 at 15:20

Odhian

351
5
14

0

votes

1 answer

How to print the next line in Python with text extracted using pdfplumber

How can I print the next line from the text that I extracted from a PDF using pdfplumber's extract_text function? I have tried line.next() but it does not work. The actual job name is on the line after "Job Name", as in the example below. Job…

python pdfplumber

asked Jul 17 '21 at 09:55

Autom8

385
2
3
10
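A minimal sketch for the "next line" question above, assuming the extracted text is one string with newline-separated lines. Strings have no next() method, which is why line.next() fails; indexing into the list of lines is one alternative. The function name value_after and the sample text are hypothetical, not taken from the question.

```python
def value_after(text, marker="Job Name"):
    """Return the line that follows the first line containing `marker`.

    Returns None when the marker is missing or sits on the last line.
    """
    lines = text.splitlines()
    # Stop one short of the end so lines[i + 1] always exists.
    for i, line in enumerate(lines[:-1]):
        if marker in line:
            return lines[i + 1].strip()
    return None


# Hypothetical sample standing in for text returned by extract_text().
sample = "Job Name\nRoof repair\nJob No\n42"
print(value_after(sample))
```

The same function works for other labels by passing a different marker, for example value_after(sample, "Job No").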

0

votes

0 answers

Encoding issues when extracting text from a PDF file using pdfplumber

I would like to extract the content of the following PDF file, but it returns a meaningless result. I assume that it might be related to the encoding of the file, but the same extraction code works for many other files on the same infrastructure.…

python python-3.x pdf data-extraction pdfplumber

asked Jun 24 '21 at 12:41

fillo

365
1
12

0

votes

1 answer

List Index Out of Range when using PDF Plumber

Hello, I am extracting text from a PDF using pdfplumber and writing it to a text file, but I am getting an index out of range error. import glob import pdfplumber for filename in glob.glob('*.pdf'): pdf = pdfplumber.open(filename) OutputFile =…

python pdfplumber

asked Jun 23 '21 at 08:07

Haris Trading

41
7

0

votes

1 answer

Extract text from pdf file using pdfplumber

I want to extract text from a pdf file, tried: directory = r'C:\Users\foo\folder' for x in os.listdir(directory): print(x) x = x.replace('.pdf','') filename = os.fsdecode(x) print(x) if filename.endswith('.pdf'): with…

python pdf pdfplumber

asked Jun 22 '21 at 01:58

nilsinelabore

4,143
17
65
122
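A note on the directory loop in the excerpt above: as shown, '.pdf' is stripped from the name before the endswith('.pdf') check, so that check would normally fail. A minimal sketch of one way to select the PDF files first and derive the bare name afterwards, using only the standard library. The function name pdf_paths is hypothetical, and the rest of the asker's code is not visible, so this is an assumption about the intent.

```python
from pathlib import Path


def pdf_paths(directory):
    """Return the files in `directory` whose suffix is .pdf (any letter case)."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


# Usage (the directory is the placeholder path from the question):
# for path in pdf_paths(r"C:\Users\foo\folder"):
#     print(path.stem)   # file name without the .pdf extension
#     # str(path) still ends in .pdf and can be handed to pdfplumber.open
```

Keeping the full path for opening the file and using path.stem only for display or output names avoids losing the extension before the check.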

0

votes

2 answers

How to go about isolating dollar amounts using Regex?

I used the PDFPlumber library to extract all the lines in my PDF; a sample extracted line looks like this: Total Return Transportation $16.01 The goal is to put all of these into a data frame. How do I use regex to group this line so that I may…

python regex parsing pdf pdfplumber

asked Jun 05 '21 at 17:45

pvell

1
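A minimal sketch for the dollar-amount question above, assuming every line of interest has the shape "label text, then a $ amount at the end of the line", as in the sample "Total Return Transportation $16.01". The pattern, the group names label and amount, and the function name split_line are hypothetical choices for illustration.

```python
import re

# Lazy label, whitespace, a literal "$", digits with optional thousands
# commas, and optional two-digit cents, anchored at the end of the line.
LINE_RE = re.compile(
    r"^(?P<label>.+?)\s+\$(?P<amount>\d[\d,]*(?:\.\d{2})?)\s*$"
)


def split_line(line):
    """Return (label, amount as float), or None if the line does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    amount = float(match.group("amount").replace(",", ""))
    return match.group("label"), amount


print(split_line("Total Return Transportation $16.01"))
```

The resulting (label, amount) tuples can be collected into a list and passed to a data frame constructor. Lines with a different layout, such as several amounts on one line, would need a different pattern.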

0

votes

1 answer

How to optimize (also RAM wise) code that is saving words from PDF to Python object and later into database?

I am looking for the most efficient way of saving text from PDF files into my database. Currently I am using pdfplumber with standard code looking like this: my_string = '' with pdfplumber.open(text_file_path) as pdf: for page in pdf.pages: …

python pdfminer pdfplumber

asked May 06 '21 at 16:28

Peksio

525
6
25

0

votes

1 answer

Converting pytesseract.Output.DATAFRAME into bytes or ocr'ed pdf

Is it possible to write to a pdf file retroactively using pytesseract.image_to_data() output? For my OCR pipeline, I needed granular access to my pdf's ocr'ed data. I requested that using this method: ocr_dataframe = pytesseract.image_to_data( …

python pdf python-tesseract pdfplumber

asked May 04 '21 at 13:58

abrezey

135
9

0

votes

1 answer

How to ignore table and its content while extracting text from pdf

So far I have been successful in extracting the text content from a PDF file. I am stuck at a point where I have to extract the text content outside of the table (ignoring the table and its content) and need help. The PDF can be downloaded from here import…

python pdf pdfplumber

asked May 04 '21 at 07:29

go sgenq

313
3
13

0

votes

1 answer

PDFPlumber returning symbols and inaccurate text

I'm trying to extract text from a pdf file using PDFplumber import pdfplumber pdf = pdfplumber.open(r"https://www.lupin.com/pdf/financials/subsidiaries/multicare-pharmaceuticals-philippines-inc-philippines-2018.pdf") for ps in pdf.pages: …

python-3.x pdf pdfplumber

asked May 02 '21 at 05:09

Nikhil T

1
1

0

votes

0 answers

I am having issues extracting hindi text from pdf in python

I am using pdfplumber in Python. It does not extract Hindi text well and shows wrong results. input: माँ, मैं रात का खाना ले आऊँगा। output: म ,ाँ म ैं र त क ख न ले आऊाँग । I want the exact output. Any solution?

python pdfplumber

asked Apr 01 '21 at 15:31

Hritwik Jha

1
