Plumb a PDF for detailed information about each text character, rectangle, and line. Plus: Table extraction and visual debugging.
Questions tagged [pdfplumber]
95 questions
1
vote
1 answer
How to find table grid lines in PDF files?
To more accurately extract table-like data embedded within table cells, I would like to be able to identify table cell boundaries in PDFs like this:
I have tried extracting such tables using Camelot, pdfplumber, and PyMuPDF, with varying degrees of…

Mark Turner
- 81
- 2
- 5
0
votes
0 answers
Python Data Frame
I have an Excel file that contains an invoice number, and I have a PDF file. If the invoice number in the PDF matches the number in the Excel file, I want to write the number in the B column. In the Excel file the invoice number is written in the…
0
votes
1 answer
Remove the garbage words from the pdf
I am extracting text from a PDF using Python and libraries such as fitz and pdfreader. But my PDF contains some schematics and words that I do not need.
Here is an example.
When extracting the text, the words of the schematics are also…
0
votes
0 answers
Correct Positioning of Stamp PDF Over Words in Original PDF using pypdf and ReportLab
I'm currently working on a Python project that involves processing PDF files. My objective is to identify specific words from a predefined list within a text-based PDF and overlay a stamp PDF onto the original PDF to highlight these words. I've…

Nicklas Healy
- 1
- 1
0
votes
0 answers
Use pdfplumber to do PDF to Excel on web page
I have a question. I want to use the Python module pdfplumber to read a PDF, transfer its contents to Excel, and save the file, but the program doesn't seem to work very well. I hope someone can give me some pointers.
from flask import Flask, request,…

gary
- 1
- 1
0
votes
0 answers
How to turn a text into a column - Pandas
I am new to pandas.
I have a PDF data table which I read and extract the data from, converting it into an Excel file.
This is a short, fictitious example of the table I am using; however, the structure of the PDF is exactly the same:
table.pdf
After…

Ana Fortes
- 9
- 3
0
votes
1 answer
How to make pdfplumber treat the right vertical edge of a page as a table vertical line?
How to make pdfplumber treat the right vertical edge of a page as a table vertical line?
I have a PDF with a cropped right edge, and the crop removed the rightmost vertical line of the table.
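One possible approach, offered as a minimal sketch and not as the accepted answer: pdfplumber's table settings accept an `explicit_vertical_lines` list, so the page's right edge can be supplied as an extra vertical line. The helper names `settings_with_right_edge` and `extract_table_with_right_edge` are hypothetical, and the sketch assumes the missing line coincides with the right side of `page.bbox`; in practice the x-coordinate may need to be nudged slightly inward.

```python
def settings_with_right_edge(page_bbox, base_settings=None):
    """Return table settings with an explicit vertical line at the right edge.

    page_bbox is (x0, top, x1, bottom), the layout of pdfplumber's Page.bbox.
    This helper is hypothetical; it only builds a settings dictionary.
    """
    settings = dict(base_settings or {})
    explicit = list(settings.get("explicit_vertical_lines", []))
    right_edge = page_bbox[2]  # x1: the right side of the page
    if right_edge not in explicit:
        explicit.append(right_edge)
    settings["explicit_vertical_lines"] = explicit
    return settings


def extract_table_with_right_edge(pdf_path, page_number=0):
    """Extract one table from a page, treating the right page edge as a line."""
    import pdfplumber  # imported lazily so the helper above works without it

    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[page_number]
        settings = settings_with_right_edge(page.bbox)
        return page.extract_table(table_settings=settings)
```

Calling `extract_table_with_right_edge("cropped.pdf")` (the file name is a placeholder) would return the first detected table as a list of rows, or `None` if no table is found.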

banderlog013
- 2,207
- 24
- 33
0
votes
0 answers
How to extract data from PDF files to a single CSV file using Python (pandas..)
I need to build a model that extracts data from PDF files (resumes) using OCR, so I collected a bunch of PDF files and need to convert them to a form suitable for OCR, and I am lost. I tried converting them to a CSV file with regex, but to no avail; the CSV file…

Alex
- 53
- 7
0
votes
0 answers
Table extraction using pdfplumber
Traceback (most recent call last):
File "/Users/noelsjacob/Desktop/Projects/py_pdf_stm-master/TableExtractor.py", line 734, in
tables = pdf_interpreter.parse_page(1)
File…

Noel S Jacob
- 76
- 6
0
votes
1 answer
How to avoid duplication in Python PDF parsing code for mismatching table structures?
I have over 100 PDFs that are match reports from which I want to scrape data in order to store it in dataframes so I can work with it afterwards.
Problem is: Those PDFs don't always have the same structure and the reading from pdfplumber gives me…
0
votes
1 answer
Python: Parse through extracted lines
So I am trying to do some scraping using pdfplumber and want to extract the text from this PDF and convert it into an Excel file (with each value - like the Expense apart from the numbers - in its own cell).
I started a bit of the code and was…

megarocker241
- 51
- 6
0
votes
0 answers
How to remove borders from a PDF using Python and pdfplumber for Azure Form Recognizer?
I am currently working on a project that involves extracting information from PDF files using Azure Form Recognizer. While I have successfully extracted the text, I am facing an issue with extracting tables. The problem arises because the entire…

Hammad Asif
- 33
- 7
0
votes
0 answers
How to extract the background color of a cell in a table in a pdf using python
I am trying to get the background colors of the cells in a table. I am using pdfplumber, and it returns only an empty string.
pdfplumber output
Table image
Is there any way to extract background colors?
I have tried using pdfplumber and tabula and…
0
votes
0 answers
PHP invoking a batch script that calls a conda Python script only returns a null string
I wrote a Python script (t.py) to extract text from a PDF, and it works as I expected.
Then I wrote a PHP (t.php) script to invoke the python script (t.py) by a batch script (t.cmd).
And I found that having the Python script (t.py) import pdfplumber will disable…

somggx
- 43
- 6
0
votes
0 answers
Problem loading text from searchable pdfs ("PSKeyword" error)
I have a problem with extracting text using pdfplumber. The PDF is searchable, and other examples work fine. One invoice, however, cannot be loaded correctly. I get this error:
cannot convert 'PSKeyword' object to…

Ryotaro
- 25
- 4