Highest Voted 'pdfplumber' Questions

How to find table grid lines in PDF files?
1 vote, 1 answer

To more accurately extract table-like data embedded within table cells, I would like to be able to identify table cell boundaries in PDFs like this: [image] I have tried extracting such tables using Camelot, pdfplumber, and PyMuPDF, with varying degrees of…

asked Mar 03 '21 at 19:26 by Mark Turner
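
One possible starting point for locating grid lines (an editor's sketch, not the posted answer): pdfplumber exposes vector drawing primitives through `page.lines`, `page.rects`, and `page.edges`, and `page.find_tables()` returns `Table` objects whose `.cells` are `(x0, top, x1, bottom)` boxes. The sketch clusters the positions of vertical and horizontal edges into distinct grid coordinates. The helpers `grid_positions` and `find_grid` and the 1-point tolerance are hypothetical choices, not part of pdfplumber.

```python
from __future__ import annotations

from typing import Iterable, Mapping


def _cluster(values: Iterable[float], tol: float) -> list[float]:
    """Group sorted coordinates that lie within `tol` of the previous one; return each group's mean."""
    clusters: list[list[float]] = []
    for v in sorted(values):
        if clusters and v - clusters[-1][-1] <= tol:
            clusters[-1].append(v)
        else:
            clusters.append([v])
    return [sum(c) / len(c) for c in clusters]


def grid_positions(edges: Iterable[Mapping], tol: float = 1.0) -> tuple[list[float], list[float]]:
    """Return (x positions of vertical edges, y positions of horizontal edges).

    `edges` are dicts shaped like pdfplumber's `page.edges`: each has an
    "orientation" of "v" or "h" plus "x0" and "top" coordinates.
    """
    edges = list(edges)
    xs = [e["x0"] for e in edges if e["orientation"] == "v"]
    ys = [e["top"] for e in edges if e["orientation"] == "h"]
    return _cluster(xs, tol), _cluster(ys, tol)


def find_grid(path: str, page_number: int = 0):
    """Open a PDF and return its grid positions plus the cell boxes pdfplumber infers."""
    import pdfplumber  # imported lazily so the helpers above work without it

    with pdfplumber.open(path) as pdf:
        page = pdf.pages[page_number]
        xs, ys = grid_positions(page.edges)
        # Each cell is an (x0, top, x1, bottom) tuple produced by the table finder.
        cells = [cell for table in page.find_tables() for cell in table.cells]
    return xs, ys, cells
```

Clustering with a small tolerance merges the doubled strokes that many PDFs draw for a single visual border. The tolerance may need tuning per document.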

Python Data Frame
0 votes, 0 answers

I have an Excel file that contains an invoice number, and I have a PDF file. If the invoice number in the PDF matches the number in the Excel file, I want to write the number in column B. In the Excel file, the invoice number is written in the…

Tags: python pandas dataframe pdfplumber

asked Sep 01 '23 at 12:06 by Giorgi Gurabanidze

Remove the garbage words from the PDF
0 votes, 1 answer

I am extracting text from a PDF using Python and libraries such as fitz, pdfreader, and so on. But my PDF contains some schematics and words I do not need. Here is an example. When extracting the text, the words of the schematics are also…

Tags: python pdf pdf-reader pymupdf pdfplumber

asked Aug 30 '23 at 10:22 by Muhammad Samadzade

Correct Positioning of Stamp PDF Over Words in Original PDF using pypdf and ReportLab
0 votes, 0 answers

I'm currently working on a Python project that involves processing PDF files. My objective is to identify specific words from a predefined list within a text-based PDF and overlay a stamp PDF onto the original PDF to highlight these words. I've…

Tags: python pdf pypdf pdfplumber

asked Aug 14 '23 at 20:49 by Nicklas Healy
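
One common cause of misplaced overlays in this kind of setup (not confirmed for this particular question) is a coordinate mismatch: pdfplumber measures `top` and `bottom` downward from the top of the page, while a ReportLab canvas puts the origin at the bottom-left with y growing upward. A minimal sketch of the conversion, assuming each page's MediaBox starts at (0, 0); `to_reportlab_rect` and `build_overlay` are hypothetical helpers.

```python
from __future__ import annotations


def to_reportlab_rect(word: dict, page_height: float) -> tuple[float, float, float, float]:
    """Convert a pdfplumber word/char bbox to ReportLab (x, y, width, height).

    pdfplumber: origin at top-left, `top`/`bottom` grow downward.
    ReportLab:  origin at bottom-left, y grows upward.
    Assumes the page's MediaBox origin is (0, 0).
    """
    x = word["x0"]
    y = page_height - word["bottom"]  # the word's lower edge, measured from the page bottom
    width = word["x1"] - word["x0"]
    height = word["bottom"] - word["top"]
    return x, y, width, height


def build_overlay(pdf_path: str, out_path: str, targets: set[str]) -> None:
    """Write an overlay PDF that draws a rectangle around every word in `targets`."""
    import pdfplumber  # lazy imports: only needed when actually building the overlay
    from reportlab.pdfgen import canvas

    c = canvas.Canvas(out_path)
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            c.setPageSize((page.width, page.height))  # match the original page size
            for word in page.extract_words():
                if word["text"] in targets:
                    c.rect(*to_reportlab_rect(word, page.height), stroke=1, fill=0)
            c.showPage()  # one overlay page per original page
    c.save()
```

Because each overlay page has the same size as its original page, it can then be merged onto that page with pypdf's `PageObject.merge_page`.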

Use pdfplumber to convert PDF to Excel on a web page
0 votes, 0 answers

I have a question. I want to use the Python module pdfplumber to read a PDF, convert it to Excel, and save it. But this program doesn't seem to work very well. I hope someone can give me some pointers. from flask import Flask, request,…

Tags: html python-3.x pdf pdfplumber

asked Aug 08 '23 at 07:23 by gary

How to turn text into a column - Pandas
0 votes, 0 answers

I am new to pandas. I have a PDF data table that I read and extract the data from, converting it into an Excel file. This is a short, fictitious example of the table I am using; however, the structure of the PDF is exactly the same: table.pdf After…

Tags: python pandas excel pdf pdfplumber

asked Jul 14 '23 at 20:12 by Ana Fortes

How to make pdfplumber treat the right vertical edge of a page as a table vertical line?
0 votes, 1 answer

I have a PDF with a cropped right edge, and that cut took away the rightmost vertical line of the table.

Tags: python pdfplumber

asked Jul 11 '23 at 22:48 by banderlog013
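
pdfplumber's table settings include `explicit_vertical_lines`, a list of x coordinates (treated as full-height lines) or line/rect objects that are added to the detected edges. One way to restore a missing rightmost border, offered here as an assumption rather than the posted answer, is to add the page's right boundary, `page.bbox[2]`. This only helps if the table's horizontal rules still reach the cropped edge. The helpers `with_right_edge` and `extract_first_table` are hypothetical.

```python
from __future__ import annotations


def with_right_edge(page_bbox: tuple[float, float, float, float], base: dict | None = None) -> dict:
    """Return a copy of `base` table settings with the page's right boundary added
    as an explicit vertical line.

    `page_bbox` is (x0, top, x1, bottom), as in pdfplumber's `page.bbox`.
    """
    settings = dict(base or {})
    lines = list(settings.get("explicit_vertical_lines", []))  # copy, so `base` is not mutated
    lines.append(page_bbox[2])  # x coordinate of the right edge
    settings["explicit_vertical_lines"] = lines
    return settings


def extract_first_table(path: str, page_number: int = 0):
    import pdfplumber  # lazy import: only needed when actually reading a PDF

    with pdfplumber.open(path) as pdf:
        page = pdf.pages[page_number]
        settings = with_right_edge(
            page.bbox,
            {"vertical_strategy": "lines", "horizontal_strategy": "lines"},
        )
        return page.extract_table(settings)
```

If the border is still not detected, `page.debug_tablefinder(settings)` can help visualize which edges and intersections pdfplumber found.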

How to extract data from PDF files to a single CSV file using Python (pandas, ...)
0 votes, 0 answers

So I need to build a model that extracts data from PDF files (resumes) (OCR), so I collected a bunch of PDF files. I need to convert them to a suitable form for OCR, and I am lost. I tried converting them to a CSV file with regex, but it was no use; the CSV file…

Tags: python pandas csv pdfplumber

asked Jul 05 '23 at 19:37 by Alex

Table extraction using pdfplumber
0 votes, 0 answers

Traceback (most recent call last): File "/Users/noelsjacob/Desktop/Projects/py_pdf_stm-master/TableExtractor.py", line 734, in tables = pdf_interpreter.parse_page(1) File…

Tags: python python-3.x pdfplumber

asked Jun 13 '23 at 12:36 by Noel S Jacob

How to avoid duplication in Python PDF parsing code for mismatching table structures?
0 votes, 1 answer

I have over 100 PDFs that are match reports from which I want to scrape data in order to store it in dataframes so I can work with it afterwards. The problem is that those PDFs don't always have the same structure, and the reading from pdfplumber gives me…

Tags: python for-loop dry pdf-parsing pdfplumber

asked May 25 '23 at 21:01 by Pablo Martín Calvo

Python: Parse through extracted lines
0 votes, 1 answer

So I am trying to scrape with pdfplumber: I want to extract the text from this PDF and convert it into an Excel file (with each value, like the Expense apart from the numbers, in its own cell). I started a bit of the code and was…

Tags: python arrays list dictionary pdfplumber

asked May 23 '23 at 21:25 by megarocker241

How to remove borders from a PDF using Python and pdfplumber for Azure Form Recognizer?
0 votes, 0 answers

I am currently working on a project that involves extracting information from PDF files using Azure Form Recognizer. While I have successfully extracted the text, I am facing an issue with extracting tables. The problem arises because the entire…

Tags: python-3.x pdf pdf-generation azure-form-recognizer pdfplumber

asked May 19 '23 at 20:48 by Hammad Asif

How to extract the background color of a cell in a table in a PDF using Python
0 votes, 0 answers

I am trying to get the background colors of the cells in the table. I am using pdfplumber, and it returns only an empty string. [image: pdfplumber output] [image: table] Is there any way to extract background colors? I have tried using pdfplumber and tabula and…

Tags: python-3.x pypdf pdfplumber

asked Mar 27 '23 at 09:34 by Mohit Chaniyal
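
Table extraction returns cell text only, so it carries no color information. In many PDFs, cell shading is drawn as filled rectangles, which pdfplumber exposes in `page.rects` with a `fill` flag and a `non_stroking_color`. Under that assumption (some files draw shading as curves or images instead), one sketch is to take each cell box from `Table.cells` and report the color of the last filled rectangle drawn under the cell's center. `cell_fill_color` and `table_cell_colors` are hypothetical helpers.

```python
from __future__ import annotations


def cell_fill_color(cell: tuple[float, float, float, float], rects: list[dict]):
    """Return the fill color of the topmost filled rect under the cell's center, or None.

    `cell` is (x0, top, x1, bottom); `rects` are dicts shaped like pdfplumber's
    `page.rects`. Later rects are painted on top, so the last match wins.
    """
    cx = (cell[0] + cell[2]) / 2
    cy = (cell[1] + cell[3]) / 2
    color = None
    for r in rects:
        if r.get("fill") and r["x0"] <= cx <= r["x1"] and r["top"] <= cy <= r["bottom"]:
            color = r.get("non_stroking_color")
    return color


def table_cell_colors(path: str, page_number: int = 0):
    """For each table on the page, return a list of (cell bbox, fill color) pairs."""
    import pdfplumber  # lazy import: only needed when actually reading a PDF

    with pdfplumber.open(path) as pdf:
        page = pdf.pages[page_number]
        return [
            [(cell, cell_fill_color(cell, page.rects)) for cell in table.cells]
            for table in page.find_tables()
        ]
```

The color value comes back in whatever color space the PDF used (for example, an RGB or CMYK tuple), so it may need converting before comparison.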

PHP invoking a batch script that calls a conda Python script returns only a null string
0 votes, 0 answers

I wrote a Python script (t.py) to extract text from a PDF, and it works as I expected. Then I wrote a PHP script (t.php) to invoke the Python script (t.py) through a batch script (t.cmd). I found that the statement import pdfplumber in the Python script (t.py) will disable…

Tags: php conda pdfplumber

asked Mar 26 '23 at 13:51 by somggx

Problem loading text from searchable PDFs ("PSKeyword" error)
0 votes, 0 answers

I have a problem extracting text using pdfplumber. The PDF is searchable, and other examples work fine. However, there is one invoice that cannot be loaded correctly. I get this error: cannot convert 'PSKeyword' object to…

Tags: python pdf pdfplumber

asked Feb 17 '23 at 11:58 by Ryotaro
