Questions tagged [python-pdfreader]

Python API to parse PDF documents and extract text (plain and formatted), images, XObjects, forms, and other data. Provides direct access to all object attributes and object history. Follows the PDF 1.7 specification.

See pdfreader - Tutorials and Examples

32 questions

1 answer

Is there a way to measure the margins of a PDF using Python?

I've been using different Python packages to parse PDFs, but I'm wondering if it's possible to measure the margins of a particular line in the document. I would like the measurement in CSS-style pixels, if possible. It doesn't need…

python pdf python-pdfreader

asked Jan 26 '23 at 03:23

Yehuda

1 answer

How to use Python Fitz to detect hyphens when using search_for?

I'm new to the Fitz library and am working on a project where I need to find a string in a PDF page. I'm running into a case where the text on the page that I'm searching on is hyphenated. I am aware of the TEXT_DEHYPHENATE flag that I can use in…

python pymupdf python-pdfkit python-pdfreader

asked Dec 01 '22 at 20:03

Kevin Wu

0 answers

Python PdfReader: Getting error when sequentially reading PDFs in a folder: Errno 2 (No such file or directory): 'filename.pdf'

I'm trying to put together code that will sequentially read through a folder of PDFs to scrape relevant information such as part names, numbers, materials, and final treatments. The (presumably) problematic part of the code is written: for fp in…

python indexing pdf-reader python-pdfreader

asked Nov 21 '22 at 16:47

Tyler
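For the sequential-read error above, the excerpt is cut off before the loop body, so the actual cause is not visible here. One common cause of `Errno 2` in this situation, offered only as an assumption, is that `os.listdir()` returns bare file names, which `open()` then resolves against the current working directory instead of the folder being scanned. A minimal standard-library sketch of joining the folder and the name follows; the helper name `list_pdf_paths` is hypothetical and is not taken from the question.

```python
import os
import tempfile


def list_pdf_paths(folder):
    """Return full paths of the PDF files directly inside `folder`, sorted by name."""
    paths = []
    for name in sorted(os.listdir(folder)):
        if name.lower().endswith(".pdf"):
            # os.listdir() yields bare names; join them with the folder so that
            # open() works regardless of the current working directory.
            paths.append(os.path.join(folder, name))
    return paths


# Demo on a throwaway folder so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as folder:
    for name in ("a.pdf", "b.pdf", "notes.txt"):
        with open(os.path.join(folder, name), "wb") as fh:
            fh.write(b"%PDF-1.7\n")

    for path in list_pdf_paths(folder):
        with open(path, "rb") as fh:
            # A PDF parser could be handed `fh` here; this demo only reads the header.
            print(os.path.basename(path), fh.read(8))
```

Each returned path can be opened in binary mode and passed to whichever PDF library is in use.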

0 answers

Can't read PDF files using camelot

import camelot from google.colab import files uploaded = files.upload() file = "foo.pdf" tables = camelot.read_pdf(file) print("Total tables extracted:", tables.n) tables = camelot.read_pdf(file) print("Total tables extracted:",…

python-3.x python-camelot python-pdfreader

asked Nov 11 '22 at 09:19

Vasavi Sreerama

1 answer

Is there a way to read the contents of a PDF or Word document in Python while keeping its structure (level and depth of bulleted lists)?

I want to generate HTML code from a PDF or Word document. The document contains bulleted lists, and some bulleted lists contain other bulleted lists. I want to transform those bulleted lists into HTML, but when I extract the content of the…

python python-docx python-pdfreader

asked Dec 07 '21 at 04:30

Guiffou Joel

1 answer

Comparing keywords with PDF files

Here is the program that reads the files by folder name and extracts data. Now I want to compare the data with the keywords that I used in the program below. But it gives me: pdfReader = pdfFileObj.loadPage(0) AttributeError:…

python pdf pymupdf python-pdfreader

asked Jun 24 '21 at 07:42

Abrar Hussain

1 answer

Fields "Created" and "Modified" in Document Properties (PDF) were not displayed

I have merged many PDFs into one PDF. I added metadata that includes the two fields "Created" and "Modified", but these fields still do not display any information. Here's my source code: import…

python python-3.x pymupdf python-pdfreader

asked Feb 03 '21 at 12:17

Thuấn Đào Minh

2 answers

Extract text from a PDF file in an S3 bucket with Python

I have files in multiple formats in my AWS S3 bucket, such as pdf, doc, rtf, odt, and png, and I need to extract text from them. I have managed to get the list of contents with their paths. Now, depending on the file type, I will use different libraries to extract text…

python amazon-s3 python-pdfreader

asked Jan 19 '21 at 06:05

user14956888

1 answer

Can't use PyPDF2 to open my PDF file in Jupyter Notebook

I tried opening a pdf file which I downloaded with the PyPDF2 module already installed like this: import PyPDF2 pdfFileObj = open('ssopenpyxl-readthedocs-io-en-latest.pdf', 'rb') pdfReader = PyPDF2.PdfFileReader(pdfFileObj) pdfReader.numPages and…

python-3.x jupyter-notebook python-pdfreader

asked Oct 18 '20 at 02:39

Atimah Adavize

1 answer

pdfplumber gives fp.seek(pos) AttributeError: 'dict' object has no attribute 'seek'

So this is my code: def main(): import combinedparser as cp from tkinter.filedialog import askopenfilenames files = askopenfilenames() print(files) #this gives the right files as a list of strings composed of path+filename …

python python-3.x pdf python-decorators python-pdfreader

asked Sep 22 '20 at 07:19

Zachary Thomas

0 answers

How can we create a blank PDF using PyPDF2?

import PyPDF2 writer = PyPDF2.PdfFileWriter() writer.addBlankPage(219, 297) with open (r"C:\\Users\\Aditya\\.spyder-py3\\scripting in python\\sample pdf with python\\mergedpdf.pdf","wb") as file: writer.write(file) file.close() unable to…

python pdf python-3.8 pypdf python-pdfreader

asked Aug 08 '20 at 17:51

Aditya Bhatia

3 answers

Django: open a PDF at a certain page number

I am trying to create a PDF analysis web app and I am stuck. I want to allow the user to open a certain page of a PDF that has over 300 pages. So, can anyone tell me how to use Django to open the PDF in a new tab on a specific page? EDIT…

python django django-views django-templates python-pdfreader

asked Jul 17 '20 at 16:01

Rashbir Kohli

1 answer

How to read data from a bank statement PDF in Python?

I have to read the data from a bank statement PDF that contains text and tables. I have tried some solutions provided on Stack Overflow but got errors for most of them. Of the many I tried, the following code worked for me, but I am not getting the expected…

python python-pdfreader

asked Jun 29 '20 at 07:44

Pavan Deshmukh

1 answer

How to store a PDF in a MySQL database without generating a PDF file in Python

I have base64-encoded PDF data in a MySQL database, and I want to manipulate that data (update the form fields of the PDF). After that, without creating or writing a PDF file, I want to store the updated data into a…

python python-3.x base64 pypdf python-pdfreader

asked Apr 29 '20 at 18:27

Chaitanya Bhojne

1 answer

Need help exporting data from pdfplumber to a .csv file

I used pdfplumber to extract text from PDFs, but when I tried to export the data using to_csv, it threw an error. I need help exporting the data to .csv: import pdfplumber import pandas as pd import numpy as np import os import re from collections…

python pdf text-extraction tabula python-pdfreader

asked Mar 16 '20 at 07:31

Murthy P
