Questions tagged [pdfparser]

A standalone PHP library that provides various tools to extract data from PDF files.

See https://github.com/smalot/pdfparser

39 questions
0 votes, 1 answer

Tests randomly return bad XRef Entry after readFileSync

This is probably too specific, but I can't find what is wrong here. I'm using the Cypress test tool and I need to verify the contents of a PDF. For this I've created a task: const pdf = require('pdf-parse'); getPdfContent(pdfName) { return…
Matias Diez
0 votes, 1 answer

Cypress pdf-parse throws error Fs.readFileSync is not a function

I have been trying to use the pdf-parse plugin in Cypress to validate the content of some PDFs, but I get the error "Fs.readFileSync is not a function". I am on version 12.4.1, but I tried other Cypress versions with the same results (6.0.0, 7.5.0,…
0 votes, 1 answer

PHP PdfParser: content read from the PDF shows as two lines; need to fix it

I used pdfparser to read PDF content, but one address line is shown as two lines, split by a newline. I want to get the full address as a single line. The PDF files are dynamic; depending on the address length it is shown as a…
0 votes, 0 answers

PDF reader for Java similar to PDF.js

We have a project where we use PDF.js to render a PDF in a web page, and it creates HTML container elements for the PDF pages. The content of the PDF is split into HTML span elements in the view. The attached image shows how the PDF text is rendered in the…
0 votes, 1 answer

Error when getting text from a PDF file using smalot PdfParser in CodeIgniter 4

I'm trying to upload a PDF file, which may or may not be password-protected, but I receive this error: "Allowed memory size of 134217728 bytes exhausted" on the line "print_r($pages);". However, this only happens with PDF files that aren't password-protected.…
0 votes, 1 answer

Read a string split by whitespace in PHP

I am trying to read a PDF with the library \Smalot\PdfParser\Parser(); in Laravel 5.6. I get all the content correctly, but I have this: Array ( [0] => MARTIN CARRILLO MARIA ESMERALDA ALHAMBRA 10 958 54 38 93 [1] => ESPIGARES DIAS JOSE ANTONIO…
scorpions78
0 votes, 1 answer

Node.js PDF parse: getting the value after a specific string

My goal is to get a certain string after a predefined text. In this case I would like to read the following value: I found out this is possible using a regex, so I tried this: const fs = require("fs"); const PDFParser =…
Dominik Hartl
0 votes, 0 answers

PdfParser issue in PHP

Thank you in advance. I am using the PdfParser library to extract text from a PDF. My current code for that is below: $parser = new \Smalot\PdfParser\Parser(); $pdfsource = $parser->parseFile($dest_path); $pages = $pdfsource->getPages(); foreach…
Ronak Solanki
0 votes, 0 answers

What is the best way to extract the body of an article with Python?

Summary: I am building a text summarizer in Python. The documents I am mainly targeting are scholarly papers, which are usually in PDF format. What I want to achieve: I want to effectively extract the body of the paper (abstract to…
mdave1701
0 votes, 0 answers

How to use async/await for events in pdf2json (pdfParser)

I am using the https://www.npmjs.com/package/pdf2json npm package, which picks up the PDF from the given path and, when the parser is ready to parse it, triggers the event pdfParser_dataReady. I want to use this with async/await. const…
Rajeshwar
0 votes, 1 answer

Fatal error: Uncaught Error: Class 'Smalot\PdfParser\Parser' not found in /var/www/html

I installed PdfParser with Composer, and it works when I open the page cron.php: the PDF is parsed. This is my code in cron.php: include 'vendor/autoload.php'; //include $_SERVER["DOCUMENT_ROOT"]. '/vendor/autoload.php'; //require…
emil_alm
0 votes, 0 answers

Unable to extract the content of a PDF file in PHP

I am currently working on validating PDF files. I have used PHP pdfparser in Laravel to extract the files, but some files cannot be extracted. I tried downgrading the PDF files to resolve the issue, but it still doesn't work for me. I…
Dhaval Mistry
0 votes, 2 answers

How to get text from copy-protected PDF files or files with different fonts?

I am using pdfparser to copy text from PDF files, but some PDF files are copy-protected or use different fonts, so pdfparser doesn't work for them. Is it possible to get text from a copy-protected PDF? This is my code: // Include…
V.p. Dixit
0 votes, 1 answer

PDFplumber password and check_extractable

I am using the pdfplumber library to parse PDFs. The way to open a PDF file is "pdfplumber.open(path)". Can someone please help me with how to pass the password and check_extractable parameters to it?
0 votes, 0 answers

How to parse a PDF in Selenium

I have been trying to read a PDF that is opened in the browser, using the following Selenium code: URL pdfURL = new URL(driver.getCurrentUrl()); InputStream is = pdfURL.openStream(); BufferedInputStream fileToParse= new…