
I have a WordPress site installed on a VPS running Debian 11. One of its features reads uploaded PDF documents with the XPDF tools through the PHP wrapper PHP-XPDF (https://github.com/alchemy-fr/PHP-XPDF), which relies on XpdfReader (https://www.xpdfreader.com/index.html).

Basically, I want to extract the contents of the PDF into a string and then save that string to an ACF custom field.

But I have a problem with the path to the PDF file. I tried both the URL (https://silkstack.com/wp-content/uploads/2023/06/document.pdf) and the file system path /var/www/html/wp-content/uploads/2023/ 06/document.pdf; in both cases I get an 'is not a valid file' error.
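A URL cannot work here: as CBroe points out in the comments, PHP-XPDF checks the path with `file_exists()`, and `file_exists()` does not support http(s) URLs. If the code only has the upload URL, a hypothetical helper like the one below (`upload_url_to_path` is an illustrative name, not a WordPress or PHP-XPDF function) can map it to a path under the uploads directory. In WordPress the two base values would normally come from `wp_upload_dir()` (its `'baseurl'` and `'basedir'` keys); if the attachment ID is known, `get_attached_file($attachment_id)` returns the path directly.

```php
<?php
/**
 * Hypothetical helper: convert a URL inside the uploads directory into a
 * file system path. Returns null when the URL is not under $baseUrl.
 * In WordPress, $baseUrl / $baseDir would typically be
 * wp_upload_dir()['baseurl'] / wp_upload_dir()['basedir'].
 */
function upload_url_to_path(string $url, string $baseUrl, string $baseDir): ?string
{
    $baseUrl = rtrim($baseUrl, '/') . '/';
    if (strpos($url, $baseUrl) !== 0) {
        return null; // not an uploads URL, so there is no local path to derive
    }
    // Strip the base URL and decode %20 etc. back to real file name characters.
    $relative = rawurldecode(substr($url, strlen($baseUrl)));
    return rtrim($baseDir, '/') . '/' . $relative;
}

// Example: the resulting path (not the URL) is what getText() should receive.
// $path = upload_url_to_path($url, $uploads['baseurl'], $uploads['basedir']);
// var_dump($path !== null && file_exists($path));
```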

Xpdf/pdftotext itself works normally on the server: if I run the command "pdftotext /var/www/html/wp-content/uploads/2023/06/document.pdf" directly in the shell, a .txt file with the content of the PDF is saved in the same location.
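Since pdftotext works from the shell, one possible workaround, if the wrapper keeps failing, is to call pdftotext directly from PHP. The sketch below is hypothetical (the function names are illustrative, not part of PHP-XPDF); it assumes pdftotext is installed at the path you pass in, uses the options PHP-XPDF passes (`-raw`, `-nopgbrk`, `-enc UTF-8`, `-eol unix`), and gives `-` as the output file so pdftotext writes the text to stdout instead of creating a .txt file.

```php
<?php
/**
 * Build the shell command. Every argument is quoted with escapeshellarg(),
 * so paths containing spaces or special characters are passed safely.
 * '-' as the output file makes pdftotext write the text to stdout.
 */
function pdftotext_command(string $binary, string $pdfPath): string
{
    $args = [$binary, '-raw', '-nopgbrk', '-enc', 'UTF-8', '-eol', 'unix', $pdfPath, '-'];
    return implode(' ', array_map('escapeshellarg', $args));
}

/**
 * Run pdftotext and return the extracted text as a string.
 * Throws if the PDF is not a readable file or pdftotext exits non-zero.
 */
function pdf_to_text(string $binary, string $pdfPath): string
{
    if (!is_file($pdfPath) || !is_readable($pdfPath)) {
        throw new RuntimeException("Cannot read PDF: $pdfPath");
    }
    $lines = [];
    $status = 0;
    exec(pdftotext_command($binary, $pdfPath), $lines, $status);
    if ($status !== 0) {
        throw new RuntimeException("pdftotext exited with status $status");
    }
    // exec() returns one array entry per output line, without trailing newlines.
    return implode("\n", $lines);
}

// Example (hypothetical binary location; check it with `which pdftotext`):
// $text = pdf_to_text('/usr/bin/pdftotext', '/var/www/html/wp-content/uploads/2023/06/document.pdf');
```

Quoting each argument separately keeps the command safe even for uploaded file names the site does not control.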

I also tested a simple standalone PHP script with a PDF document in the same folder (here only the file name, without a path, is passed in PHP), and in that case the script works. Example:

```php
<?php

require __DIR__ . '/vendor/autoload.php';

$logger = null;

$pdfToText = XPDF\PdfToText::create(array(
    'pdftotext.binaries' => '/usr/bin/xpdf', // path to the pdftotext binary
    'pdftotext.timeout' => 30, // timeout for the underlying process
), $logger);

// A relative path like this is resolved against the current working directory.
$text = $pdfToText->getText('sample.pdf');

// remove characters outside Latin-1 (U+0000 to U+00FF)
$clean_txt = preg_replace('/[^\x00-\xFF]+/u', '', $text);
var_dump($clean_txt);
```

Any idea how I would set the file path for PHP-XPDF?


Update 17.6.2023:

I'm aware that using the URL makes no sense. If I use the file system path /var/www/..., I get this error:

PHP Fatal error: Uncaught Alchemy\BinaryDriver\Exception\ExecutionFailureException: pdftotext failed to execute command '/usr/bin/xpdf' '-raw' '-nopgbrk' '-enc' 'UTF-8' '-eol' '-unix' '/var/www/html/wp-content/uploads/2023/06/document.pdf' '/tmp/xpdfWfGd3O' in /var/www/html/wp-content/themes/child_theme/vendor/alchemy/binary-driver/src/Alchemy/BinaryDriver/ProcessRunner.php:100

Could it be a permissions problem? The www-data user has 0755 permissions on the folders and 0644 on the files.
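One way to test the permissions theory is to have PHP report, from inside the failing request, what it can actually see. The hypothetical helper below (`diagnose_pdf_setup` is an illustrative name, not part of PHP-XPDF or WordPress) uses only built-in PHP functions. It also checks whether the configured binary is executable: the failing command runs `/usr/bin/xpdf` as the pdftotext binary, whereas the shell test ran `pdftotext`, so confirming that the configured path really is pdftotext may be worthwhile.

```php
<?php
/**
 * Hypothetical diagnostic: report what the current PHP process can see.
 * Run it in the same context that fails (a web request as www-data,
 * not the shell), because the user and working directory may differ.
 */
function diagnose_pdf_setup(string $binary, string $pdfPath): array
{
    $user = 'unknown (posix extension not loaded)';
    if (function_exists('posix_geteuid')) {
        $uid = posix_geteuid();
        // Fall back to the numeric uid if the user name cannot be looked up.
        $user = posix_getpwuid($uid)['name'] ?? (string) $uid;
    }
    $exists = file_exists($pdfPath);

    return [
        'php_user'          => $user,
        'pdf_exists'        => $exists,
        'pdf_readable'      => is_readable($pdfPath),
        // Last four octal digits of the mode, e.g. "0644".
        'pdf_perms'         => $exists ? substr(sprintf('%o', fileperms($pdfPath)), -4) : null,
        'binary_executable' => is_executable($binary),
        // The error message shows PHP-XPDF writing its output to /tmp.
        'tmp_writable'      => is_writable(sys_get_temp_dir()),
    ];
}

// Example:
// var_dump(diagnose_pdf_setup('/usr/bin/xpdf', '/var/www/html/wp-content/uploads/2023/06/document.pdf'));
```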

weezle
    The error message itself appears to come directly from the initial `file_exists` check the method performs, https://github.com/alchemy-fr/PHP-XPDF/blob/master/src/XPDF/PdfToText.php#L132 - so using a URL makes no sense to begin with, `file_exists` does not work with those. What happens, when you check the file system path you are using in your _own_ code, using the same function? (I am assuming the file path you actually used did not have that space here, `/2023/ 06/`, and you only inserted that while posting here?) – CBroe Jun 16 '23 at 12:06
  • Domain is just an example one. Current dev site is non-public. – weezle Jun 17 '23 at 11:20
  • I've updated my question – weezle Jun 17 '23 at 11:26

0 Answers