
I have a bash script that takes a booklet-format PDF and converts it to separate pages. The script is called by PHP running under nginx.

The script uses pdfcrop, which in turn calls pdfTeX; the pdfTeX call is the point of failure.

The script runs fine as root from the command line. However, when it is run by nginx (via PHP), it fails at the point where pdfcrop calls pdfTeX.

Here is the line where it fails:

pdfcrop --ini --verbose --bbox "0 0 1000 600" --margins "-490 10 10 10" "${tempDir}$1" "${tempDir}right.pdf"

I log the verbose output and get the following:

nginx
PDFCROP 1.40, 2020/06/06 - Copyright (c) 2002-2020 by Heiko Oberdiek, Oberdiek Package Support Group.
* PDF header: %PDF-1.5
* Running ghostscript for BoundingBox calculation ...
GPL Ghostscript 9.25 (2018-09-13)
Copyright (C) 2018 Artifex Software, Inc.  All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Processing pages 1 through 2.
Page 1
%%BoundingBox: 90 83 972 571
* Page 1: 0 0 1000 600
%%HiResBoundingBox: 90.674997 83.069997 971.999970 570.725983
Page 2
%%BoundingBox: 33 23 969 572
* Page 2: 0 0 1000 600
%%HiResBoundingBox: 33.731999 23.939999 968.147970 571.697983
* Running pdfTeX ...

The first line, 'nginx', is there because I log the result of whoami to confirm which user is running the script. Note that the script just stops at the pdfTeX call.

Again, the script works as it should when run as root from the command line. It seems that pdfTeX is not available to the nginx user. If that is the case, how do I fix it?
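A slightly fuller diagnostic than whoami alone could look like the sketch below. It prints the user, the working directory, the PATH, and whether each tool can be resolved, since these are the things that typically differ between a root shell and a web server worker. The function name `log_env` is hypothetical and the list of tools is an assumption based on the log above.

```shell
#!/bin/bash
# Hypothetical helper: print the facts that usually differ between an
# interactive root shell and the user that runs PHP under nginx.
log_env() {
    echo "user: $(whoami)"
    echo "cwd:  $(pwd)"
    echo "PATH: ${PATH}"
    local tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "${tool}: $(command -v "$tool")"
        else
            echo "${tool}: NOT FOUND in PATH"
        fi
    done
}

# Tools seen in the verbose log; redirect the output to a log file if needed.
log_env pdfcrop pdftex gs
```

Running this once as root and once through PHP, then comparing the two outputs, should show whether the PATH is the difference.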

TIA

EDIT: Thinking the issue could be permissions (that pdfTeX couldn't write its temp files), I changed the owner and group of the directory where my script runs to nginx. The results are the same.
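The permission theory can also be tested directly from inside the script rather than by changing ownership and re-running. The sketch below checks that a directory exists and is writable by whichever user is executing the script. The function name `check_writable_dir` is hypothetical, and checking both the script directory and `tmp/` is an assumption about where temp files end up.

```shell
#!/bin/bash
# Hypothetical helper: report whether the current user can create files in a directory.
check_writable_dir() {
    local dir="$1"
    if [ ! -d "$dir" ]; then
        echo "${dir}: not a directory (cwd is $(pwd))" >&2
        return 1
    fi
    # Creating files needs both write and search (execute) permission.
    if [ ! -w "$dir" ] || [ ! -x "$dir" ]; then
        echo "${dir}: not writable by $(whoami)" >&2
        return 1
    fi
    return 0
}

# Check the script directory itself and the tmp/ subdirectory used by the script.
for d in "." "tmp/"; do
    if check_writable_dir "$d"; then
        echo "${d}: ok"
    fi
done
```

If both directories report ok for the nginx user, permissions on those directories are unlikely to be the cause.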

EDIT 2: Here is the PHP call to my script:

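chdir($scriptDir);
// escapeshellarg() keeps spaces or shell metacharacters in the name from breaking the command
$arg = escapeshellarg($bulletinName);
$result = shell_exec('./friedensBulletin.sh ' . $arg . ' ' . $arg);
chdir($cwd);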

$scriptDir is the location of my script. $cwd holds the working directory that was current before the call, and it is restored here.
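Note that shell_exec returns only what the command writes to stdout, so error messages from pdfcrop or pdfTeX can go missing. One way to keep them, sketched below, is to have the script append its own stderr to a log file. The log path and the `main` wrapper are assumptions for illustration and are not part of the original script.

```shell
#!/bin/bash
# Sketch: send everything the script writes to stderr into a log file,
# so errors are not lost when PHP only collects stdout.
# The log location is an assumption; pick a directory the nginx user can write to.
logFile="${TMPDIR:-/tmp}/friedensBulletin.log"

main() {
    echo "started as $(whoami) with args: $*" >&2
    # The pdfcrop / pdfseparate / pdfunite calls would go here; anything
    # they print to stderr ends up in $logFile as well.
}

# Only stderr is redirected; stdout still goes back to the caller.
main "$@" 2>>"$logFile"
```

With this in place, a "command not found" style message from a failed pdfTeX call would show up in the log even though $result in PHP stays empty.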

EDIT 3: The entire bash script

#!/bin/bash

#################################################  
# takes a PDF, crops, exports as html           #
# req. pdfcrop for cropping                     #
# req. poppler (pdftohtml) for file conversion  #
# $1 input file                                 #
# $2 output file                                #
# author: roger@rogercreasy.com                 #
# 01.09.2021                                    #
#################################################

tempDir="tmp/"

# handle pages 1 and 3
pdfcrop --ini --bbox "0 0 1000 600" --margins "-490 10 10 10" "${tempDir}$1" "${tempDir}right.pdf"
pdfseparate "${tempDir}right.pdf" "${tempDir}right%d.pdf"

# handle pages 2 and 4
pdfcrop --ini --bbox "0 0 1000 600" --margins "10 10 -490 10" "${tempDir}$1" "${tempDir}left.pdf"
pdfseparate "${tempDir}left.pdf" "${tempDir}left%d.pdf"

# recombine in the correct order
pdfunite "${tempDir}right1.pdf" "${tempDir}left2.pdf" "${tempDir}right2.pdf" "${tempDir}left1.pdf" "${tempDir}tmp.pdf"

mv "${tempDir}tmp.pdf" "$2"

# clean up unneeded files
rm "${tempDir}"*.pdf
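
Since the script silently stops when a tool is missing, a guard at the top can make that failure visible. The following is a minimal sketch, assuming the tool list from the header comment plus pdftex (which pdfcrop runs internally). The function name `require_tools` is hypothetical.

```shell
#!/bin/bash
# Sketch: report every required tool that cannot be found in the PATH of
# whichever user runs the script.
require_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing required tool: ${tool}" >&2
            missing=1
        fi
    done
    return "$missing"
}

if require_tools pdfcrop pdftex pdfseparate pdfunite; then
    echo "all required tools found"
else
    # In the real script this branch would normally end with: exit 1
    echo "some tools are missing; check PATH for user $(whoami)" >&2
fi
```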
Roger Creasy
  • Running this as `root` is borderline crazy. – tripleee Feb 06 '21 at 12:31
  • @tripleee For testing? Why? – Roger Creasy Feb 06 '21 at 12:32
  • I'm guessing the surrounding PHP code does something wrong. Can you check what command *exactly* it ends up executing? – tripleee Feb 06 '21 at 12:33
  • Testing as `root` removes a large number of possible failure scenarios, and obviously introduces a number of security problems. You don't want TeX to be able to wipe your hard drive, even if you trust it for the time being not to try. (And if you pass it input you don't have complete control over, you should not.) – tripleee Feb 06 '21 at 12:34
  • PHP calls my bash script. Since that bash script logs correctly, it seems like PHP is calling the correct script. Right? Also, it runs correctly in my local podman environment. – Roger Creasy Feb 06 '21 at 12:36
  • Yeah, but if the quoting is wrong, it will be calling it with incorrect parameters. – tripleee Feb 06 '21 at 12:36
  • added a snippet of my php. btw, thanks for helping – Roger Creasy Feb 06 '21 at 12:44
  • There are pieces missing, we don't know what `$tempDir` gets expanded to, and the script as shown does not appear to accept a second command-line parameter at all. – tripleee Feb 06 '21 at 12:46
  • added the entire bash, without the html conversion bits – Roger Creasy Feb 06 '21 at 12:55
  • In which directory does this run; does that have a subdirectory named `tmp`, and does the user who runs this script have write access there? – tripleee Feb 06 '21 at 13:01
  • More generally try http://shellcheck.net/ which will at least gripe about the unquoted variables. – tripleee Feb 06 '21 at 13:01
  • The script is in a directory within the web app directory structure, under /var/[dir that contains sites]/[sitename]/storage. The parent dir (storage), the dir containing the script, and the tmp dir (within the dir containing the script) are all owned by my user, with the group set to nginx and permissions 770. – Roger Creasy Feb 06 '21 at 13:34

1 Answer

My original theory that pdfTeX was not available to the nginx user was correct.

In my script, I logged the output of `which pdftex`, and it reported that pdftex was not found. The solution was to create a symlink to the pdftex binary. I did this by adding the following to my script:

if ! [[ -L "pdftex" ]]; then
    ln -s /bin/pdftex pdftex
fi

This checks whether the link exists and creates it if it does not. This approach allows my script to work if moved to another server, assuming of course that pdftex is always installed in the same location. I found the location of pdftex by running `which pdftex` as root on the command line.

Thanks to Heiko Oberdiek, the author of pdfcrop, for help in solving this.
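
An alternative to the symlink, which is not what I ended up using, is to extend PATH inside the script so that pdfcrop can find pdftex by name. The sketch below assumes pdftex lives in /bin, as `which pdftex` reported on my server; the directory may differ elsewhere. The function name `ensure_in_path` is hypothetical.

```shell
#!/bin/bash
# Sketch: append a directory to PATH once, so programs started by this
# script (such as pdfcrop) can find executables in it by name.
ensure_in_path() {
    local dir="$1"
    case ":${PATH}:" in
        *":${dir}:"*) ;;                       # already present, nothing to do
        *) PATH="${PATH:+${PATH}:}${dir}" ;;   # append, handling an empty PATH
    esac
    export PATH
}

# /bin is an assumption taken from the server described above.
ensure_in_path "/bin"

if ! command -v pdftex >/dev/null 2>&1; then
    echo "pdftex still not found in PATH: ${PATH}" >&2
fi
```

This avoids leaving a link in the script directory, at the cost of hard-coding the directory in the same way the symlink does.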

Roger Creasy