
I am using the Node module pdf-to-text in my Node.js Lambda function, but I was getting a "spawn pdftotext ENOENT" error. I launched an AWS EC2 instance and compiled poppler there using this script. That gave me a tar.gz file on S3 containing a poppler file, with bin and lib folders inside it. I placed that poppler file in a bin folder and referenced it like this: process.env['PATH'] = process.env['PATH'] + ':' + path.join(process.env['LAMBDA_TASK_ROOT'], '/bin/poppler');

That did not work; I got a spawn ENOTDIR error. I then unzipped it and tried referencing the bin folder inside it ('bin/bin'), but I got a spawn EACCES error.

I tried directly referencing the "bin/pdftotext" as well and got the spawn ENOTDIR error again.

Has anyone had any luck getting this library to work in a Lambda function?

TL;DR: I want to run spawn('pdftotext') in my Lambda function.

Made some progress: I ran chmod on the poppler folder, and now I'm getting this error:

Error: pdf-text-extract command failed: pdftotext: error while loading shared libraries: libpoppler.so.56: cannot open shared object file: No such file or directory

2 Answers


AWS Lambda user code can work with only two directories:

  1. /var/task
  2. /tmp

All Lambda functions keep their source code in /var/task (exposed as LAMBDA_TASK_ROOT), which is read-only; /tmp is writable.

In your code, use one of these paths instead of the current directory path "/bin/poppler", like this:

process.env['PATH'] = process.env['PATH'] + ':' + path.join(process.env['LAMBDA_TASK_ROOT'], '/bin/poppler');
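
A minimal sketch of the same idea, assuming the executable was packaged at bin/pdftotext inside the deployment package (that layout and the helper name withTaskBin are assumptions for illustration). Each PATH entry has to be a directory that contains the executable rather than the executable itself, which may be what the ENOTDIR error was pointing at.

```javascript
const path = require('path');

// Hypothetical helper: returns a PATH string with <taskRoot>/bin appended.
// Assumes pdftotext was packaged at <taskRoot>/bin/pdftotext.
function withTaskBin(currentPath, taskRoot) {
  const binDir = path.posix.join(taskRoot, 'bin');
  const entries = currentPath ? currentPath.split(':') : [];
  // Avoid adding the same directory twice on warm invocations.
  if (!entries.includes(binDir)) entries.push(binDir);
  return entries.join(':');
}

// LAMBDA_TASK_ROOT is set by the Lambda runtime; fall back to /var/task elsewhere.
const taskRoot = process.env.LAMBDA_TASK_ROOT || '/var/task';
process.env.PATH = withTaskBin(process.env.PATH || '', taskRoot);
```

The binary also needs its execute bit set before the package is zipped, otherwise spawn is likely to fail with EACCES.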
Abdul Manaf

Run ldd pdftotext to list the shared libraries the binary is dynamically linked against. Any of them that are missing in Lambda need to go into your deployment package under a LAMBDA_TASK_ROOT/lib directory.

For example, I see this when I run it:

    linux-vdso.so.1 =>  (0x00007ffc2e998000)
    libjpeg.so.62 => /usr/lib64/libjpeg.so.62 (0x00007f6a7d732000)
    libopenjpeg.so.2 => /usr/lib64/libopenjpeg.so.2 (0x00007f6a7d512000)
    libfontconfig.so.1 => /home/ec2-user/tmp/usr/lib/libfontconfig.so.1 (0x00007f6a7d2d0000)
    libxml2.so.2 => /home/ec2-user/tmp/usr/lib/libxml2.so.2 (0x00007f6a7cf6c000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00007f6a7cd67000)
    libfreetype.so.6 => /home/ec2-user/tmp/usr/lib/libfreetype.so.6 (0x00007f6a7cab4000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f6a7c898000)
    libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007f6a7c592000)
    libm.so.6 => /lib64/libm.so.6 (0x00007f6a7c290000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f6a7c07a000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f6a7bcb5000)
    libz.so.1 => /lib64/libz.so.1 (0x00007f6a7ba9f000)
    /lib64/ld-linux-x86-64.so.2 (0x00005559f0a32000)

Anything that is not already included in the base Lambda AMI will probably need to be copied into the LAMBDA_TASK_ROOT/lib directory for pdftotext to run (in my case, everything pointing to /home/ec2-user/tmp/*). Your error specifically mentions libpoppler.so.56, so you need to find that file in your build and include it in this directory.
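
To pick those files out of a long ldd listing, something like the sketch below could help. The helper name nonSystemLibs is hypothetical, and treating /lib64 and /usr/lib64 as "probably present in Lambda" is only an assumption; a library found there on the build machine may still be missing in the Lambda runtime, so check the result against the real environment.

```javascript
// Hypothetical helper: parses the text printed by `ldd` and returns the resolved
// paths of libraries that live outside the given system directories.
function nonSystemLibs(lddOutput, systemDirs = ['/lib64/', '/usr/lib64/']) {
  const libs = [];
  for (const line of lddOutput.split('\n')) {
    // Lines of interest look like: "libxml2.so.2 => /some/path/libxml2.so.2 (0x...)"
    const match = line.match(/^\s*(\S+)\s+=>\s+(\/\S+)/);
    if (!match) continue; // skips linux-vdso, the loader line, and "not found" entries
    const resolved = match[2];
    if (!systemDirs.some((dir) => resolved.startsWith(dir))) libs.push(resolved);
  }
  return libs;
}
```

The returned paths are the candidates to copy into the lib directory of the deployment package. The ldd text itself can be captured on the build machine, for example with child_process.execFileSync('ldd', [binaryPath], { encoding: 'utf8' }).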

The docs mention an LD_LIBRARY_PATH environment variable that you can adjust to put the files somewhere else, but when I tried setting it inside my Lambda function it had no effect, so I would put the files in the directory that is already on the library path.
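
If you still want to experiment with LD_LIBRARY_PATH, one option is to pass it explicitly in the environment of the child process instead of changing it on the running Node process. This is an untested sketch and I can't say whether it behaves any differently on Lambda; the helper names childEnv and spawnPdftotext and the lib location are assumptions.

```javascript
const { spawn } = require('child_process');
const path = require('path');

// Hypothetical helper: copies baseEnv and puts libDir first in LD_LIBRARY_PATH.
function childEnv(baseEnv, libDir) {
  const existing = baseEnv.LD_LIBRARY_PATH;
  return { ...baseEnv, LD_LIBRARY_PATH: existing ? `${libDir}:${existing}` : libDir };
}

// Hypothetical wrapper: starts pdftotext with the adjusted environment.
// Passing '-' as the output file makes pdftotext write the text to stdout.
function spawnPdftotext(pdfPath) {
  const taskRoot = process.env.LAMBDA_TASK_ROOT || '/var/task';
  const env = childEnv(process.env, path.join(taskRoot, 'lib'));
  return spawn('pdftotext', [pdfPath, '-'], { env });
}
```

The caller would read the extracted text from the stdout stream of the returned child process and listen for its 'error' and 'close' events.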

Chris Lim