Nodejs Serverless API is Corrupting PDF

Question

I have a Nodejs API that returns a PDF file in the response. It returns the PDF as binary data. On my frontend, I use a package called React-PDF which takes the PDF endpoint url and fetches the data to display. This worked completely fine up until the point where I decided to switch to the serverless framework (including serverless-offline).

After switching to serverless, the PDF fails to load when my frontend fetches it, and the console logs "Warning: Indexing all PDF objects". I used vbindiff to compare the same PDF downloaded from the plain Node.js Express API with the one downloaded through the serverless framework. Both files are the same length, but the one from the serverless API has different contents in some places.
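The same kind of byte-level comparison can be sketched in Node.js without vbindiff. This is a minimal sketch under stated assumptions: `firstDifference` is a hypothetical helper, and the two buffers are small in-memory stand-ins rather than the actual downloaded files.

```javascript
// Hypothetical helper: returns the offset of the first differing byte,
// or -1 if the two buffers are identical.
function firstDifference(a, b) {
  const shared = Math.min(a.length, b.length);
  for (let i = 0; i < shared; i++) {
    if (a[i] !== b[i]) return i;
  }
  // Identical shared prefix: they differ only if one buffer is longer.
  return a.length === b.length ? -1 : shared;
}

// Stand-ins for the two downloads: same length, different bytes after "%PDF".
const good = Buffer.from([0x25, 0x50, 0x44, 0x46, 0xe2, 0xe3]);
const bad = Buffer.from([0x25, 0x50, 0x44, 0x46, 0xef, 0xbf]);

console.log(good.length === bad.length); // true
console.log(firstDifference(good, bad)); // 4
```

With real files, the buffers would come from `fs.readFileSync(path)` called without an encoding argument, so that the contents are read as raw bytes rather than as a string.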

I have found a few similar questions about problems with serverless and binary data, and I added the configuration they recommend to my serverless.yml file to allow binary data, but it did not change the result.

Here is the configuration that was recommended to be added to allow binary data through the API gateway

provider:
  apiGateway:
    binaryMediaTypes:
      - "*/*"

Here is my full serverless.yml file

service: flightchecklists-backend
frameworkVersion: "3"
provider:
  name: aws
  runtime: nodejs12.x
  memorySize: 1024
  timeout: 30
  region: ${opt:region}

  # To allow binary pdf responses.
  apiGateway:
    binaryMediaTypes:
      - "*/*"
functions:
  handler:
    handler: server.handler
    events:
      - http:
          path: /
          method: ANY
          cors: true
      - http:
          path: /{proxy+}
          method: ANY
          cors: true
plugins:
  - serverless-offline
custom:
  serverless-offline:
    noPrependStageInUrl: true
    httpPort: 3000

Here is my server.js file

"use strict";

require("dotenv").config();
require("log-aws-lambda")();
const express = require("express");
const serverless = require("serverless-http");
const bodyParser = require("body-parser");
const path = require("path");
const cors = require("cors");
const app = express();

const checklists = require("./routes/checklists");
const checkout = require("./routes/checkout");
const customerPortal = require("./routes/customerPortal");
const paymentUsers = require("./routes/paymentUsers");
const webhook = require("./routes/webhook");
const subscriptions = require("./routes/subscriptions");
const orders = require("./routes/orders");
const isAdmin = require("./routes/isAdmin");
const blockExternalRequests = require("./middleware/blockExternalRequests");
const namedLogger = require("./utils/logger");

const logger = new namedLogger("server.js");
const port = 3000;

// Cors
app.use(cors());

// Webhook route (Needs to be before bodyparser)
app.use("/webhook", webhook);

app.use(bodyParser.urlencoded({ extended: false }));
app.use(bodyParser.json());

// View engine
app.set("views", path.join(__dirname, "views"));
app.set("view engine", "pug");
app.use(
  "/static",
  blockExternalRequests,
  express.static(path.join(__dirname, "public"))
);

// Routes
app.use("/checklists", checklists);
app.use("/create-checkout-session", checkout);
app.use("/create-portal-session", customerPortal);
app.use("/paymentusers", paymentUsers);
app.use("/subscriptions", subscriptions);
app.use("/orders", orders);
app.use("/isadmin", isAdmin);

// Comment when using serverless.
app.listen(port, () => {
  logger.info(`Server started on port ${port}...`);
});

// Export for tests
// module.exports = app;

// Uncomment for serverless.
// module.exports.handler = serverless(app);

I am not sure what to do at this point. It is clear that my API is working fine without serverless, but I would like to use the serverless framework. I am wondering if maybe this has something to do with the serverless-offline package I am using, but I could not find any documentation about binary data for that package. I would appreciate any help I can get. If there is other information I have failed to provide, let me know and I can post it. Thank you.

EDIT: Upon request I have uploaded both PDFs, the first being the one which was downloaded correctly through the Nodejs API, and the second being the one downloaded through serverless framework and resulting in the file being corrupted.

The PDFs can be downloaded from the following page. Hopefully this is allowed.

https://ufile.io/f/t3ga1

Traditionally, the answer would be that the file had been transferred in text mode rather than binary, changing the line endings. But you mention the file length has not changed, so it can't be that. What are the exact differences - can you give an example? In particular, is the end of the file the same? Or can you supply both PDFs — johnwhitington, Dec 18 '22 at 21:37
Interestingly enough, I tried base64 encoding the responses, and the beginning and ends did seem to be the same. I can supply both PDFs if you would like. What would be the best way to do that? Just upload the files? — dividebyzero, Dec 19 '22 at 01:55
@johnwhitington I have uploaded both PDFs to a file sharing service up above. Let me know what you think. Thanks. — dividebyzero, Dec 19 '22 at 04:23
There are two things wrong with `corrupted_file.pdf`. 1) it is truncated, and 2) all the stream data is corrupt. My guess is that at some point, it has been run through some sort of automatic attempt at unicode conversion, for example to UTF8, the whole file being stored in a unicode string. You can't do that - a PDF file is binary data. The file is then truncated, because it is assumed to be the correct size (before the corruption which expands its size). How you managed to get into this situation is outside my area of expertise... The file cannot be fixed - you must fix the generation of it. — johnwhitington, Dec 19 '22 at 16:15
@johnwhitington I think you are right about the UTF-8 thing. I have been reading up on serverless framework a bit more and others have encountered similar things as well. I could not find anything concrete on how to fix it however. — dividebyzero, Dec 21 '22 at 03:07
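The UTF-8 round trip described in the comments above can be reproduced in a few lines of Node.js. This is only an illustrative sketch: the byte values are made up to resemble the start of a PDF followed by compressed stream data, and it does not show where in the serverless stack the conversion actually happens.

```javascript
// Made-up sample: "%PDF" followed by bytes that are not valid UTF-8.
const original = Buffer.from([0x25, 0x50, 0x44, 0x46, 0xff, 0xfe, 0x80, 0x00, 0xc3, 0x28]);

// Lossy path: decode the bytes as a UTF-8 string, then encode them again.
// Each invalid byte becomes U+FFFD, which re-encodes as three bytes.
const viaUtf8 = Buffer.from(original.toString("utf8"), "utf8");

// Cutting the expanded data back to the original size gives a file of the
// same length with different contents, matching the symptom in the question.
const truncated = viaUtf8.subarray(0, original.length);

// Safe path: base64 is a reversible text encoding of arbitrary bytes.
const viaBase64 = Buffer.from(original.toString("base64"), "base64");

console.log(original.equals(viaUtf8));         // false
console.log(viaUtf8.length > original.length); // true
console.log(truncated.equals(original));       // false
console.log(original.equals(viaBase64));       // true
```

The design point is that a binary body has to stay a Buffer, or be base64 encoded, at every step. Once it passes through a UTF-8 string the original bytes cannot be recovered.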

score 0 · Answer 1 · answered Jul 24 '23 at 15:59

Sadly I haven't come up with a solution; I can only confirm that I have reproduced the same binary data corruption with serverless (both offline and from Lambda). When I run the application directly with node, a binary data block is returned correctly; when I send it back through serverless, the block is corrupted. Like you, I have a similar entry in the provider block of my serverless.yml:

  apiGateway:
    binaryMediaTypes:
      - 'application/octet-stream'
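A possible direction to check, not a confirmed fix: with the Lambda proxy integration, a binary body is expected to travel as a base64 string with `isBase64Encoded: true`, and API Gateway decodes it for the media types listed under `binaryMediaTypes`. The sketch below shows that response shape using only Node.js built-ins. `toBinaryProxyResponse` is a hypothetical helper and the bytes are made up.

```javascript
// Hypothetical helper: wraps raw PDF bytes in a Lambda proxy-style response.
function toBinaryProxyResponse(pdfBytes) {
  return {
    statusCode: 200,
    headers: { "Content-Type": "application/pdf" },
    // base64 keeps every byte intact inside the JSON response payload.
    body: pdfBytes.toString("base64"),
    isBase64Encoded: true,
  };
}

// Made-up sample bytes standing in for a real PDF.
const pdfBytes = Buffer.from([0x25, 0x50, 0x44, 0x46, 0xff, 0x00, 0x80]);
const response = toBinaryProxyResponse(pdfBytes);

// Decoding the body recovers the original bytes exactly.
const decoded = Buffer.from(response.body, "base64");
console.log(response.isBase64Encoded);  // true
console.log(decoded.equals(pdfBytes));  // true
```

When Express is wrapped with serverless-http, the wrapper builds this response itself. Its documentation describes a `binary` option, for example `serverless(app, { binary: ["application/pdf"] })`, that controls which content types it treats as binary. Whether setting it resolves the corruption described here is an assumption and has not been tested.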
