express.js and request.js - incomplete PDF transfer when using callback syntax

Question

Simplified question

Why is it that, when using express.js and request.js, the following two examples:

request.get(url)
.on('response', (requestjsResponse) => {
  requestjsResponse.pipe(res);
})

and

request.get(url, (err, requestjsResponse, requestjsBody) => {
  res.send(requestjsBody)
})

tend not to produce the same results, even when requestjsBody contains the expected content?

Detailed question

I have two versions of an express.js route handler that proxy files of multiple types. The code uses the standard express.js req/res/next notation. The background detail that might matter for this issue is that the two most commonly returned types are handled as follows:

PDF: shall be opened within the browser; their size is usually no less than 18K (according to the content-length header)
EML: shall be downloaded; their size is usually smaller than 16K (according to the content-length header)

Both handler versions use request.js, one with the

get(url: string, callback: (Error, Response, Body) => void)

form, which I'll refer to as the callback form, where the entire body is expected inside the callback. In this case, the response to the user is sent by plain express.js res.send(Body). The other one uses the form

get(url: string).on(event: 'response', listener: (response: request.Response) => void)

which I'll refer to as the event/pipe form, and it transfers the response to the end user by piping it with request.Response.pipe(res) inside the 'response' handler. Details are provided in the code listing.

I'm unable to find the difference between those two forms, but in the case of .eml files (MIME message/rfc822; you can treat them as fancy HTML) both versions work exactly the same way and the file is downloaded correctly.

In the case of .pdf, when using the event/pipe form get(url).on('response', callback), I'm able to successfully transfer the PDF document to the client. When I use the callback form (i.e. get(url: string, callback: (Error, Response, Body) => void)), even though the body looks like a complete PDF when I peek at it in the debugger (it contains the PDF header, the EOF marker, etc.), the client receives only a strange preamble declaring HTML:

<!doctype html><html><body style='height: 100%; width: 100%; overflow: hidden; margin:0px; background-color: rgb(82, 86, 89);'><embed style='position:absolute; left: 0; top: 0;'width='100%' height='100%' src='about:blank' type='application/pdf' internalid='FD93AFE96F19F67BE0799686C52D978F'></embed></body></html>

but no PDF document is received afterwards. Chrome reports that it was unable to load the document.
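One property that may be relevant to the symptom above: binary data does not generally survive being decoded to a UTF-8 string and encoded back. The request.js documentation describes an `encoding` option where `null` yields the body as a Buffer, while the default yields a string. Whether that is what happens in this handler is an assumption, and the bytes below are a made-up stand-in for the start of a PDF, not data from the real documents. A minimal sketch using only Node's built-in Buffer:

```javascript
// Made-up sample: an ASCII "%PDF-1.4" header line followed by a line of
// high-bit bytes, similar to the binary marker many PDF files start with.
const original = Buffer.from([
  0x25, 0x50, 0x44, 0x46, 0x2d, 0x31, 0x2e, 0x34, 0x0a,
  0x25, 0xe2, 0xe3, 0xcf, 0xd3, 0x0a,
]);

// Decode to a string and encode back, as happens whenever a binary body
// is treated as text somewhere along the way.
const asText = original.toString('utf8');
const roundTripped = Buffer.from(asText, 'utf8');

// Invalid UTF-8 sequences are replaced during decoding, so the bytes change.
const survived = original.equals(roundTripped);
console.log('original bytes:     ', original.length);
console.log('round-tripped bytes:', roundTripped.length);
console.log('identical:          ', survived);
```

If the body arriving in the callback is a string rather than a Buffer, the ASCII parts (header, EOF marker) would still look intact in a debugger even though the binary parts have changed.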

Please see code:

Non-working callback version:

request.get(url, (err, documentResponse, documentBody) => {
    if (err) {
        logger.error('Document Fetch error:');
        logger.error(err);
    } else {
        const documentResponseContentLength = Number.parseInt(documentResponse.headers['content-length'], 10);
        if (documentResponseContentLength === 0 || Number.isNaN(documentResponseContentLength)) {
            logger.warn('No content provided for requested document or length header malformed');
            res.redirect(get404Navigation());
        }
        if (mimetype === 'application/pdf') {
            logger.info('   overwriting Headers (PDF)');
            res.set('content-type', 'application/pdf');
            // eslint-disable-next-line max-len, prefer-template
            res.set('content-disposition', 'inline; filename="someName.pdf"');
            logger.info('Document Download Headers (overridden):', res.headers);
        }
        if (mimetype === 'message/rfc822') {
            logger.info('   overwriting Headers (message/rfc822)');
            res.set('content-type', 'message/rfc822');
            // eslint-disable-next-line max-len, prefer-template
            res.set('content-disposition', 'attachment; filename="someName.eml"');
            logger.info('Document Download Headers (overridden):', res.headers);
        }
        res.send(documentBody); /* Sending message to client */
    }
})
.on('data', (d) => {
  console.log('We are debugging here')
})

Working event based/piped version:

const r = request
    .get(url)
    .on('response', (documentsResponse) => {
        if (Number.parseInt(documentsResponse.headers['content-length'], 10) !== 0) {
            // Override headers for PDF and TIFF; these occasionally arrive incomplete
            if (mimetype === 'application/pdf') {
                logger.info('   overwriting Headers (PDF)');
                res.set('content-type', 'application/pdf');
                res.set('content-disposition', 'inline; filename="someName.pdf"');
                logger.info('Document Download Headers (overridden):', documentsResponse.headers);
            }
            if (mimetype === 'message/rfc822') {
                logger.info('   overwriting Headers (message/rfc822)');
                res.set('content-type', 'message/rfc822');
                res.set('content-disposition', 'attachment; filename="someName.eml"');
                logger.info('Document Download Headers (overridden):', res.headers);
            }
            r.pipe(res); /* Response is piped to client */
        } else {
            res.redirect(get404Navigation());
        }
    })
    .on('data', (d) => {
        console.log('We are debugging here');
    });

Even though the part with r.pipe(res) seems extra suspicious (see where r is declared and where it is used), this is the version that works correctly for both cases.

I assumed that the issue might be caused by the nature of sending multipart content, so I added on('data', (d) => {}) callbacks and set breakpoints to see when the response is ended/piped versus when the data handler is called. The results match my expectations:

In the request(url, (err, response, body)) case, the data handler is called twice before the callback executes, and the entire body is accessible inside the callback, so it is even more obscure to me why I cannot simply res.send it.

In the request.get(url).on('response') case, piping to res is set up first, and then the data handler is called twice.

I believe the internals of the node.js HTTP engine do the asynchronous trick and push the response piece by piece as each chunk is received.

I'd be glad for any explanation of what I'm doing wrong and what I can change to make my callback version work as expected for the PDF case.

Epilogue: why is such code used? Our backend retrieves PDF data from an external server that is not exposed to the public internet, but for legacy reasons some headers are set incorrectly (mainly Content-Disposition), so we intercept them and act as a kind of alignment proxy between the data source and the client.
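The header alignment described here is duplicated in both handlers, so it could be factored into a small helper. This is only a sketch: `alignedHeaders` and its `baseName` parameter are hypothetical names introduced for illustration, and "someName" is the same placeholder used in the listings.

```javascript
// Hypothetical helper: maps a known MIME type to the headers the proxy forces.
// The default base name mirrors the "someName" placeholder from the handlers.
function alignedHeaders(mimetype, baseName = 'someName') {
  switch (mimetype) {
    case 'application/pdf':
      // PDFs are meant to open inside the browser.
      return {
        'content-type': 'application/pdf',
        'content-disposition': `inline; filename="${baseName}.pdf"`,
      };
    case 'message/rfc822':
      // EML files are meant to be downloaded.
      return {
        'content-type': 'message/rfc822',
        'content-disposition': `attachment; filename="${baseName}.eml"`,
      };
    default:
      // Unknown type: return nothing so upstream headers stay untouched.
      return {};
  }
}

const pdfHeaders = alignedHeaders('application/pdf');
const emlHeaders = alignedHeaders('message/rfc822');
console.log(pdfHeaders, emlHeaders);
```

Inside either handler the result could be applied with res.set(alignedHeaders(mimetype)), since express.js res.set also accepts an object of header fields.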
