
I have an export-to-PDF feature in an app that emails financial data to a user. The feature works fully on my test site, but it did not work when I deployed to my production site. On the prod site, the user is emailed a blank, 1 KB, one-page PDF file (it does open in a PDF viewer).

Here's the lowdown on how the feature works. I used this guide to get Nightmare JS working on Amazon Linux.

  1. Server export receives an /export call
  2. A child process is started that uses xvfb-run to create a virtual frame buffer for Nightmare JS to render into.

    // exec is from require('child_process'); path is from require('path')
    exec(
        `xvfb-run -a --server-args="-screen 0 1366x768x24" node ` +
        `${path.join(__dirname, 'export.js')} ${process.env.TYPE} ${token} ${req.user.email}`
    )
    
  3. In the export.js process, Nightmare JS loads a page over HTTP from the port that the site is running on (test site: port 3001, prod site: 3000).

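    Nightmare()
        .viewport(1366, 768)
        // Keep the URL on one line: a template literal preserves line breaks
        .goto(`http://localhost:${process.env.PORT}/export_summary_1?exportToken=${exportToken}`)
        .wait(3000)
        .pdf()
        .end()
        .then(function(pdfBuffer) {
            mailOptions.attachments = [{
                filename: 'financial_model.pdf',
                content: pdfBuffer
            }]
            email.send(mailOptions)
        })
        .catch(function(err) {
            // Surface page-load or PDF errors instead of failing silently
            console.error('PDF export failed:', err)
        })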
    
  4. The /export_summary_1 browser page makes another server call to load and display the financial data using the export token. Then, as seen in the code above, Nightmare captures the page into PDF, and finally this PDF is emailed to the user.
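
As a minimal sketch of step 3, the page URL can be assembled with Node's built-in `URL` class so that it stays on one line and the token is query-encoded. The helper name `buildExportUrl` and the sample token are hypothetical; only the path and the `exportToken` parameter come from the code above.

```javascript
// Hypothetical helper: builds the URL that the headless browser loads.
function buildExportUrl(port, exportToken) {
    const url = new URL('/export_summary_1', `http://localhost:${port}`);
    // searchParams.set() encodes the value, so the token cannot break the query string
    url.searchParams.set('exportToken', exportToken);
    return url.toString();
}

console.log(buildExportUrl(3000, 'abc'));
// http://localhost:3000/export_summary_1?exportToken=abc
```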


I don't believe my AWS setup is the culprit, because the relevant portion of the export-to-PDF system runs entirely locally on the AWS instance (e.g., the Nightmare page load goes to http://localhost:[port]/export_summary_1).

I do have an HTTP->HTTPS redirect, but I have this bypassed for the Nightmare page load and the loaded page's server call to retrieve the financial data.

Even though I don't believe the issue is my AWS setup, here are my notes on the AWS and port setup, for completeness.

  • Both ports 3000 and 3001 are enabled in my security group used by my instance.
  • Load balancer setup
    • Port 80 and 443 listeners forward to 'ec2-instance-targets', which has the AWS instance as a registered target, using port 3000.
    • Port 3001 listener forwards to 'ec2-devInstance-targets', which has the AWS instance as a registered target, using port 3001.

Also, the AWS instance shows as an 'unhealthy' registered target in the 'ec2-instance-targets' view ('ec2-instance-targets' is used for the prod site). I believe this is because the 'ec2-instance-targets' target group has a port setting of 80 while the registered target has a port setting of 3000.

But the prod site still works, and as I said before, these AWS/load balancer settings don't seem to be the source of the issue, because the export-to-PDF system is contained within the AWS instance, for all intents and purposes.

Although, I do plan on trying to resolve the above aspect in a few hours, when it will be nighttime for me.


Also, I commented out the Nightmare-loaded page's call to the server for financial data. Now, the loaded page that is exported to PDF just has 'hello world' text on it. The issue persists.


How can I get the export to PDF feature working on the prod site?

RedKrovvy
  • Do you have a load balancer in DEV? – Kevin Brown Feb 08 '18 at 17:50
  • Yes, I do. I think it's that way because my team wanted an easy way to see the test site, like http://testsite.com:3001. – RedKrovvy Feb 08 '18 at 18:28
  • I had this issue in a solution with AWS and a load balancer, but it doesn't sound like your issue. Not sure. I had a PDF-generating solution running in a web application, and what I determined was happening is that the PDF was generated on one machine, but the request to do something with it (in my case, view it) was routed to another machine behind the load balancer. – Kevin Brown Feb 09 '18 at 05:26
  • In our case, we had to grab and send the actual IP address of the machine with the PDF written because we have many servers, even using Route53 regional routing and multiple end instances under the load balancer. The request to get the PDF in no way was guaranteed to hit the same machine in which it was written. – Kevin Brown Feb 09 '18 at 05:32
  • I see. Yes, your issue is different, I think. For my setup, the PDF generation all occurs on my one EC2 instance. – RedKrovvy Feb 09 '18 at 18:11
  • Not seeing how I can edit my question, so I'll add one thing here. I changed the 'ec2-instance-targets' target group to have a port setting of 3000, so it now matches the registered target's port. – RedKrovvy Feb 10 '18 at 06:24

1 Answer


TL;DR: NightmareJS loaded a blank page, which was exported to PDF, because the page's sub-calls for bundle.js and styles/style.css, which are responsible for all presented content, were redirected to https://localhost:3000/bundle.js and https://localhost:3000/styles/style.css and could not connect (there was no HTTPS server set up on the instance, as HTTPS was achieved via an AWS load balancer).


In my question, I mentioned how I thought the concerned portion of the export-to-PDF processing was all being done locally. This actually was the case, but there was further HTTP->HTTPS redirecting going on. Here is how the HTTP->HTTPS redirect code looked:

app.use(function(req, res, next) {
    // The X-Forwarded-Proto check achieves HTTP->HTTPS redirecting
    // using an AWS load balancer and EC2 instance
    if (!req.secure &&
        req.get('X-Forwarded-Proto') !== 'https' &&
        !/export_summary/.test(req.url)) {
        res.redirect('https://' + req.hostname + req.url)
    } else {
        next()
    }
})
// NOTE: all code above is commented out on test site, and uncommented for production site

Here's a Stack Overflow page explaining how to achieve HTTP->HTTPS redirecting using an AWS load balancer and EC2 instance; it is the approach used in the code above.

When I initially deployed the export-to-PDF feature to the prod site, I got a NightmareJS error informing me that https://localhost:3000/export_summary could not be reached. So, I inserted the !/export_summary/.test(req.url) conditional so that NightmareJS's HTTP call to the export_summary page would not be redirected to an HTTPS call.

The above code block was also acting upon NightmareJS's sub-calls involved in loading the export_summary page. So, the two main sub-calls for resources, to http://localhost:3000/bundle.js and http://localhost:3000/styles/style.css, were redirected to their HTTPS counterparts. (Note: the bundle.js call is <script src='bundle.js'></script> in the HTML file served for /export_summary.)

As there was no actual HTTPS server set up on the AWS instance, those sub-calls could not connect (a client would report this as a 'connection refused' error message). HTTPS for the app is achieved via an AWS load balancer.

Those two resources are responsible for all of the app's presented content. So, without them, the page that was loaded was white and blank, as the exported PDF showed.

Also, as mentioned in the comment in the code block above, the HTTP->HTTPS redirect code was commented out on the test site. This explains why the feature worked as expected on the test site: the problematic middleware was not part of the processing.
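
The behaviour described above can be sketched without Express by pulling the middleware's condition into a plain function. This is only an illustration: `shouldRedirectToHttps` is a hypothetical name, `forwardedProto` stands in for `req.get('X-Forwarded-Proto')`, and the request objects are plain stand-ins for what the headless browser sends over HTTP on localhost.

```javascript
// Hypothetical helper mirroring the original redirect condition.
function shouldRedirectToHttps(req) {
    return !req.secure &&
        req.forwardedProto !== 'https' &&
        !/export_summary/.test(req.url);
}

// Plain-HTTP requests from the headless browser: no TLS, no load balancer header.
const pageRequest = { secure: false, forwardedProto: undefined, url: '/export_summary_1?exportToken=abc' };
const scriptRequest = { secure: false, forwardedProto: undefined, url: '/bundle.js' };
const styleRequest = { secure: false, forwardedProto: undefined, url: '/styles/style.css' };

console.log(shouldRedirectToHttps(pageRequest));   // false: the page itself is exempt
console.log(shouldRedirectToHttps(scriptRequest)); // true: redirected to HTTPS
console.log(shouldRedirectToHttps(styleRequest));  // true: redirected to HTTPS
```

The page loads, but the script and stylesheet it depends on are sent to an HTTPS address where nothing is listening.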

Solution

I updated the conditional for bypassing the HTTP->HTTPS redirect to also include the concerned sub-calls:

!/(export_summary)|(exportPrivate)|(bundle\.js)|(style\.css)|(\.png)|(\.ttf)|(\.woff)/.test(req.url))

NOTE: The PNG, TTF and WOFF resource calls also needed to be bypassed, as they are needed for the complete, polished PDF.
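
To see which requests the updated pattern exempts, here is a small sketch that runs it against a few sample URLs. The pattern is copied from the conditional above; the function name `bypassesRedirect` and the last three sample paths are made up for illustration.

```javascript
// Pattern copied from the updated conditional above.
const BYPASS_PATTERN = /(export_summary)|(exportPrivate)|(bundle\.js)|(style\.css)|(\.png)|(\.ttf)|(\.woff)/;

// Hypothetical helper: true when the redirect would be skipped for this URL.
function bypassesRedirect(url) {
    return BYPASS_PATTERN.test(url);
}

const samples = [
    '/export_summary_1?exportToken=abc', // true
    '/bundle.js',                        // true
    '/styles/style.css',                 // true
    '/images/logo.png',                  // true
    '/dashboard',                        // false
    '/dashboard?ref=.png',               // true
];
for (const url of samples) {
    console.log(url, bypassesRedirect(url));
}
```

The last sample shows that the pattern is unanchored and is tested against the whole of `req.url`, query string included, so any URL containing one of the substrings skips the redirect.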

Lastly, I do not see a security risk here, although I imagine of course there are security improvements that can be made.

  • No confidential data is kept in the bundle.js, style.css and image/font files.
  • For an /export_summary call, the server responds with an index.html file that is used for deploying the app (no confidential data there either).
  • The bypass for endpoints containing the 'exportPrivate' substring is for NightmareJS's call to load the user's data that is exported to PDF. Because this call is allowed over plain HTTP, it is never meant to be made across the internet. It is only meant to be made on the server itself, by a headless client browser such as NightmareJS.

Any suggestions on improving the security aspect of the solution are welcome.
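
One possible tightening, offered as a sketch rather than a tested fix: skip the redirect only when the request also comes from the machine itself. This assumes the headless browser connects from a loopback address while load balancer traffic arrives from some other address, which would need to be verified on the actual instance. In Express the address would typically be read from `req.socket.remoteAddress`; the function names and the `10.0.0.12` address below are hypothetical.

```javascript
// Pattern copied from the updated conditional in the Solution section.
const BYPASS_PATTERN = /(export_summary)|(exportPrivate)|(bundle\.js)|(style\.css)|(\.png)|(\.ttf)|(\.woff)/;

// Loopback addresses as Node reports them for IPv4, IPv6, and IPv4-mapped IPv6.
const LOOPBACK_ADDRESSES = new Set(['127.0.0.1', '::1', '::ffff:127.0.0.1']);

// Hypothetical helper: true when the connection originates on this machine.
function isLoopbackAddress(remoteAddress) {
    return LOOPBACK_ADDRESSES.has(remoteAddress);
}

// Hypothetical helper: the redirect is skipped only for local requests
// whose URL also matches the bypass pattern.
function shouldBypassRedirect(remoteAddress, url) {
    return isLoopbackAddress(remoteAddress) && BYPASS_PATTERN.test(url);
}

console.log(shouldBypassRedirect('127.0.0.1', '/bundle.js')); // true
console.log(shouldBypassRedirect('10.0.0.12', '/bundle.js')); // false
console.log(shouldBypassRedirect('::1', '/dashboard'));       // false
```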

RedKrovvy