
I am trying to write a Koa endpoint that retrieves and archives a large amount of data from AWS S3. Response times can exceed a minute in some cases, but performance is not a huge concern. However, when hosting on Heroku, the request always times out before the data can be processed and returned (apparently because Heroku enforces a 30-second limit on any single request). My (simplified) code to retrieve and return the data is:

router.get('/diagnostics/data', async (ctx) => {

  const archive = archiver('zip', { zlib: { level: 9 } })

  try {
    ctx.status = 200
    ctx.type = 'application/zip'
    ctx.attachment(`myS3Data.zip`)

    // log the final size once the archive has finished writing
    archive.on('end', () => {
      log.info(util.format('Archive wrote %d bytes', archive.pointer()))
    })
    archive.on('error', (err) => {
      throw err
    })

    // myKeys here is just an example; a real request would have many more keys.
    // The loop below does the actual retrievals from S3 and takes the most time,
    // over 1 min on large requests.
    const myKeys = ['key1', 'key2', 'key3']

    const stream = new Stream.PassThrough()
    ctx.body = stream
    archive.pipe(stream)

    // retrieve and append one object at a time; in theory each append
    // pushes data to the stream so Heroku doesn't time out
    for (let i = 0; i < myKeys.length; i += 1) {
      const retrieval = s3Client.getObject(BUCKET_NAME, myKeys[i])

      await Promise.allSettled([retrieval]).then((data) => {
        const payload =
          data[0].status === 'fulfilled'
            ? data[0].value
            : `Failed to retrieve data with key ${myKeys[i]}! Status: ${data[0].status}`
        const keyFinal = data[0].status === 'fulfilled' ? myKeys[i] : `${myKeys[i]}_Failed.txt`
        archive.append(payload, { name: keyFinal })
      })
    }
    
  } catch (err) {
    ctx.status = 500
    ctx.body = err.message
  } finally {
    archive.finalize()
  }

  return ctx
})
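As an aside, the fulfilled/rejected branching inside the loop can be factored into a small pure helper, which makes the failure fallback easy to unit test on its own. A sketch mirroring the loop's logic (the helper name is mine, and it assumes the payload is something archiver can append, such as a string or stream):

```javascript
// Map one settled S3 retrieval onto the entry to append to the archive.
// On failure, the payload becomes a short text report and the key is
// renamed so the zip still records which object was missing.
function toArchiveEntry(settled, key) {
  if (settled.status === 'fulfilled') {
    return { name: key, payload: settled.value }
  }
  return {
    name: `${key}_Failed.txt`,
    payload: `Failed to retrieve data with key ${key}! Status: ${settled.status}`,
  }
}

// Usage inside the loop would then be:
//   const [settled] = await Promise.allSettled([retrieval])
//   const { name, payload } = toArchiveEntry(settled, myKeys[i])
//   archive.append(payload, { name })
```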

When hosted locally (i.e. with no timeout to worry about), the S3 retrievals take about a minute to process, and only then does the client start downloading the zip file. Is there a way to have the client start receiving the zip file immediately, downloading it bit by bit as it is compiled from S3? That way the overall time to completion is unchanged, but Heroku will not see a hung connection while the zip is being built on the server. Alternatively, maybe some dummy event or data could be sent back to the client while processing is still going, just to let it know not to time out too early.

EDIT: reorganised the code to stream the archive back, which according to a few sources should have solved the issue, but the endpoint still behaves exactly the same: the archive is returned in bulk, long after the timeout has triggered.

  • Does this answer your question? [Prevent request timeout with long requests](https://stackoverflow.com/questions/12791580/prevent-request-timeout-with-long-requests) – ChrisGPT was on strike Jun 27 '20 at 03:27
  • @Chris I had a try at using streaming (with this example: https://stackoverflow.com/questions/59966688/i-want-to-stream-a-zip-archive-via-koa-node-js) to hopefully return the archive immediately but at a slower rate but see my edit – SoLegendary Jun 29 '20 at 04:02

0 Answers