I have an incredibly large JSON object (several gigabytes – too large to fit in a JS string) that I'm trying to gzip and upload to S3.

Currently I have the following code:

import { stringifyStream } from '@discoveryjs/json-ext';
import { Readable } from 'stream';
import zlib from 'zlib';

export async function safeStringify({
  content,
  gzipCompress,
}: {
  content: any;
  gzipCompress?: boolean;
}) {
  // Stream the serialization so the full JSON string never has to exist in memory
  let json: Readable | null = stringifyStream(content);

  if (gzipCompress && json !== null) {
    // Compress on the fly; pipe() returns the gzip stream, which is also a Readable
    json = json.pipe(zlib.createGzip());
  }

  return json;
}
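
For reference, here is a sketch (not part of my original code) of how this function could be exercised on its own, writing the gzipped output to a local file to confirm that the stringify + gzip stage emits bytes at all. It assumes Node 15+ for stream/promises, and the output path is arbitrary.

import { createWriteStream } from 'fs';
import { pipeline } from 'stream/promises';

// Hypothetical debugging helper: pipe the gzipped JSON to a local file
// instead of S3, just to prove the stream produces data.
async function dumpToFile(content: any) {
  const stream = await safeStringify({ content, gzipCompress: true });
  // safeStringify is typed Readable | null, hence the non-null assertion
  await pipeline(stream!, createWriteStream('/tmp/debug-output.json.gz'));
  console.log('wrote /tmp/debug-output.json.gz');
}

The upload itself then looks like this: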

const stringifiedStream = await safeStringify({ content, gzipCompress: true });
const output = await myAwsClient.upload({
  Body: stringifiedStream,
  Bucket: 'my-bucket',
  Key: 'my-key',
}).promise();

This works with small JSON (like {"hello": "world"}) but is failing on my super large JSON.
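
One thing that might help surface what is going wrong during the upload itself: attaching an 'error' listener to the gzipped stream, so a failure inside stringifyStream or zlib shows up in the logs instead of the upload just appearing to receive nothing. This is only a sketch, and logStreamErrors is a name I made up rather than anything from my code:

import { Readable } from 'stream';

// Hypothetical helper: log stream errors so a failure in the stringify/gzip
// stage is visible rather than the upload silently receiving nothing.
function logStreamErrors(stream: Readable, label: string): Readable {
  stream.on('error', (err) => console.error(`${label} stream error:`, err));
  return stream;
}

// Usage with the snippet above (safeStringify is typed Readable | null,
// hence the non-null assertion):
// const stringifiedStream = logStreamErrors(
//   (await safeStringify({ content, gzipCompress: true }))!,
//   'json-gzip',
// );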

I'm wondering if there are any obvious mistakes I'm making here, or things to try to avoid this null upload – even tips to help debug, since I've tried a few things to no avail:

  • I tried calling .read(100) on the output of stringifiedStream and got null
  • I tried waiting for the stream to fire a 'readable' event by adding .on('readable', () => stringifiedStream.read(100)) and still got null
    • Sometimes a second 'readable' event fires where it is possible to pull data, which is kind of weird – I'll actually get up to three – so any advice on how to poll a readable only when data is actually available would be appreciated too (see the sketch after this list)
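
For completeness, here is a sketch of the two usual ways to drain a Readable (the helper names are mine, not from my code). One detail that seems relevant: read(100) returns null whenever fewer than 100 bytes are currently buffered, unless the stream has ended, so getting null on the very first 'readable' event is expected rather than a sign the stream is broken. Note also that any bytes pulled off this way for debugging are consumed and will never reach the upload.

import { Readable } from 'stream';

// Drain whatever is buffered on each 'readable' event; read() with no size
// argument returns null only when the internal buffer is empty.
function logChunks(stream: Readable) {
  stream.on('readable', () => {
    let chunk: Buffer | null;
    while ((chunk = stream.read()) !== null) {
      console.log(`got ${chunk.length} bytes`);
    }
  });
  stream.on('end', () => console.log('stream ended'));
  stream.on('error', (err) => console.error('stream error:', err));
}

// Alternatively, async iteration avoids the 'readable' bookkeeping entirely.
async function countBytes(stream: Readable): Promise<number> {
  let total = 0;
  for await (const chunk of stream) {
    total += (chunk as Buffer).length;
  }
  return total;
}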

Thanks!

Update: I'm seeing now that multiple 'readable' events get fired; on the first event, calling .read(100) gives me null, but on later events I get data – I think this might be the root cause here.

  • Figured it out – there was some monkey-patched code that wrote my buffer to a backup, which consumed the readable, so the next read came up empty – mjkaufer Oct 27 '22 at 20:37
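
For anyone hitting something similar: a Node Readable can only be consumed once, so any code that reads the gzipped stream first (a backup writer, a debugging read(), etc.) leaves nothing for the S3 upload. If two consumers genuinely need the same bytes, the stream has to be split, for example with a pair of PassThrough streams. A sketch (tee is my own name for this):

import { PassThrough, Readable } from 'stream';

// Give two consumers (e.g. a backup writer and the S3 upload) their own copy
// of the same source. Each pipe applies backpressure, so the source only
// flows as fast as the slower consumer.
function tee(source: Readable): [PassThrough, PassThrough] {
  const forBackup = new PassThrough();
  const forUpload = new PassThrough();
  source.pipe(forBackup);
  source.pipe(forUpload);
  return [forBackup, forUpload];
}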

0 Answers