
I have an app where users can upload a ZIP archive of resources. My app handles the upload and saves the archive to S3. At some point I want to run a transformation that reads the object from that S3 bucket, unzips it, and writes the results to a new S3 bucket. This all happens in a Node service.

I am using the unzipper library to handle unzipping. Here is my initial code.

    const AWS = require("aws-sdk");
    const unzipper = require("unzipper");
    const stream = require("stream");

    async function downloadFromS3() {
      let s3 = new AWS.S3();
      try {
        const object = s3
          .getObject({
            Bucket: "zip-bucket",
            Key: "Archive.zip"
          })
          .createReadStream();

        object.on("error", err => {
          console.log(err);
        });

        await streaming_unzipper(object, s3);
      } catch (e) {
        console.log(e);
      }
    }

    async function streaming_unzipper(s3ObjectStream, s3) {
      await s3.createBucket({ Bucket: "unzip-bucket" }).promise();
      const unzipStream = s3ObjectStream.pipe(unzipper.Parse());
      unzipStream.pipe(
        new stream.Transform({
          objectMode: true,
          transform: function(entry, e, next) {
            const fileName = entry.path;
            const type = entry.type; // 'Directory' or 'File'
            const size = entry.vars.uncompressedSize; // There is also compressedSize
            if (type === "File") {
              s3.upload(
                { Bucket: "unzip-bucket", Body: entry, Key: entry.path },
                {},
                function(err, data) {
                  if (err) console.error(err);
                  console.log(data);
                  entry.autodrain();
                }
              );
              next();
            } else {
              entry.autodrain();
              next();
            }
          }
        })
      );
    }

This code works, but I feel like it could be optimized. Ideally I would like to pipe the download stream -> unzipper stream -> uploader stream, so that chunks are uploaded to S3 as they get unzipped, instead of waiting for the full unzip to finish before uploading.
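
To make that concrete, this is roughly the shape of pipeline I'm picturing (an untested sketch; the bucket and key names are just the placeholders from above, and I'm assuming each unzipper entry stream can be handed directly to `s3.upload` as the `Body`):

    const AWS = require("aws-sdk");
    const unzipper = require("unzipper");
    const stream = require("stream");
    const { promisify } = require("util");

    const pipeline = promisify(stream.pipeline);

    // Sketch: download -> unzip -> per-entry upload, with backpressure
    // applied by not calling next() until the current upload finishes.
    async function streamUnzipToS3() {
      const s3 = new AWS.S3();

      const download = s3
        .getObject({ Bucket: "zip-bucket", Key: "Archive.zip" })
        .createReadStream();

      const uploader = new stream.Transform({
        objectMode: true,
        transform(entry, _encoding, next) {
          if (entry.type === "File") {
            // Each entry is itself a readable stream, so it can be used
            // as the upload Body and sent to S3 as it is unzipped.
            s3.upload({ Bucket: "unzip-bucket", Key: entry.path, Body: entry })
              .promise()
              .then(() => next())
              .catch(next);
          } else {
            entry.autodrain();
            next();
          }
        }
      });

      await pipeline(download, unzipper.Parse(), uploader);
    }

Holding `next()` until the upload resolves would keep only one entry in flight at a time, which I think is the memory behaviour I'm after.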

The problem I am running into is that I need the file name (to use as the S3 key) before I can start the upload, but I only have it after unzipping.

Is there any good way to create a streaming upload to S3 that is initiated with a temporary ID, and then renamed to the final file name after the full stream has finished?
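
For illustration, this is the pattern I have in mind (an untested sketch; since S3 has no real rename, I'm assuming a `copyObject` followed by `deleteObject` would stand in for the final renaming, and `getFinalKey` is a hypothetical callback that returns the real name once the stream is done):

    const { randomUUID } = require("crypto");

    // Sketch: stream the body to S3 under a temporary key, then "rename"
    // it by copying to the final key and deleting the temporary object.
    async function uploadWithTemporaryKey(s3, bucket, bodyStream, getFinalKey) {
      const tempKey = `tmp/${randomUUID()}`;

      // Start the streaming upload immediately, before the name is known.
      await s3.upload({ Bucket: bucket, Key: tempKey, Body: bodyStream }).promise();

      // Once the stream has finished, the final name should be available.
      const finalKey = getFinalKey();

      // S3 has no rename, so copy to the final key and delete the temp object.
      await s3
        .copyObject({
          Bucket: bucket,
          CopySource: `${bucket}/${tempKey}`,
          Key: finalKey
        })
        .promise();
      await s3.deleteObject({ Bucket: bucket, Key: tempKey }).promise();

      return finalKey;
    }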
