I have an app where users can upload a ZIP archive of resources. My app handles the upload and saves the archive to S3. At some point I want to run a transformation that reads from this S3 bucket, unzips the archive, and writes the contents to a new S3 bucket. This all happens in a Node service.
I am using the unzipper library to handle unzipping. Here is my initial code:
const AWS = require("aws-sdk");
const unzipper = require("unzipper");
const stream = require("stream");

async function downloadFromS3() {
  const s3 = new AWS.S3();
  try {
    const object = s3
      .getObject({
        Bucket: "zip-bucket",
        Key: "Archive.zip"
      })
      .createReadStream();
    object.on("error", err => {
      console.log(err);
    });
    await streaming_unzipper(object, s3);
  } catch (e) {
    console.log(e);
  }
}

async function streaming_unzipper(s3ObjectStream, s3) {
  await s3.createBucket({ Bucket: "unzip-bucket" }).promise();
  const unzipStream = s3ObjectStream.pipe(unzipper.Parse());
  unzipStream.pipe(
    new stream.Transform({
      objectMode: true,
      transform: function(entry, encoding, next) {
        const fileName = entry.path;
        const type = entry.type; // 'Directory' or 'File'
        const size = entry.vars.uncompressedSize; // there is also compressedSize
        if (type === "File") {
          // s3.upload consumes the entry stream directly as the Body
          s3.upload(
            { Bucket: "unzip-bucket", Body: entry, Key: fileName },
            {},
            function(err, data) {
              if (err) console.error(err);
              console.log(data);
            }
          );
          next();
        } else {
          // Directories have no body; drain so the zip stream keeps flowing
          entry.autodrain();
          next();
        }
      }
    })
  );
}
This code works, but I feel like it could be optimized. Ideally I would like to pipe the download stream -> unzipper stream -> uploader stream, so that chunks are uploaded to S3 as they are unzipped, instead of waiting for the full unzip to finish before uploading.
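For context, the only way I know to get an "uploader stream" to pipe into is a PassThrough whose other end is the Body of an s3.upload call. A rough sketch (createUploadStream is my own hypothetical helper; as far as I understand, s3.upload runs a managed multipart upload when given a stream Body):

const stream = require("stream");

// Hypothetical helper: s3.upload() accepts a stream as Body and performs a
// managed multipart upload, so piping into a PassThrough gives us a
// writable end that can be plugged into a pipeline.
function createUploadStream(s3, bucket, key) {
  const pass = new stream.PassThrough();
  const done = s3.upload({ Bucket: bucket, Key: key, Body: pass }).promise();
  return { writeStream: pass, done };
}

The catch is that Key has to be supplied at the moment the upload is created.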
The problem I am running into is that I need the file name (to set as the S3 key) before I can start the upload, and I only have it after unzipping.
Is there any good way to create a streaming upload to S3 that is initiated with a temporary ID and gets rewritten with the final file name after the full stream is finished?
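The closest workaround I can think of is uploading under a placeholder key and "renaming" once the stream has finished. Since S3 has no native rename, that would be a copyObject followed by a deleteObject, roughly like this (the temp-key handling is just an illustration):

async function renameObject(s3, bucket, tempKey, finalKey) {
  // "Rename" by copying to the final key, then deleting the temporary object;
  // the copy happens server-side, so nothing is re-downloaded or re-uploaded.
  await s3
    .copyObject({
      Bucket: bucket,
      CopySource: `${bucket}/${tempKey}`, // may need URL-encoding for special characters
      Key: finalKey
    })
    .promise();
  await s3.deleteObject({ Bucket: bucket, Key: tempKey }).promise();
}

Is that the idiomatic approach, or does S3 or the SDK offer something more direct?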