Let's say the JSON array stored in S3 contains person details, such as:
[{"name": "A", "lastName": "A", "age": 18}, {"name": "B", "lastName": "B", "age": 20}, ...]
This file could be extremely large, so I would like to keep memory usage low by filtering the data with streams instead of loading the whole file into memory and filtering it there.
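For context, the whole-file approach I am trying to avoid would look roughly like this (just a sketch, assuming a recent @aws-sdk/client-s3 where Body.transformToByteArray() is available; bucket/key and the function name are placeholders):

import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";
import { gunzipSync } from "zlib";

// Buffers the entire gzipped object in memory, then parses and filters it all at once.
async function loadAndFilterInMemory(client: S3Client, bucket: string, key: string) {
  const { Body } = await client.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
  const gzipped = Buffer.from(await Body!.transformToByteArray());
  const people = JSON.parse(gunzipSync(gzipped).toString("utf8"));
  return people.map((p: any) => ({ name: p.name, lastName: p.lastName }));
}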
I am not sure I entirely understand how "objectMode" works here.
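My rough understanding is that objectMode makes a stream pass whole JavaScript values between stages instead of Buffers/strings, e.g. this toy example (not my real code) behaves the way I expected:

import { Readable, Transform } from "stream";

// Readable.from() on an array of objects produces an object-mode stream,
// so the transform receives each object as-is rather than a Buffer.
const source = Readable.from([{ name: "a" }, { name: "b" }]);
const upper = new Transform({
  objectMode: true,
  transform(person: any, _encoding, callback) {
    callback(null, { ...person, name: person.name.toUpperCase() });
  },
});
source.pipe(upper).on("data", (p) => console.log(p)); // { name: 'A' } then { name: 'B' }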
I have tried the following, which fails: the console.log prints each chunk as an equally sized string of bytes rather than a group of objects, despite objectMode: true:
import { GetObjectCommand, GetObjectCommandOutput } from "@aws-sdk/client-s3";
import { Readable, Transform } from "stream";
import zlib from "zlib";

const filteredData: string[] = [];
const filterTransform = new Transform({
  objectMode: true,
  transform(chunk, _encoding, callback) {
    console.log("chunk : \n" + chunk);
    try {
      // Keep only the fields I need from each item in the chunk.
      const picked = chunk.map((item: any) => ({
        name: item.name,
        lastName: item.lastName,
      }));
      filteredData.push(JSON.stringify(picked));
    } catch (err) {
      return callback(err as Error);
    }
    callback();
  },
});

const client = getS3Client();
const command = new GetObjectCommand({
  Bucket: bucket,
  Key: key,
});
const data: GetObjectCommandOutput = await client.send(command);
const readStream = (data.Body! as Readable)
  .pipe(zlib.createGunzip())
  .pipe(filterTransform);
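For completeness, a minimal sketch of how the pipeline could be awaited before reading filteredData (this part is not in the snippet above; names as defined there, error handling trimmed):

// Wait until the gunzipped data has been fully written through filterTransform.
await new Promise<void>((resolve, reject) => {
  readStream.on("finish", () => resolve());
  readStream.on("error", reject);
});
console.log(`collected ${filteredData.length} chunk(s)`);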
Sample output:
chunk 1 "{name:'A', lastName:'"
chunk 2 "A'}, {name:'B', lastN"
chunk 3 ...
and so on.
But I expect:
chunk 1 [{"name": "A", "lastName": "A"}, {"name": "B", "lastName": "B"}]
chunk 2 [{"name": "C", "lastName": "C"}, {"name": "D", "lastName": "D"}]
...
How do I get each chunk to arrive as a list of objects instead of raw bytes?