I have an API connected to an AWS Lambda function that does the following:
- Get JSON data from S3 (around 60,000 records)
- Parse the JSON data into a CSV string using the json2csv library
- Put the resulting CSV string back into an S3 bucket
Step 2 above, parsing the JSON data into a CSV string, is taking too long. The library I am using is json2csv: https://www.npmjs.com/package/json2csv
Following is my code:

```js
const { Parser } = require('json2csv');

// Get data in JSON format in object: records (array of JSON objects)
const headers = [
  { label: "Id",          value: "id" },
  { label: "Person Type", value: "type" },
  { label: "Person Name", value: "name" }
];

const json2csvParser = new Parser({ fields: headers });
console.log("Parsing started");
const dataInCsv = json2csvParser.parse(records);
console.log("Parsing completed");

// PutObject of dataInCsv in S3
```
It takes around 20 seconds to parse 60K records. Is there anything I can do to improve the performance here? Another library, perhaps? I always thought in-memory operations were fast, so why is this parsing so slow? Any help is appreciated.
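For comparison, here is a minimal hand-rolled conversion of the same three fields that I would expect to be fast. Note this is only a rough lower bound on the work involved: it does no quoting or escaping of values, so it is not a drop-in replacement for json2csv, and the sample records are fabricated just to match the shape of my data:

```js
const headers = [
  { label: "Id",          value: "id" },
  { label: "Person Type", value: "type" },
  { label: "Person Name", value: "name" }
];

// Naive CSV conversion: quoted header row, then raw (unescaped) values.
function toCsv(records) {
  const headerLine = headers.map(h => `"${h.label}"`).join(",");
  const lines = records.map(r =>
    headers.map(h => r[h.value] ?? "").join(",")
  );
  return [headerLine, ...lines].join("\n");
}

// Demo with 60K fake records of the same shape as mine
const records = Array.from({ length: 60000 }, (_, i) => ({
  id: i,
  type: "customer",
  name: `Person ${i}`
}));

console.time("hand-rolled conversion");
const csv = toCsv(records);
console.timeEnd("hand-rolled conversion");
console.log(csv.split("\n").length); // header + 60,000 rows
```

If this naive version is much faster than json2csv on the same data, the overhead is presumably in the library's per-record processing rather than in the raw string building.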