
I have a case where I am getting a large CSV from an external source as a ReadStream.

I need to consume the ReadStream in two places:

  1. Upload it to S3
  2. Read the CSV to find the latest date in it and write that to the database

My solution works for small files (around 10 kB), but for large files (several megabytes) it does not: the upload never starts, and neither does the CSV read.

The upload is done first, and then the CSV is read (to get the date).

I am attempting to "clone" the ReadStream like this:

const { PassThrough } = require('stream');

const clonedStream1 = responseStream.pipe(new PassThrough());
const clonedStream2 = responseStream.pipe(new PassThrough());
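
The two clones are then consumed one after the other, roughly like this (uploadToS3 and findLatestDate are placeholder names for the actual upload and CSV-reading code, sketched further below):

// Upload first, then read the CSV from the second clone
await uploadToS3(clonedStream1);
const latestDate = await findLatestDate(clonedStream2);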

I have also tested with the cloneable-readable package, but without success.

What is the reason this does not work for large files, and why does it get stuck? I am most likely missing some vital information about how these streams work.

I have tested that the uploading and the CSV reading each work independently for large files.

For the S3 upload I am using multipart upload, and for the CSV read I am using the csv-parse library to get the date.
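
For reference, a minimal sketch of what the two consumers look like, assuming the multipart upload uses the Upload helper from @aws-sdk/lib-storage and that the date sits in a column named "date" (bucket, key and column name are placeholders, not my real values):

const { S3Client } = require('@aws-sdk/client-s3');
const { Upload } = require('@aws-sdk/lib-storage');
const { parse } = require('csv-parse');

// Multipart upload of the stream to S3 (bucket and key are placeholders)
async function uploadToS3(stream) {
  const upload = new Upload({
    client: new S3Client({}),
    params: { Bucket: 'my-bucket', Key: 'latest.csv', Body: stream },
  });
  await upload.done();
}

// Scan the CSV for the latest value in the (assumed) "date" column
async function findLatestDate(stream) {
  let latest = null;
  const parser = stream.pipe(parse({ columns: true }));
  for await (const record of parser) {
    const d = new Date(record.date);
    if (!latest || d > latest) latest = d;
  }
  return latest;
}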

This is implemented in Node.js.

Any ideas?

The only way I got this working is by pushing the data to the two PassThrough streams; it seems piping does not work. – jani_r May 08 '23 at 12:53
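
A minimal sketch of that workaround (an interpretation of the comment, not the author's exact code): instead of pipe(), the source's events are forwarded to both PassThrough streams by hand, which also means no backpressure is applied to the source:

const { PassThrough } = require('stream');

const clonedStream1 = new PassThrough();
const clonedStream2 = new PassThrough();

// Forward every chunk to both clones manually instead of using pipe().
// Note: the write() return values are ignored here, so the source is
// never paused by a slow consumer.
responseStream.on('data', (chunk) => {
  clonedStream1.write(chunk);
  clonedStream2.write(chunk);
});
responseStream.on('end', () => {
  clonedStream1.end();
  clonedStream2.end();
});
responseStream.on('error', (err) => {
  clonedStream1.destroy(err);
  clonedStream2.destroy(err);
});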

0 Answers