Description
I have a very large CSV file (around 1 GB) that I want to process in byte chunks of around 10 MB each. To do this, I create a Readable stream with a byte range: fs.createReadStream(sampleCSVfile, { start: 0, end: 10000000 })
Problem
With this approach, the chunk read from the CSV file usually ends partway through a line, so its last line is incomplete. I want a way to find the byte index of the last line break in the chunk and start my next Readable stream at the byte immediately after it.
Example CSV: (ignore header row)
John,New York,52
Stacy,Chicago,19
Lisa,Indianapolis,40
Sample Operation:
fs.createReadStream(sampleCSVfile, { start: 0, end: 99 })
Data Returned: (trimmed to above-specified byte-range)
John,New York,52
Stacy,Chicago,19
Lisa,I
Required or Expected:
John,New York,52
Stacy,Chicago,19
So, if the last line break in the fetched chunk is at byte index 78, my next recursive operation would be: fs.createReadStream(sampleCSVfile, { start: 79, end: 178 })
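One possible way to get line-aligned chunks is sketched below. It is a minimal illustration rather than a definitive solution. It swaps fs.createReadStream for synchronous fs.readSync calls so that the offset arithmetic is easy to follow. The helper name readLineAlignedChunks and the shape of the objects it yields are hypothetical. It assumes lines end in "\n", that no quoted CSV field contains an embedded line break, and that every line fits inside one chunk.

```javascript
const fs = require('fs');

// Hypothetical helper: yields { start, end, text } for consecutive chunks of
// the file. Each chunk is cut just after the last '\n' it contains, so the
// next chunk starts at the first byte of the next line.
function* readLineAlignedChunks(filePath, chunkSize) {
  const fd = fs.openSync(filePath, 'r');
  try {
    const { size } = fs.fstatSync(fd);
    const buf = Buffer.alloc(chunkSize);
    let start = 0;

    while (start < size) {
      // Read up to chunkSize bytes beginning at absolute offset `start`.
      const bytesRead = fs.readSync(fd, buf, 0, chunkSize, start);
      if (bytesRead === 0) break;

      const chunk = buf.subarray(0, bytesRead);
      const atEof = start + bytesRead >= size;
      // Position of the last line break (byte 0x0a) within this chunk.
      const lastNewline = chunk.lastIndexOf(0x0a);

      if (lastNewline === -1 && !atEof) {
        // Assumption violated: a single line is longer than chunkSize.
        throw new Error(`No line break within ${chunkSize} bytes at offset ${start}`);
      }

      // At end of file keep everything (the last line may lack a trailing
      // '\n'); otherwise keep only the bytes up to and including the last '\n'.
      const keep = atEof ? bytesRead : lastNewline + 1;

      yield {
        start,                  // absolute offset of the first byte kept
        end: start + keep - 1,  // absolute offset of the last byte kept
        text: chunk.toString('utf8', 0, keep),
      };

      // The next chunk begins right after the last complete line.
      start += keep;
    }
  } finally {
    fs.closeSync(fd);
  }
}
```

Applied to the three-line example above with a chunk size of 40 bytes, the first chunk covers bytes 0 to 33 (the John and Stacy rows) and the second starts at byte 34 with the Lisa row. Because both start and end of fs.createReadStream are inclusive, the yielded start and end values could also be passed directly as its byte-range options if a stream is still preferred for the actual processing. Cutting at the 0x0a byte is safe for UTF-8 input, since that byte never occurs inside a multi-byte sequence.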