
Description

I have a very large CSV file (around 1 GB) that I want to process in byte chunks of around 10 MB each. For this purpose, I am creating a readable stream with the byte-range option: fs.createReadStream(sampleCSVfile, { start: 0, end: 10000000 })

Problem

With the above approach, the chunk read from the CSV file usually ends mid-line, so its last line is incomplete. I want a way to identify the byte index at which the last line break occurred, and to start my next readable stream from that byte index.

Example CSV: (ignore header row)

John,New York,52
Stacy,Chicago,19
Lisa,Indianapolis,40

Sample Operation:

fs.createReadStream(sampleCSVfile, { start: 0, end: 99 })

Data Returned: (trimmed to above-specified byte-range)

John,New York,52
Stacy,Chicago,19
Lisa,I

Required or Expected:

John,New York,52
Stacy,Chicago,19

So, suppose that in the fetched chunk the last newline ended at byte index 78; then my next recursive operation would be: fs.createReadStream(sampleCSVfile, { start: 79, end: 178 })
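In other words, I am trying to build something like the sketch below (the file name, handleRows, and the chunk size are placeholders of mine; it assumes the file ends with a newline and that no single line is longer than a chunk):

const fs = require('fs');

const CHUNK_SIZE = 10000000; // ~10 MB per read

// Placeholder for whatever each batch of complete rows needs.
function handleRows(text) {
  console.log(text);
}

// Read [start, start + CHUNK_SIZE) and recurse from the last line break.
function readChunk(path, start) {
  const chunks = [];
  fs.createReadStream(path, { start, end: start + CHUNK_SIZE - 1 })
    .on('data', (buf) => chunks.push(buf))
    .on('end', () => {
      const buf = Buffer.concat(chunks);
      const lastNewline = buf.lastIndexOf(0x0a); // byte index of the last '\n'
      if (lastNewline === -1) return;            // assumes a line never exceeds a chunk

      handleRows(buf.slice(0, lastNewline).toString());

      // A short read means we already reached the end of the file.
      if (buf.length === CHUNK_SIZE) {
        readChunk(path, start + lastNewline + 1);
      }
    });
}

readChunk('sampleCSVfile.csv', 0);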

Saurabh Verma

1 Answer


Below is some basic code:

const fs = require('fs');

const stream = fs.createReadStream('test.csv', { start: 0, end: 40 });

stream.on('data', (data) => {
  console.log(data.length);            // size of this chunk in bytes
  const text = data.toString();
  console.log(text);
  const i = text.lastIndexOf('\n');    // character index of the last line break
  console.log(i);
  const substr = text.substring(0, i); // chunk trimmed to the last complete line
  console.log(substr);
  const byteLength = Buffer.byteLength(substr); // byte offset of that line break
  console.log(byteLength);
});

DEMO: https://repl.it/@sandeepp2016/SpiritedRowdyObject
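This works because Buffer.byteLength maps the string index of the line break back to a byte offset. If the CSV may contain multi-byte UTF-8 characters, a chunk boundary can split one of them; a minimal variant of the same idea that searches the Buffer directly avoids the string round-trip:

const fs = require('fs');

const stream = fs.createReadStream('test.csv', { start: 0, end: 40 });

stream.on('data', (data) => {
  // 0x0a is '\n'; searching the raw Buffer yields a byte index directly,
  // so a multi-byte character split at the chunk boundary cannot skew it.
  const i = data.lastIndexOf(0x0a);
  console.log(i);                           // byte index of the last line break
  console.log(data.slice(0, i).toString()); // the complete lines only
});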

But there are already CSV parsers like fast-csv, or you can use the built-in readline module, which will let you read a stream of data line by line more efficiently.
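For example, a minimal readline sketch (file name assumed) that only ever sees complete lines:

const fs = require('fs');
const readline = require('readline');

const rl = readline.createInterface({
  input: fs.createReadStream('test.csv'),
  crlfDelay: Infinity, // treat '\r\n' as a single line break
});

// 'line' fires once per complete line, so partial rows never appear.
rl.on('line', (line) => {
  console.log(line);
});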

Sandeep Patel