I'm dealing with a large (1 GB) CSV file. I want to parse only a portion of the file, so I passed the start and end options to createReadStream. When I do that, csv-parser doesn't parse any rows.

import { createReadStream } from 'fs';
import csv from 'csv-parser';

const rs = createReadStream('xyz.csv', { start: 1000, end: 50000 });

rs.pipe(csv())
  .on('data', (row) => {
    console.log(row);
  });
Pumpkin Pie
  • Well, a CSV parser is only going to work properly if you feed it complete lines. Starting at an arbitrary byte offset will most likely land in the middle of a line, so the first thing the parser sees is not a valid, full line. You could add some error handling to log whatever error csv-parser is running into. – jfriend00 Feb 14 '22 at 01:00
  • That is exactly what is happening. The partial chunks are "unparsable". I was hoping the parser would discard the first malformed chunk and treat every subsequent one as good; I thought this was built into csv-parser. If not, I'll have to look for some other library or write my own code to handle it. Any suggestions are helpful! Thanks. – Pumpkin Pie Feb 14 '22 at 09:10
  • You yourself could read a bunch of data starting at 1000, find the end of the line in that chunk you read and then set the stream to start at that line boundary. You will probably still have an issue on the last line (if you don't read to the end of the file), but you can presumably ignore that error. – jfriend00 Feb 14 '22 at 09:12
  • How can I start the stream from that line boundary after finding it in the chunk? If you have an example I can look into, that would be helpful! – Pumpkin Pie Feb 14 '22 at 09:18
  • When you find the line boundary you want, you see how many bytes past your 1000 start position it was and then you can use `const rs = createReadStream('xyz.csv', { start: x, end: 50000 });` where `x` is your calculated start position. Since that will start on a line boundary, it should work with csv. – jfriend00 Feb 14 '22 at 09:19
  • Thank you, I'll give it a try and let you know. Finding the byte offset precisely is the challenge here. Maybe there are some libraries that can help me identify it. – Pumpkin Pie Feb 14 '22 at 09:24
  • Byte size? Read a block from the file at a known offset into the file. Search that block for the first line ending you find. Add to your original known offset the number of bytes up to and including that line ending. That's now the offset of a line beginning that you can feed into your `fs.createReadStream()` so it will start at the beginning of a line. Not hard at all. Don't need a library for that. – jfriend00 Feb 14 '22 at 22:47
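
The approach described in the comments above could be sketched roughly as follows. This is only an illustration, not something taken from csv-parser: `findLineStart` and its `blockSize` parameter are hypothetical names, and the sketch assumes lines end in `\n` (or `\r\n`) and that no quoted field contains an embedded newline. It uses only Node's built-in `fs` module, loaded with `require` so the snippet runs on its own; the `import` form from the question should work just as well inside an ES module.

```javascript
const fs = require('node:fs');

// Hypothetical helper (not part of csv-parser or fs): returns the byte offset
// of the first line that begins at or after `offset`, or -1 if no line ending
// is found before the end of the file.
// Assumption: no quoted CSV field contains an embedded newline.
function findLineStart(path, offset, blockSize = 64 * 1024) {
  if (offset === 0) return 0; // the start of the file is always a line start
  const fd = fs.openSync(path, 'r');
  try {
    const block = Buffer.alloc(blockSize);
    // Begin one byte early so that an offset sitting exactly on a line start
    // (right after a '\n') is returned unchanged.
    let pos = offset - 1;
    for (;;) {
      const bytesRead = fs.readSync(fd, block, 0, blockSize, pos);
      if (bytesRead === 0) return -1; // reached end of file without a '\n'
      const nl = block.subarray(0, bytesRead).indexOf(0x0a); // byte value of '\n'
      if (nl !== -1) return pos + nl + 1; // first byte after the line ending
      pos += bytesRead; // no line ending in this block, read the next one
    }
  } finally {
    fs.closeSync(fd);
  }
}
```

With this, `const x = findLineStart('xyz.csv', 1000)` gives a value that can be passed as `start` to `createReadStream('xyz.csv', { start: x, end: 50000 })`. Because `end` is inclusive, `findLineStart('xyz.csv', 50000) - 1` should likewise give an `end` that stops on a complete line (after checking for the -1 result). One more thing to watch for: csv-parser takes its column names from the first line it receives, so when starting mid-file you would probably need to supply them yourself through its `headers` option.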

0 Answers