
I've got a gzip-compressed file which I would like to read line by line.

var fs = require('fs')
var zlib = require('zlib')
var gunzip = zlib.createGunzip()
var inp = fs.createReadStream('test.gz')
var n = 0

var lineProcessing = function (err, data) {
    if (!err) {
        n += 1
        console.log ("line: " + n)
        console.log (data.toString())
    }
}

inp
  .on('data', function (chunk) {
      zlib.gunzip (chunk, lineProcessing)
  })
  .on('end', function () {
    console.log ('ende');
  });

I guess I need to set a chunk size for zlib.createGunzip so that I only read up to the next \n. But how can I determine it dynamically?

Markus

3 Answers


It might be easier to use readline for this:

const fs       = require('fs');
const zlib     = require('zlib');
const readline = require('readline');

let lineReader = readline.createInterface({
  input: fs.createReadStream('test.gz').pipe(zlib.createGunzip())
});

let n = 0;
lineReader.on('line', (line) => {
  n += 1
  console.log("line: " + n);
  console.log(line);
});
robertklep
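As a rough illustration of why the per-chunk `zlib.gunzip` call in the question cannot work: the chunks delivered by `fs.createReadStream` are arbitrary byte slices of one gzip stream, not independent gzip streams, so no chunk size will line up with a `\n` in the decompressed text. The sketch below compresses a sample string in memory and splits it at an arbitrary offset; the helper names `tryOneShot` and `gunzipStreaming` are made up for this example.

```javascript
const zlib = require('zlib');

// Compress a small sample in memory, then split it in two at byte 10
// (an arbitrary split point, standing in for read-stream chunk boundaries).
const compressed = zlib.gzipSync('alpha\nbeta\ngamma\n');
const chunks = [compressed.subarray(0, 10), compressed.subarray(10)];

// One-shot decompression of a single slice. A slice of a gzip stream is not
// a complete gzip stream, so this is expected to throw; we return null then.
function tryOneShot(chunk) {
  try {
    return zlib.gunzipSync(chunk).toString();
  } catch (err) {
    return null;
  }
}

// Streaming decompression: the Gunzip transform keeps its state between
// writes, so it does not matter where the input was cut.
function gunzipStreaming(pieces) {
  return new Promise((resolve, reject) => {
    const gunzip = zlib.createGunzip();
    const out = [];
    gunzip.on('data', (d) => out.push(d));
    gunzip.on('end', () => resolve(Buffer.concat(out).toString()));
    gunzip.on('error', reject);
    for (const piece of pieces) gunzip.write(piece);
    gunzip.end();
  });
}
```

Here `tryOneShot(chunks[0])` and `tryOneShot(chunks[1])` both come back as `null`, while `gunzipStreaming(chunks)` resolves to the original text. Splitting into lines is then a separate step on the decompressed side, which is what `readline` does in the answer above.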
  • What if I want to read an unzipped stream coming from another function instead of unzipping a local file? I get some weird errors coming from the readline.js file. – Tomas Jun 29 '16 at 15:20
  • @Tomas do you mean you want to process a "regular" stream (not a gzipped one)? You can use any readable stream as argument for `input`. – robertklep Jun 29 '16 at 15:23
  • I'm trying to use a stream coming from zlib.gunzip(). My workflow is: I get a file from AWS S3, unzip it using gunzip, then pass the stream to readline, but it throws errors. Could it be that the stream is inconsistent or something? – Tomas Jun 29 '16 at 15:35
  • Apparently, once I unzip the file using gunzip, it passes a string rather than a stream. – Tomas Jun 29 '16 at 15:48
  • @Tomas `zlib.gunzip()` doesn't return a stream, it returns the gunzipped data (in the callback). You probably want to use `zlib.createGunzip()` as well. – robertklep Jun 29 '16 at 16:53
  • I tried doing `data.pipe(zlib.createGunzip())`, with `data` being the gunzipped buffer coming from `zlib.gunzip()`, and I get an error saying that `data.pipe(zlib.createGunzip()) is not a function`. Any ideas on this? – Tomas Jun 30 '16 at 09:33
  • @Tomas perhaps open a new question where you explain what you're trying to accomplish? – robertklep Jun 30 '16 at 09:51
  • would you mind having a look [here](http://stackoverflow.com/questions/38120231/how-to-gunzip-stream-in-nodejs) ? – Tomas Jun 30 '16 at 10:18
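For the case raised in the comments (compressed data arriving as a stream from somewhere other than a local file), a minimal sketch could look like the following. It assumes the source is a readable stream of raw gzip bytes; `linesFromGzipStream`, `fakeCompressedSource` and `collectLines` are hypothetical names, and the in-memory source only stands in for something like an HTTP response or an S3 object body.

```javascript
const zlib = require('zlib');
const readline = require('readline');
const { Readable } = require('stream');

// Hypothetical helper: accepts any readable stream of gzip-compressed bytes
// and returns a readline interface over the decompressed text.
function linesFromGzipStream(source) {
  return readline.createInterface({
    input: source.pipe(zlib.createGunzip()),
    crlfDelay: Infinity
  });
}

// Stand-in for a remote source: gzip a small string in memory and expose
// the compressed bytes as a readable stream.
function fakeCompressedSource() {
  const compressed = zlib.gzipSync('first\nsecond\nthird\n');
  return Readable.from([compressed]);
}

// Collect every decompressed line into an array.
async function collectLines(source) {
  const lines = [];
  for await (const line of linesFromGzipStream(source)) {
    lines.push(line);
  }
  return lines;
}
```

Something like `collectLines(fakeCompressedSource()).then(console.log)` exercises it. The point is that the still-compressed stream goes through `zlib.createGunzip()`; the callback-style `zlib.gunzip()` hands back a buffer, which has no `.pipe()`.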

If anyone is still looking into how to do this years later, and wants a solution that works with async/await, here's what I'm doing (TypeScript, but you can just ditch the type annotations).

import fs from "fs";
import zlib from "zlib";
import readline from "readline";

const line$ = (path: string) => readline.createInterface({
    input: fs.createReadStream(path).pipe(zlib.createGunzip()),
    crlfDelay: Infinity
});

const yourFunction = async () => {
    for await (const line of line$("/path/to/file.txt.gz")) {
        // do stuff with line
    }
}
Andrei
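One thing the snippets above leave out is error handling: `.pipe()` does not forward errors from the source stream to the gunzip stream, so a missing file or corrupt archive can surface as an unhandled `'error'` event. A possible sketch, assuming `stream.pipeline` is acceptable for wiring the stages together, is shown below in plain JavaScript. `forEachGzipLine` is a hypothetical helper, not part of any library.

```javascript
const zlib = require('zlib');
const readline = require('readline');
const { pipeline } = require('stream');

// Hypothetical helper: calls onLine for every decompressed line and rejects
// if the source stream or the gunzip stage fails.
async function forEachGzipLine(source, onLine) {
  const gunzip = zlib.createGunzip();
  const rl = readline.createInterface({ input: gunzip, crlfDelay: Infinity });

  let failure = null;
  // pipeline() connects source -> gunzip and reports the first error
  // raised by either stream through the callback.
  pipeline(source, gunzip, (err) => {
    if (err) {
      failure = err;
      rl.close(); // make sure the loop below terminates
    }
  });

  for await (const line of rl) {
    onLine(line);
  }
  // Surface an error that ended the loop early.
  if (failure) throw failure;
}
```

Usage would be along the lines of `await forEachGzipLine(fs.createReadStream(path), console.log)` inside a `try`/`catch`. Whether `readline` itself reports input errors seems to depend on the Node version, which is why this sketch tracks the failure explicitly.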

Read plain text or gzip files line by line, in TypeScript:

import * as fs from 'fs';
import * as zlib from 'zlib'
import * as readline from 'readline'

function readFile(path: string) {
    let stream: NodeJS.ReadableStream = fs.createReadStream(path)
    
    if(/\.gz$/i.test(path)) {
        stream = stream.pipe(zlib.createGunzip())
    }

    return readline.createInterface({
        input: stream,
        crlfDelay: Infinity
    })
}

async function main() {
    const lineReader = readFile('/usr/share/man/man1/less.1.gz')

    for await(const line of lineReader) {
        console.log(line)
    }
}

main().catch(err => {
    console.error(err);
    process.exit(1)
})

mpen
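The extension test in the answer above trusts the file name. If names are unreliable, one alternative is to check for the two gzip magic bytes (`0x1f 0x8b`) at the start of the file. This is only a sketch in plain JavaScript: `looksGzipped` and `openLines` are hypothetical names, and a plain file that happens to begin with those two bytes would be misdetected.

```javascript
const fs = require('fs');
const zlib = require('zlib');
const readline = require('readline');

// Returns true when the file starts with the gzip magic bytes 0x1f 0x8b.
function looksGzipped(path) {
  const fd = fs.openSync(path, 'r');
  try {
    const header = Buffer.alloc(2);
    const bytesRead = fs.readSync(fd, header, 0, 2, 0);
    return bytesRead === 2 && header[0] === 0x1f && header[1] === 0x8b;
  } finally {
    fs.closeSync(fd);
  }
}

// Hypothetical variant of readFile() that sniffs the content rather than
// looking at the file name.
function openLines(path) {
  let stream = fs.createReadStream(path);
  if (looksGzipped(path)) {
    stream = stream.pipe(zlib.createGunzip());
  }
  return readline.createInterface({ input: stream, crlfDelay: Infinity });
}
```

It is used the same way as `readFile` above: `for await (const line of openLines(path)) { ... }`. The cost is one extra synchronous open and two-byte read per file.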