I'm downloading a daily export gzip file from The Movie Database and decompressing it with zlib. When the end event fires, I log the length of the decompressed string. The length is different every time.

It appears that the data isn't being fully decompressed. I noticed this when I started parsing the JSON that the file contains. The parser gets about halfway through the lines (each line is a stand-alone JSON object) and then blows up because the JSON is malformed.
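
For reference, the per-line parsing described above can be sketched roughly as follows. This is a minimal illustration, not the exact code that failed: `parseJsonLines` is a hypothetical helper name and the sample input is made up. It collects the line number of each malformed line instead of throwing on the first one, which makes it easier to see where the data stops being valid.

```javascript
// Hypothetical helper: parse newline-delimited JSON and collect failures
// instead of throwing on the first malformed line.
function parseJsonLines(text) {
    var records = [];
    var errors = [];

    text.split('\n').forEach(function(line, index) {
        if (line.trim() === '') {
            return; // skip blank lines, e.g. after a trailing newline
        }
        try {
            records.push(JSON.parse(line));
        } catch (err) {
            // Remember the 1-based line number of the bad line.
            errors.push({ line: index + 1, message: err.message });
        }
    });

    return { records: records, errors: errors };
}

// Made-up sample: the last line is cut off, as truncated data would be.
var result = parseJsonLines('{"id":1}\n{"id":2}\n{"id":3,"ti');
console.log(result.records.length + ' parsed, ' + result.errors.length + ' failed');
```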

var http = require('http');
var zlib = require('zlib');

var downloadUrl = "http://files.tmdb.org/p/exports/movie_ids_03_01_2018.json.gz";
http.get(downloadUrl, function(response) {
    var fileContents = "";
    var gunzip = zlib.createGunzip();

    gunzip.on('data', function(data) {
        fileContents += data.toString();
    });

    gunzip.on('end', function() {
        console.log(fileContents.length);
    });

    response.pipe(gunzip);
});

Am I using the gunzip events incorrectly?

I have a reproducible example you can execute to see it running.

Johnathon Sullinger

1 Answer

I solved this by replacing my usage of http with request. I'm not sure what I was doing wrong with http.get, but piping the return value of request() into gunzip solved my problem.

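var request = require('request');
var zlib = require('zlib');

var downloadUrl = "http://files.tmdb.org/p/exports/movie_ids_03_01_2018.json.gz";
var response = request(downloadUrl);
var fileContents = "";
var gunzip = zlib.createGunzip();

// Decode as UTF-8 on the stream so a multi-byte character that is split
// across two chunks is not corrupted.
gunzip.setEncoding('utf8');

gunzip.on('data', function(data) {
    fileContents += data;
});

gunzip.on('end', function() {
    // Each non-empty line is a stand-alone JSON object.
    var json = fileContents.split('\n').filter(function(value, index) {
        if (value === "") {
            console.log(index + " is empty and skipped.");
            return false;
        }

        return true;
    });
});

// Report download and decompression failures instead of failing silently.
response.on('error', console.error);
gunzip.on('error', console.error);

response.pipe(gunzip);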
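
One detail worth noting about accumulating stream output as a string: when each chunk is converted with toString() on its own, a multi-byte UTF-8 character that straddles two chunks is decoded incorrectly. Whether that contributed to the varying length here is not confirmed. A minimal sketch of the safe approach, using Node's string_decoder module (`decodeChunks` is a hypothetical helper name):

```javascript
var StringDecoder = require('string_decoder').StringDecoder;

// Hypothetical helper: decode a sequence of Buffer chunks as UTF-8,
// holding back incomplete multi-byte sequences until the next chunk arrives.
function decodeChunks(chunks) {
    var decoder = new StringDecoder('utf8');
    var text = '';

    chunks.forEach(function(chunk) {
        text += decoder.write(chunk);
    });

    // Flush anything still buffered in the decoder.
    return text + decoder.end();
}

// "é" is two bytes in UTF-8; split it across two chunks on purpose.
var bytes = Buffer.from('é', 'utf8');
var chunks = [bytes.slice(0, 1), bytes.slice(1)];

// Per-chunk toString() decodes each half separately and loses the character.
var naive = chunks.map(function(chunk) { return chunk.toString(); }).join('');

console.log(decodeChunks(chunks) === 'é');
console.log(naive === 'é');
```

On a stream, calling setEncoding('utf8') before attaching the data handler should have the same effect, since the stream then does this decoding itself.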

I tried to use request.get(options, function(error, response, body){}); but could not pipe the response or body into gunzip. I'm new to streams and need to research more to figure out what was wrong. In the meantime, the solution above works without a problem.
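
A likely explanation, as far as I understand the request library: with the callback form the whole response has already been read by the time the callback runs, so there is nothing left to pipe, and by default the body is converted to a string, which would mangle the gzip bytes. Passing encoding: null should make the body a Buffer that can be decompressed in one step. The sketch below rests on that assumption; `gunzipBody` is a hypothetical helper, and the request call is shown only as a comment because it needs the network.

```javascript
var zlib = require('zlib');

// Hypothetical helper: decompress a fully buffered gzip body into a string.
function gunzipBody(body) {
    if (!Buffer.isBuffer(body)) {
        throw new TypeError('expected a Buffer; was encoding: null set?');
    }
    return zlib.gunzipSync(body).toString('utf8');
}

// Assumed usage with the callback form (not executed here):
// request.get({ url: downloadUrl, encoding: null }, function(error, response, body) {
//     if (error) { return console.error(error); }
//     var fileContents = gunzipBody(body);
// });

// Local round trip standing in for a downloaded body (made-up sample data).
var sample = zlib.gzipSync(Buffer.from('{"id":1}\n{"id":2}\n'));
console.log(gunzipBody(sample));
```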

Since this runs once a day, as an Azure Function, running it synchronously like this isn't a big deal. I'm not blocking any further work.

Johnathon Sullinger