
I need to trigger, through an HTTP request, a process that downloads some data from S3, gunzips it, modifies the stream, gzips it, and sends it to another bucket in S3.

So far I was able to either:

  1. Download
  2. Gunzip
  3. Modify (filter) the data
  4. Return the data

Or:

  1. Download
  2. Gunzip
  3. Gzip
  4. Upload the unmodified data and retrieve the url of the object
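
For reference, the getS3 and putS3 streams used in the snippets below look roughly like this (simplified; aws-sdk v2, and the bucket/key names are placeholders):

var AWS = require('aws-sdk');
var zlib = require('zlib');
var PassThrough = require('stream').PassThrough;

var s3 = new AWS.S3();

// readable stream of the gzipped source object
var getS3 = s3.getObject({ Bucket: 'source-bucket', Key: 'data.gz' }).createReadStream();

// gunzip / gzip transform streams
var gunzip = zlib.createGunzip();
var gzip = zlib.createGzip();

// writable end: s3.upload accepts a stream as Body
function putS3(Key, Bucket) {
    var pass = new PassThrough();
    s3.upload({ Bucket: Bucket, Key: Key, Body: pass }, function (err) {
        if (err) console.error(err);
    });
    return pass;
}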

My first attempt consisted of using the on('data') event of the gunzip stream to modify the data; then, when the 'end' event is emitted, I can return the result to the browser making the request.

var accumulator = [];

// collect the lines that pass the filter, then return them when gunzip ends
gunzip.on('data', chunk => {
    var lines = chunk.toString('utf-8').split('\n');
    lines.forEach(line => {
        if (shouldBeFiltered(line)) {
            accumulator.push(line);
        }
    });
});

gunzip.on('end', () => {
    res.send(accumulator);
});

getS3.pipe(gunzip);

If instead of returning the result (res.send) I try to pipe gunzip to gzip, the filter is ignored. That makes sense: the filtering only happens in the accumulator array that I return (in the previous case) when the 'end' event is emitted, so it never affects what actually flows through the pipe.
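
To illustrate the point with something smaller than the S3 code: a 'data' listener only observes the chunks, and nothing done inside it changes what pipe() forwards downstream.

var PassThrough = require('stream').PassThrough;

var source = new PassThrough();
var sink = new PassThrough();

source.on('data', chunk => {
    // runs for every chunk, but cannot alter what the pipe below delivers
    console.log('observed:', chunk.toString());
});

source.pipe(sink);
sink.on('data', chunk => console.log('received unchanged:', chunk.toString()));

source.end('hello\n');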

Then, after some digging, I found a reference suggesting that the data should be pushed back onto the stream, and I tried the following, which did not work:

gunzip.on('data', chunk => {
    var lines = chunk.toString('utf-8').split('\n');
    lines.forEach(line => {
        if (shouldBeFiltered(line)) {
            gunzip.push(line);
        }
    });
});

// the end event no longer mattered
// gunzip.on('end', ()=>{
//    res.send(accumulator);
// })

getS3.pipe(gunzip).pipe(gzip).pipe(putS3(putS3param.Key, putS3param.Bucket)); 

Then I tried to create a transform stream (this is extremely simplified, as I was just trying out the concept), but I got an internal error:

const stream = require('stream');
const Transform = stream.Transform;

function filter(pipeline) {
    var the_filter = new Transform({
        transform(chunk, encoding, next) {
            // append a marker to every chunk (a stand-in for the real filtering)
            chunk += Buffer.from('Modified', 'utf-8');
            this.push(chunk);
            next();
        }
    });
    // the piped transform is not returned, so nothing can be chained after it
    pipeline.pipe(the_filter);
}

Other than writing the data to a file, gzipping it, and uploading that, I have no more ideas.

Thanks for any help!

OJEP

1 Answer


After much digging around, I finally found the answer on this page.

It seems that what was missing was setting the Transform to objectMode; other than that, I see nothing else relevant.

    var stream = require('stream')
    var liner = new stream.Transform( { objectMode: true } )

    // split incoming chunks into lines, buffering any partial last line
    liner._transform = function (chunk, encoding, done) {
         var data = chunk.toString()
         if (this._lastLineData) data = this._lastLineData + data

         var lines = data.split('\n')
         // the last element may be an incomplete line; keep it for the next chunk
         this._lastLineData = lines.splice(lines.length-1,1)[0]

         lines.forEach(this.push.bind(this))
         done()
    }

    // push whatever is left over when the input ends
    liner._flush = function (done) {
         if (this._lastLineData) this.push(this._lastLineData)
         this._lastLineData = null
         done()
    }

    module.exports = liner
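
For completeness, this is roughly how it gets wired into the S3 pipeline; the filterLines transform is illustrative, and getS3, putS3, putS3param and shouldBeFiltered are the same as in the question:

    var stream = require('stream')
    var zlib = require('zlib')
    var liner = require('./liner') // the line-splitting transform above

    // liner emits one line per chunk, without its trailing '\n', so keep only
    // the wanted lines and put the newline back before re-gzipping
    var filterLines = new stream.Transform({ objectMode: true })
    filterLines._transform = function (line, encoding, done) {
         if (shouldBeFiltered(line)) this.push(line + '\n')
         done()
    }

    var gunzip = zlib.createGunzip()
    var gzip = zlib.createGzip()

    // getS3, putS3 and putS3param are the same as in the question
    getS3
        .pipe(gunzip)
        .pipe(liner)
        .pipe(filterLines)
        .pipe(gzip)
        .pipe(putS3(putS3param.Key, putS3param.Bucket))
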
OJEP