I need to load a TSV file of about 1.5 GB into PostgreSQL. I planned to stream it with pg-copy-streams, and that worked for a direct copy. I then needed to transform the data, so I added a through stream to the pipeline, and it failed. I suspect a buffering problem, and someone must have done this before.

The original tsvfile.txt has the format:

V1\tV2\tV3\tV4\n
V2\tV2\tV3\tV4\n

The code is

var fs = require('fs'), pg = require('pg'), es = require('event-stream'), pgs = require('pg-copy-streams');
var filename = 'tsvfile.txt';
var pgkey = 'somepgkey';
pg.connect(pgkey, function(err, client, done){
    // writable stream that feeds COPY ... FROM STDIN
    var query = client.query(pgs.from('COPY table1 (C1, C2, C3, C4) FROM STDIN'));
    var fstream = fs.createReadStream(filename);
    fstream.pipe(es.split())  // split the file into lines
           .pipe(es.mapSync(function(line){
                var midline = line.split('\t').map(sometransform()).join('\t');
                return midline + '\n';
                //not sure \n is necessary here
            }))
           .pipe(query)
           .on('end', done)
           .on('error', somethingelse);
});

The error I got was `error: extra data after last expected column`, but it works fine if I remove the first two pipes.

Da Qi
  • Update: if I remove the first two pipes, it works fine. But when I register a listener on query (which is a stream) with `query.on('data', callback)`, it messes up the whole process again: pg drops the connection and no data is stored. It could be something in the pg-copy-streams module. – Da Qi Jan 21 '16 at 20:19

2 Answers

The first thing I would try is to remove the + '\n', since it might be messing up the line endings. If that does not help, the next step would be to make sometransform() a function that does not change anything. If that works without errors, your problem is in the sometransform() function (do you, for example, add a \t within the function?).
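One caveat on the first suggestion: a line splitter such as es.split() emits each line without its trailing newline, so if the '\n' is dropped the rows run together, which would be consistent with the "extra data after last expected column" error. The no-op test can be sketched with only Node's core stream module. The names identity, rebuildRow and tsvRowStream are hypothetical helpers, not part of event-stream or pg-copy-streams, and skipping empty lines is an assumption about what the target table should receive.

```javascript
const { Transform } = require('stream');
const { StringDecoder } = require('string_decoder');

// Hypothetical no-op stand-in for the per-field transform.
function identity(field) {
  return field;
}

// Rebuild one TSV row. The splitter has removed the newline,
// so it is added back here to keep one row per line for COPY.
function rebuildRow(line, fieldTransform) {
  return line.split('\t').map(function (field) {
    return fieldTransform(field);
  }).join('\t') + '\n';
}

// Transform stream: raw file chunks in, complete transformed rows out.
function tsvRowStream(fieldTransform) {
  const decoder = new StringDecoder('utf8'); // handles multi-byte chars split across chunks
  let tail = ''; // incomplete last line carried over to the next chunk

  function rows(lines) {
    // Assumption: empty lines (e.g. after a trailing newline) are skipped.
    return lines
      .filter(function (l) { return l.length > 0; })
      .map(function (l) { return rebuildRow(l, fieldTransform); })
      .join('');
  }

  return new Transform({
    transform(chunk, encoding, callback) {
      const lines = (tail + decoder.write(chunk)).split(/\r?\n/);
      tail = lines.pop(); // the last piece may be a partial line
      const out = rows(lines);
      if (out) this.push(out);
      callback();
    },
    flush(callback) {
      // Emit whatever is left when the input ends without a final newline.
      const out = rows([tail + decoder.end()]);
      if (out) this.push(out);
      callback();
    }
  });
}
```

In the question's pipeline this would replace the two event-stream stages, roughly fs.createReadStream(filename).pipe(tsvRowStream(identity)).pipe(query). If the copy succeeds with identity, the real transform can be swapped in next.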

MortenSickel
  • I removed the entire transformation section and left only es.split(), but it is still not working. Removing both the transformation and es.split() solves the problem. I suspect the transformation messes up the stream's buffer, but I do not know how to investigate or fix it. – Da Qi Jan 20 '16 at 15:17

line.split('\t').map(sometransform()).join('\t')

Does sometransform really return a function, which is then used to transform the data?

If the answer is "no" or "what?", try this: line.split('\t').map( sometransform ).join('\t')
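To make the difference concrete, here is a small sketch. The functions trimField and makeUpper are hypothetical examples, not anything from the question: the first is a plain per-field transform, the second is a factory that returns one.

```javascript
// Hypothetical per-field transform: takes one field, returns one field.
function trimField(field) {
  return field.trim();
}

// Hypothetical factory: calling it returns the per-field transform.
function makeUpper() {
  return function (field) {
    return field.toUpperCase();
  };
}

const line = ' a\tb \tc';

// Pass the function itself: map calls it once per field.
const byReference = line.split('\t').map(trimField).join('\t');

// Call the factory: map receives the function that the factory returned.
const byFactory = line.split('\t').map(makeUpper()).join('\t');

// Calling a plain transform first runs it with no argument,
// so it fails before map ever sees a field.
function callThenMap(input) {
  return input.split('\t').map(trimField()).join('\t');
}
```

So map(sometransform()) is only right when sometransform is a factory like makeUpper. With a plain transform like trimField, the call throws a TypeError instead of processing any rows.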

Thomas