How to merge records in csv file based on first field?

Question

I have a CSV file:

 id1,v1,v2,v3,v4
 id2,v1,v2,v6,v4
 id1,v7,v8,v3,v9
 id1,v10,v11,v12,v13
 id2,v3,v5,v8,v7

The file is not sorted (and should not be sorted), and I want output like this:

 id1,v1|v7|v10,v2|v8|v11,v3|v12,v4|v9|v13
 id2,v1|v3,v2|v5,v6|v8,v4|v7

That is, for each id, the values in each column of all records with that id are merged into the corresponding column, separated by |. The id and any repeated value (see v3 in the third column for id1) appear only once.

I tried the code given at http://www.robelle.com/tips/st-export-notes.html, but this task needs much more than that.

How can this be achieved using Perl? I am new to Perl. Thanks in advance!

Are the pipe-separated compound fields required to be in the order they appear in the file? For instance, is `id1,v10|v1|v7,...` okay? — Borodin, Mar 20 '15 at 14:07

Borodin · Accepted Answer · score 1 · 2015-03-23T08:37:43.783

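Assuming you don't need any particular sort order, you can solve this with a hash of arrays of hashes: each id maps to an array holding one hash per value column, and the keys of each hash are the distinct values seen in that column. (Hashes are known as dictionaries in other languages.)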
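To make that structure concrete, here is a small sketch that uses only the first and third sample lines (both belong to id1). It builds the same nested hash with the answer's regex and prints each column's hash. The counts come from the `++`; the full program only uses the keys.

```perl
use strict;
use warnings;

# Two sample rows for one id, taken from the question's data.
my @lines = ( 'id1,v1,v2,v3,v4', 'id1,v7,v8,v3,v9' );

my %data;
for my $line (@lines) {
    my ( $id, @vals ) = $line =~ /[^,\s]+/g;
    # Column $_ of record $id gets a hash keyed by the values seen there;
    # the hash value counts how often each value occurred.
    ++$data{$id}[$_]{ $vals[$_] } for 0 .. $#vals;
}

# $data{id1} is an array reference holding one hash per value column.
for my $i ( 0 .. $#{ $data{id1} } ) {
    my $col = $data{id1}[$i];
    printf "column %d: %s\n", $i,
        join ', ', map { $_ . ' x ' . $col->{$_} } sort keys %$col;
}
```

Column 2 ends up with the single key v3 (count 2), which is why the repeated v3 appears only once in the merged field.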

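use strict;
use warnings;

my %data;    # id => [ { value => count }, ... ], one hash per value column

while ( <DATA> ) {
  # Collect the runs of characters that are neither commas nor whitespace
  my ($id, @vals) = /[^,\s]+/g;
  for my $i ( 0 .. $#vals ) {
    # Using the value as a hash key removes duplicates within each column
    ++$data{$id}[$i]{$vals[$i]};
  }
}

# each() and keys() return hash entries in no particular order
while ( my ($id, $vals) = each %data ) {
  my @vals = map { join '|', keys %$_ } @$vals;
  printf "%s,%s\n", $id, join ',', @vals;
}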

__DATA__
id1,v1,v2,v3,v4
id2,v1,v2,v6,v4
id1,v7,v8,v3,v9
id1,v10,v11,v12,v13
id2,v3,v5,v8,v7

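Output (the order of the ids, and of the values within each field, is not guaranteed because it follows Perl's hash ordering):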

id2,v1|v3,v5|v2,v8|v6,v7|v4
id1,v7|v10|v1,v11|v2|v8,v12|v3,v4|v13|v9
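If the values (and the ids) must appear in the order they first occur in the file, as in the question's expected output, one option is to keep an array per column alongside a hash of values already seen. This is a minimal sketch, assuming the same simple comma-separated input with no quoted fields; the sample rows are held in an in-memory list purely for illustration, and a real script would read the file line by line instead.

```perl
use strict;
use warnings;

# Sample rows from the question; a real script would read the file line by line.
my @input = (
    'id1,v1,v2,v3,v4',
    'id2,v1,v2,v6,v4',
    'id1,v7,v8,v3,v9',
    'id1,v10,v11,v12,v13',
    'id2,v3,v5,v8,v7',
);

my %merged;    # id => [ [ values in first-seen order ], ... ], one array per column
my %seen;      # id => [ { value => count }, ... ], used only to skip repeats
my @order;     # ids in the order they first appear

for my $line (@input) {
    my ( $id, @vals ) = $line =~ /[^,\s]+/g;
    push @order, $id unless exists $merged{$id};
    for my $i ( 0 .. $#vals ) {
        # Append the value only the first time it shows up in this id's column $i
        push @{ $merged{$id}[$i] }, $vals[$i]
            unless $seen{$id}[$i]{ $vals[$i] }++;
    }
}

my @output;
for my $id (@order) {
    push @output, join ',', $id, map { join '|', @$_ } @{ $merged{$id} };
}
print "$_\n" for @output;
```

This prints `id1,v1|v7|v10,v2|v8|v11,v3|v12,v4|v9|v13` followed by `id2,v1|v3,v2|v5,v6|v8,v4|v7`. The extra `%seen` hash costs some memory but preserves first-seen order without any sorting.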

edited Mar 23 '15 at 08:37

answered Mar 20 '15 at 13:50

Borodin


I'm sorry: my original solution was wrong because I misread your question. It should be right now – Borodin Mar 20 '15 at 14:01
Thank you so much @Borodin, it's the perfect answer. I just removed `$fh` in `printf` and added `open(DATA, "< file.csv");` after `my %data;`. – Nitinkumar Ambekar Mar 23 '15 at 05:30
@NTN: I'm glad to help. I'm sorry about the extraneous `$fh` - it was a legacy from my testing. – Borodin Mar 23 '15 at 08:39
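Reopening `DATA` as described in the comment above works, but a lexical filehandle with three-argument `open` is the more common idiom. The sketch below wraps the accepted answer's loop in a hypothetical `merge_records` helper that accepts any open filehandle. An in-memory string stands in for `file.csv` so the sketch runs as-is, and the output is sorted only to make it predictable.

```perl
use strict;
use warnings;

# Hypothetical helper: merge records read from any open filehandle.
# Returns a hash reference: id => [ { value => count }, ... ].
sub merge_records {
    my ($fh) = @_;
    my %data;
    while ( my $line = <$fh> ) {
        my ( $id, @vals ) = $line =~ /[^,\s]+/g;
        next unless defined $id;    # skip blank lines
        ++$data{$id}[$_]{ $vals[$_] } for 0 .. $#vals;
    }
    return \%data;
}

# With a real file this would be, for example:
#   open my $fh, '<', 'file.csv' or die "Cannot open file.csv: $!";
# An in-memory string stands in for file.csv so the sketch runs as-is.
my $csv_text = "id1,v1,v2\nid2,v3,v4\nid1,v5,v2\n";
open my $fh, '<', \$csv_text or die "Cannot open in-memory file: $!";
my $data = merge_records($fh);
close $fh;

# Sort ids and values only to make the output predictable.
my @lines_out;
for my $id ( sort keys %$data ) {
    push @lines_out, join ',', $id, map { join '|', sort keys %$_ } @{ $data->{$id} };
}
print "$_\n" for @lines_out;
```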

score -1 · Answer 2 · answered Mar 20 '15 at 15:21

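You should use a proper CSV parser, such as Text::CSV, for CSV data: it handles quoted fields that contain commas, which a simple regex or split does not.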

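use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new( { binary => 1, eol => $/ } );

my %data;    # id => [ { value => undef }, ... ], one hash per value column
while ( my $row = $csv->getline(\*DATA) ) {
    my $id = shift @$row;
    # Hash keys act as a set, so a repeated value in a column is stored once
    $data{$id}[$_]{ $$row[$_] } = undef for 0 .. $#$row;
}

# Sort ids and values so the output is deterministic (the sort is lexical: v10 before v7)
for my $id ( sort keys %data ) {
    my $vals = $data{$id};
    $csv->print( \*STDOUT, [ $id, map { join '|', sort keys %$_ } @$vals ] );
}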

__DATA__
id1,v1,v2,v3,v4
id2,v1,v2,v6,v4
id1,v7,v8,v3,v9
id1,v10,v11,v12,v13
id2,v3,v5,v8,v7
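To see why a real parser matters, here is how the `/[^,\s]+/g` regex from the accepted answer splits three hypothetical lines that are valid CSV: a quoted field containing a comma, an empty field, and a value containing a space. Each line has three fields, but the regex returns 4, 2 and 4 respectively, which would shift values into the wrong columns. A parser such as Text::CSV should return the intended three fields for each line.

```perl
use strict;
use warnings;

# Hypothetical input lines that are valid CSV but trip up the
# /[^,\s]+/g regex used in the accepted answer.
my %cases = (
    quoted_comma => 'id3,"a,b",c',
    empty_field  => 'id4,,x',
    inner_space  => 'id5,New York,y',
);

my %fields;
for my $name ( sort keys %cases ) {
    my @f = $cases{$name} =~ /[^,\s]+/g;
    $fields{$name} = \@f;
    # Show how many fields the regex produced and what they were
    printf "%-12s => %d fields: %s\n", $name, scalar @f, join ' | ', @f;
}
```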
