Join all lines that have the same first column onto one line

Question

For example:

File:

1234:abcd  
1234:930  
1234:999999  
194:keee  
194:284  
194:222222

Result:

1234:abcd:930:999999  
194:keee:284:222222

I have racked my brain and can't come up with a way to do this. Sorry to bother you guys!

Can you use a scripting language like Python? That would help if you have a huge file. — Hackaholic, Sep 17 '15 at 05:45
possible duplicate of [Command line to merge lines with matching first field, 50 GB input](http://stackoverflow.com/questions/31729187/command-line-to-merge-lines-with-matching-first-field-50-gb-input) — NeronLeVelu, Sep 17 '15 at 06:25

John1024 · Answer 1 · 2015-09-17T06:05:23.827

$ awk -F: '$1==last {printf ":%s",$2; next} NR>1 {print "";} {last=$1; printf "%s",$0;} END{print "";}' file
1234:abcd:930:999999
194:keee:284:222222

-F:

This tells awk to use a : as the field separator.
$1==last {printf ":%s",$2; next}

If the first field of this line is the same as the first field of the previous line, print a colon followed by field 2. Then skip the rest of the commands and start over with the next line.
NR>1 {print "";}

If we get here, this line's first field differs from the previous line's. If this is not the first line, we finish the previous line by printing a newline character.
{last=$1; printf "%s",$0;}

Update the variable last with the new value of field 1. Then print this line without a trailing newline, so that values from the following lines can be appended to it.
END{print "";}

After we reach the end of the file, print one last newline character.
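Written out one rule per line as a shell function, the same logic might look like the sketch below. It assumes, as the one-liner does, that lines with the same first field are adjacent; merge_consecutive is a hypothetical name. It also makes three small changes, each noted in a comment: it appends everything after the first colon rather than only $2 (with -F:, a line such as 1234:a:b has $2 equal to a, so the one-liner would drop :b), it adds NR > 1 to the first rule so the very first line is never treated as a continuation, and it prints no final newline when the input is empty.

```shell
# merge_consecutive: hypothetical name; same approach as the one-liner.
# Assumes lines with the same first field are adjacent in the input.
# Usage: merge_consecutive file > merged.txt
merge_consecutive() {
  awk -F: '
    # Same key as the previous line: append a colon and the rest of the line.
    # Change: NR > 1 guards line 1, and substr() keeps everything after the
    # first colon instead of only $2.
    NR > 1 && $1 == last { printf ":%s", substr($0, length($1) + 2); next }

    # New key: finish the previous output line, unless this is line 1.
    NR > 1 { print "" }

    # Remember the key and start a new output line, without a newline yet.
    { last = $1; printf "%s", $0 }

    # Terminate the last output line. Change: print nothing for empty input.
    END { if (NR > 0) print "" }
  ' "$@"
}
```

On the sample data from the question this should produce the same two lines as the one-liner.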

To combine lines that share a first column even when they are not consecutive, consider this test file:

$ cat testfile2
3:abcd
4:abcd
10:123
3:999
4:999
10:123

Apply this awk script:

$ awk -F: '{a[$1]=a[$1]":"$2;} END{for (x in a) print x ":" substr(a[x],2);}' testfile2
3:abcd:999
4:abcd:999
10:123:123

With this approach, the lines will not necessarily come out in any particular order, because awk's for (x in a) loop visits the keys in an unspecified order. If order is important, you may want to pipe this output to sort.
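If the original order matters and sorting is not what you want, one option is to record the order in which each key first appears instead of relying on for (x in a). A minimal sketch under that assumption; merge_all_in_order, order, and vals are hypothetical names. Like the a[] script above, it keeps every value in memory until the end of the input; unlike it, it keeps everything after the first colon rather than only $2.

```shell
# merge_all_in_order: hypothetical name. Combines lines that share field 1
# even when they are not adjacent, printing keys in first-seen order.
# Usage: merge_all_in_order file > merged.txt
merge_all_in_order() {
  awk -F: '
    {
      rest = substr($0, length($1) + 2)  # everything after the first colon
      if (!($1 in vals)) {               # first time this key appears
        order[++n] = $1                  # record its position in the output
        vals[$1] = rest
      } else {
        vals[$1] = vals[$1] ":" rest     # append to the values seen so far
      }
    }
    # Walk the keys in the recorded order instead of using for (x in a).
    END { for (i = 1; i <= n; i++) print order[i] ":" vals[order[i]] }
  ' "$@"
}
```

On testfile2 this should print 3:abcd:999, 4:abcd:999, and 10:123:123, in that order.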

answered Sep 17 '15 at 05:30 by John1024, edited Sep 17 '15 at 06:05

Hm. In a test this works on the text above, but for some reason it doesn't do anything on the real file (about 2 million lines like this). The output file is EXACTLY the same as the input. Weird? – jahill2002 Sep 17 '15 at 05:44
3:abcd 4:abcd 5:abcd 6:abcd 7:abcd 8:abcd 9:abcd 10:123 11:abcd 12:abcd 13:abcd 14:abcd 15:abcd 16:abcd 17:abcd 18:abcd 19:abcd 20:abcd 3:999 4:999 5:999 6:999 7:999 8:999 9:999 10:123 11:999 12:999 13:999 14:999 15:999 16:999 17:999 18:999 19:999 20:999 – jahill2002 Sep 17 '15 at 05:47
See, on that it does not work. Any ideas? Sorry for the mess – jahill2002 Sep 17 '15 at 05:47
In your new file, `3:abcd 4:abcd 5:abcd...`, unlike the sample data in the question, each line has a first column different from the line before. That is why the code changes nothing. Did you want to combine lines with the same first column even if they aren't consecutive? – John1024 Sep 17 '15 at 05:56
@jahill2002 I just added to the answer a method for combining non-consecutive lines. – John1024 Sep 17 '15 at 06:00
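Because the first one-liner only merges lines that are next to each other, it may help to check a file before choosing a script. A minimal sketch; check_grouped is a hypothetical helper, not part of the answer. It prints the first line at which a key shows up again after a different key and exits with status 1, or prints nothing and exits with status 0 if every key occupies a single run of lines.

```shell
# check_grouped: hypothetical helper. Prints the first line number at which
# a key reappears after a different key, then exits 1; exits 0 if grouped.
# Usage: check_grouped file && echo grouped
check_grouped() {
  awk -F: '
    # First line, or the key changed: this line starts a new run.
    NR == 1 || $1 != last {
      # A key that already had a run earlier means the input is not grouped.
      if ($1 in seen) {
        print "key " $1 " reappears at line " NR
        exit 1
      }
      seen[$1] = 1
    }
    # Remember the key of this line for the next comparison.
    { last = $1 }
  ' "$@"
}
```

If it reports a reappearing key, the non-consecutive script is the one to use; if it exits 0, the consecutive one-liner is enough.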
