In bash how to transform multimap to a map of

Question

I am processing output from a file in bash and need to group values by their keys.

For example, I have the

in a file and want to group all values for a given key onto a single line, as in

13,47099,54024,1,39956,0
17,126223,52782,4,62617,0
23,1022724,79958,80590,230,1,118224,0,1049
42,72470,80185,2,89199,0
54,70344,72824,1,62969,1

There are about 10000 entries in my input file. How do I transform this data in the shell?

score 5 · Accepted Answer · answered Jun 06 '17 at 02:31


awk to the rescue!

assuming keys are contiguous...

$ awk -F, 'p!=$1 {if(a) print a; a=p=$1} 
                 {a=a FS $2} 
           END   {print a}' file

13,47099,54024,1,39956,0
17,126223,52782,4,62617,0
23,1022724,79958,80590,230,1,118224,0,1049
42,72470,80185,2,89199,0
54,70344,72824,1,62969,1
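If the keys are not already contiguous, one option is to sort on the key before grouping. The following is a minimal sketch, not part of the original answer: the function name `group_sorted` and the sample input are made up for illustration, and the `-s` (stable sort) flag is assumed to be available, as it is in GNU and BSD `sort`. It keeps the values of each key in their original order.

```shell
# Hypothetical helper: reads "key,value" lines on stdin and prints one line per key.
group_sorted() {
  # Sort numerically on field 1 only; -s (assumed: GNU/BSD sort) keeps the
  # original order of values that share a key.
  sort -t, -k1,1n -s |
  awk -F, '
    p != $1 { if (a != "") print a; a = p = $1 }   # key changed: flush, start a new group
            { a = a FS $2 }                        # always append the value
    END     { if (a != "") print a }               # flush the last group
  '
}

# Made-up sample input with interleaved keys.
printf '23,5\n13,7\n23,9\n13,1\n' | group_sorted
```

With this sample input the sketch prints `13,7,1` followed by `23,5,9`.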


karakfa


Perfect answer. Just what I wanted – Anoop Jun 07 '17 at 18:29
If the keys are not contiguous, you can `sort` them first and then pipe the result into the above `awk` code, e.g. `sort -n -k 1 -t "," [file] | awk ...` – Josh Jan 22 '20 at 18:57
@karakfa, I'm a bit new to `awk` and trying to understand your code. It appears to check if `p` is not equal to the first field and, if not, set `p` and `a` equal to the first field, and then print `a`. However, the order of the steps I just described is opposite the order of the operations in the first line of your code. Am I understanding your code correctly? – Josh Jan 22 '20 at 19:24
[This tutorial](https://www.grymoire.com/Unix/Awk.html) mentions that variable definitions can be set inline with the commands that use them using this example `awk '{print $c}' c="${1:-1}"`, but in that case the variable `c` is set outside the `'{...}'` awk command – Josh Jan 22 '20 at 19:38

@Josh if the key changes, print the existing record (if it exists) and start building the new one. The second statement is executed regardless of the condition. At the end, print the leftover record. – karakfa Jan 22 '20 at 20:11
@karakfa, thanks! I was writing out what I think the code is doing in prose when you posted. I'll post my breakdown of your code as an answer for newbs like me. – Josh Jan 22 '20 at 20:41

score 1 · Answer 2 · answered Jan 22 '20 at 20:44

Here is a breakdown of what @karakfa's code is doing, for us awk beginners. I've written this based on a toy dataset file:

1,X
1,Y
3,Z

p!=$1: check if the pattern p!=$1 is true
- checks if variable p differs from the first field of the current (first) line of file (1 in this case)
- since p is undefined at this point it cannot be equal to 1, so p!=$1 is true and we continue with this line of code
if(a) print a: check if variable a has a value (is non-empty) and print a if it does
- since a is undefined at this point the print a command is not executed
a=p=$1: set variables a and p equal to the value of the first field of the current (first) line (1 in this case)
a=a FS $2: set variable a equal to a combined with the value of the second field of the current (first) line, separated by the field separator (1,X in this case)
END: since we haven't reached the end of file yet, we skip the rest of this line of code
move to the next (second) line of file and restart the awk code on that line
p!=$1: check if the pattern p!=$1 is true
- since p is 1 and the first field of the current (second) line is 1, p!=$1 is false and we skip the rest of this line of code
a=a FS $2: set a equal to the value of a and the value of the second field of the current (second) line, separated by the field separator (1,X,Y in this case)
END: since we haven't reached the end of file yet, we skip the rest of this line of code
move to the next (third) line of file and restart the awk code on that line
p!=$1: check if the pattern p!=$1 is true
- since p is 1 and $1 of the third line is 3, p!=$1 is true and we continue with this line of code
if(a) print a: check if variable a has a value (is non-empty) and print a if it does
- since a is 1,X,Y at this point, 1,X,Y is printed to the output
a=p=$1: set variables a and p equal to the value of the first field of the current (third) line (3 in this case)
a=a FS $2: set variable a equal to a combined with the value of the second field of the current (third) line, separated by the field separator (3,Z in this case)
END {print a}: since we have reached the end of file, execute this code
- print a: print the last group a (3,Z in this case)

The resulting output is

1,X,Y
3,Z
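The walkthrough above can be checked by having awk report its own state after each input line. This is only a sketch: the function name `trace_groups` is made up, and writing to `"/dev/stderr"` from awk is assumed to work (it does in gawk, mawk and the BWK awk, but it is not guaranteed everywhere).

```shell
# Hypothetical helper: karakfa's program plus one trace line per input record.
trace_groups() {
  awk -F, '
    p != $1 { if (a) print a; a = p = $1 }
            { a = a FS $2
              # Trace goes to stderr so the grouped output on stdout stays clean.
              printf "line %d: p=%s a=%s\n", NR, p, a > "/dev/stderr" }
    END     { print a }
  '
}

# The toy dataset from this answer.
printf '1,X\n1,Y\n3,Z\n' | trace_groups
```

The trace lines show p and a after each record (for example `line 2: p=1 a=1,X,Y`), which should match the values given in the breakdown.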

Please let me know if there are any errors in this description.

score 0 · Answer 3 · answered Jan 22 '20 at 23:23


Slight tweak to @karakfa's answer. If you want the separator between the key and the values to be different than the separator between the values, you can use this code:

awk -F, 'p==$1 {a=a "; " $2} p!=$1 {if(a) print a; a=$0; p=$1} END {print a}'
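As a quick check, here is the same one-liner run on the toy dataset from the breakdown answer. The wrapper function name `group_with_sep` is made up for illustration, and the input is read from stdin instead of a file.

```shell
# Hypothetical wrapper around the one-liner above; reads "key,value" lines on stdin.
group_with_sep() {
  # The p==$1 rule must come first: on a new key it is skipped, and the
  # second rule then starts the group from the whole line ($0).
  awk -F, 'p==$1 {a=a "; " $2} p!=$1 {if(a) print a; a=$0; p=$1} END {print a}'
}

printf '1,X\n1,Y\n3,Z\n' | group_with_sep
```

This prints `1,X; Y` and then `3,Z`: the key and first value keep the comma, and later values are joined with "; ".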


Josh

