
I would like to concatenate all columns in a CSV file and then apply an MD5 hash.

I would like to use awk.

With this code, I get the concatenation of the third column.

awk -F'#' '{ printf "%s", $3 }'

and I can compute the hash with this command:

echo -n "HELLO WORLD" | md5sum

Can anyone help me to integrate those two methods? First concat all columns, then apply the MD5 hash.

Sample CSV:

A#B#C#D
E#F#G#H
I#J#K#L

The expected output is:

md5(ABCD)
md5(EFGH)
md5(IJKL)
Houssem Hariz
  • Have you tried replacing `echo -n "HELLO WORLD"` with your awk command? In what way does that not do what you want? – Ed Morton Mar 29 '17 at 14:23
  • Thanks for your reply. Here is my CSV: A#B#C#D E#F#G#H I#J#K#L. The output should be: md5(ABCD) md5(EFGH) md5(IJKL) – Houssem Hariz Mar 30 '17 at 07:54

1 Answer


Concatenating the columns just means removing the delimiter, so here is a simpler approach:

tr -d '#' <file | md5sum
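Since the question asks for awk, the same delimiter removal can be sketched there too. This is only an illustration: the temporary file stands in for the real CSV, and `md5sum` is assumed to be the GNU coreutils tool. Setting `OFS` to the empty string and reassigning `$1` makes awk rebuild each record without the `#` separators.

```shell
# Hypothetical sample file holding the question's data.
tmp=$(mktemp)
printf 'A#B#C#D\nE#F#G#H\nI#J#K#L\n' > "$tmp"

# Assigning $1 to itself forces awk to rebuild the record using OFS (empty here).
joined=$(awk -F'#' -v OFS='' '{ $1 = $1; print }' "$tmp")

# Show the concatenated rows, then hash the whole stream once.
printf '%s\n' "$joined"
printf '%s\n' "$joined" | md5sum

rm -f "$tmp"
```

For this input the awk output is the same text that `tr -d '#'` produces, so either form can feed `md5sum`.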

If you want to extract only the third column and concatenate its rows into one big string (though it is unclear why, since you lose information):

cut -d# -f3 file | tr -d '\n' | md5sum

Note that with this approach, these two sets of third-column values

ab
c

and

a
bc

will end up with the same hash. It is better to preserve the distinctness of the fields by joining the values with the same delimiter:

cut -d# -f3 file | paste -sd# | md5sum
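The collision described above can be checked directly. The following is a minimal sketch, assuming GNU `md5sum` is on the PATH; the two inputs are the example column values from the text, fed in with `printf` rather than read from a file.

```shell
# Rows "ab","c" versus rows "a","bc": stripping newlines yields "abc" both times.
h1=$(printf 'ab\nc\n' | tr -d '\n' | md5sum)
h2=$(printf 'a\nbc\n' | tr -d '\n' | md5sum)
if [ "$h1" = "$h2" ]; then
  echo "newline-stripped hashes match"
fi

# Joining the rows with '#' yields "ab#c" versus "a#bc", which hash differently.
h3=$(printf 'ab\nc\n' | paste -sd'#' - | md5sum)
h4=$(printf 'a\nbc\n' | paste -sd'#' - | md5sum)
if [ "$h3" != "$h4" ]; then
  echo "delimiter-joined hashes differ"
fi
```

The explicit `-` operand tells `paste` to read standard input, which some implementations require.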

However, without concatenation the values stay separated by newlines, which already act as a delimiter, so you can simply use

cut -d# -f3 file | md5sum

unless there is some unstated reason to concatenate.

UPDATE: you want to create an MD5 hash for each row, which was the critical information missing from the question.

You can't pipe all the lines into a single md5sum as with the other programs, because it hashes its whole input at once; you need a new invocation for each line. One way to address this is

tr -d '#' <file | while IFS= read -r line; do echo "$line" | md5sum; done

ed5d34c74e59d16bd6d5b3683db655c3  -
8ad37f51cbc6de792c885acf17ba7e40  -
fe672d984bef56cbfce488080f8055b7  -
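The loop above uses `echo`, which appends a newline, so each digest covers the row text plus a trailing newline. If the goal is the hash of exactly `ABCD`, as with `echo -n` in the question, a variant using `printf '%s'` is one option. This is a sketch only: the temporary file is a stand-in for the real CSV, and GNU `md5sum` is assumed.

```shell
# Hypothetical sample file holding the question's data.
tmp=$(mktemp)
printf 'A#B#C#D\nE#F#G#H\nI#J#K#L\n' > "$tmp"

# IFS= and -r keep each line intact; printf '%s' emits no trailing newline,
# so each digest covers only the concatenated fields of that row.
hashes=$(tr -d '#' < "$tmp" | while IFS= read -r line; do
  printf '%s' "$line" | md5sum | cut -d' ' -f1
done)

# One digest per input row.
printf '%s\n' "$hashes"

rm -f "$tmp"
```

The digests printed here differ from the three listed above, because those include the newline added by `echo`.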

However, note that you lose information if your fields vary in length and overlap in values.

For example, AB#C and A#BC will generate the same hash, which may or may not be desired, but which you might not have considered.
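If rows such as these need to stay distinguishable, one possible alternative is to hash each row with its delimiter left in place. The sketch below compares both choices for the two example rows; it assumes GNU `md5sum` and is not what the question literally asked for.

```shell
# With '#' stripped, both rows reduce to "ABC" and share a digest.
s1=$(printf '%s' 'AB#C' | tr -d '#' | md5sum)
s2=$(printf '%s' 'A#BC' | tr -d '#' | md5sum)

# With '#' kept, the inputs differ and so do the digests.
k1=$(printf '%s' 'AB#C' | md5sum)
k2=$(printf '%s' 'A#BC' | md5sum)

if [ "$s1" = "$s2" ] && [ "$k1" != "$k2" ]; then
  echo "stripping the delimiter collides; keeping it does not"
fi
```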

karakfa