4

I'm having a blonde moment and can't seem to get my head around it, but I have a csv file (sample below) in which I would like to sort the rows out by the first value in the row - hexa-numerically (from small to large)

570e2e5c,1460539517,SOM3-String-123,08-5a-0c-59
570e2e81,1460539520,SOM3-String-123,08-00-0c-59
570e2e87,1460539521,SOM3-String-123,09-5e-6b-22
570e2e5e,1460539518,SOM3-String-123,08-00-0c-59
570e2e90,1460539522,SOM3-String-123,08-00-0c-59
570e2e95,1460539523,SOM3-String-123,09-00-67-22
570e2e60,1460539519,SOM3-String-123,09-00-68-22

Q: How can I sort out with BASH script a csv files rows by the rows first element which is hexadecimal?

Addendum:

So I convert the hex strings into a dec strings with this code:

IFS=','
while read f1 f2 f3 f4 
do 

f1_upper_case=`echo "$f1" | tr '[:lower:]' '[:upper:]'`
f1_dec=$((16#$f1_upper_case))
echo "$f1_dec,$f2,$f3,$f4" >>$csv_temp 

done < $csv

I will try to sort by $f1_dec in the csv file.

3kstc
  • 1,871
  • 3
  • 29
  • 53
  • For sorting using just the first element, you should be able to adapt the code easily from [Bash - sort CSV file by first four columns](http://stackoverflow.com/q/11934809/62576). Your hex values should sort fine without any conversion. – Ken White Apr 15 '16 at 00:10
  • @KenWhite Why doesn't it matter? Wouldn't `0xf`, which is smaller than `0xaf`, get sorted the wrong way around? I guess it depends on padding, i.e., the exact format of the hex numbers. – Benjamin W. Apr 15 '16 at 00:12
  • Ah, I see it now. Please don't link to the files, put a representative excerpt of them into the question itself. – Benjamin W. Apr 15 '16 at 00:12
  • Looks like they're all the same width, so you can just sort lexically. – Benjamin W. Apr 15 '16 at 00:13
  • @BenjaminW. Ok. But why the downvote :/ – 3kstc Apr 15 '16 at 00:14
  • That wasn't my downvote - and if you want to improve your question further, show what you've tried so far and how it didn't work for you. – Benjamin W. Apr 15 '16 at 00:37
  • https://unix.stackexchange.com/questions/139801/sort-by-hex-value – Ciro Santilli OurBigBook.com Aug 15 '17 at 14:42

2 Answers2

5

If the strings all have the same number of digits and the same letter case, as is true in your sample data, you can just sort as-is. The default lexical sort will put them in the right order because the hex digits 0-9a-f occur in that order in standard character sets.

If you might have some numbers with different numbers of digits, or if you have mixed upper and lower case on the alphabetic digits, then your best bet is probably to convert to decimal, sort numerically, and convert back. If you have the GNU version of awk, you can use it to do the conversion:

awk -v{,O}FS=, '{$1=strtonum("0x"$1)}1' $filename | 
    sort -t, -n -k1,1 |
    awk -v{,O}FS=, '{$1=sprintf("%x",$1)}1' >$new_filename

Run on your sample input, I get this output:

570e2e5c,1460539517,SOM3-String-123,08-5a-0c-59
570e2e5e,1460539518,SOM3-String-123,08-00-0c-59
570e2e60,1460539519,SOM3-String-123,09-00-68-22
570e2e81,1460539520,SOM3-String-123,08-00-0c-59
570e2e87,1460539521,SOM3-String-123,09-5e-6b-22
570e2e90,1460539522,SOM3-String-123,08-00-0c-59
570e2e95,1460539523,SOM3-String-123,09-00-67-22

Explanation:

-v name=value tells awk to set a variable that will exist in the context of the program; this is a handy way to inject values without having to deal with the quoting complexities that accompany string interpolation into code. But some variable names are special; FS tells awk what (F)ield (S)eparator to use to split the input lines into fields, and OFS tells it what (O)utput field separator to use when printing lines back out. The -v{,O}FS=, sequence is just a shortcut to set both variables to the same value; it gets expanded by the shell's brace-expansion into -vFS=, -vOFS=,. So within the awk program, each line will be pre-split on commas into fields, with the individual field values in the variables $1, $2, and so on. When those variables are changed, they will be joined back together with commas when printed back out.

The code in the string passed to awk after the -v options is the awk program to run. Such a program is a sequence of (condition, block) pairs; each condition is checked against each line of input, and then the block is evaluated if the condition is true. If a block appears without a condition, it is run on every line; if a condition appears without a block, it causes the current line – or the fields of the current line joined by OFS, if other blocks have made modifications – to be printed out.

The programs above use both defaults; each starts with a code block that has no condition, so it is run on every line, and ends with a condition that has no code block: 1, which is always true, so every line is printed back out after the code block has made its changes.

The first awk program uses the strtonum function to convert the first field to decimal. This is why this solution only works in GNU awk; BSD awk (which is also the one that ships on macOS) does not have strtonum. We have to prepend "0x" to the field value so that strtonum will know to treat it as hexadecimal, but once it is a numeric value, it will be printed out as decimal. So the output of the awk is the same as the input except with the first field converted to decimal.

We feed that to sort, tell it to sort numerically (-n) by that first (-k1,1) comma-separated (-t,) field, and then feed the sorted output into the second awk, which uses the sprintf function to convert the numeric first field back to hex.

Mark Reed
  • 91,912
  • 16
  • 138
  • 175
  • 1
    could you kindly explain how this code works? i.e what -vOFS and -F strtonum, for myself and everyone to understand please? Also where do i feed the `$csv` file. Also the first column is a timestamp in HEX different to the timestamp in second column. Could you kindly include this in a while read loop so I can better understand! – 3kstc Apr 15 '16 at 01:05
  • 1
    `-vOFS=`, sets the "OFS" variable to a comma. OFS is the "output field separator", and so setting this to a comma causes the output fields to be delimited as such. The `-F,` sets the input field delimiter to a comma, and the `strtonum("0x"$1)` converts a string to a number. The 0x part is added so that `strtonum` interprets the string as a hexadecimal number. – dreamlax Apr 15 '16 at 01:11
2

Input:

$ cat /tmp/so36636125.txt 
570e2e5c,1460539517,SOM3-String-123,08-5a-0c-59
570e2e81,1460539520,SOM3-String-123,08-00-0c-59
570e2e87,1460539521,SOM3-String-123,09-5e-6b-22
570e2e5e,1460539518,SOM3-String-123,08-00-0c-59
570e2e90,1460539522,SOM3-String-123,08-00-0c-59
570e2e95,1460539523,SOM3-String-123,09-00-67-22
570e2e60,1460539519,SOM3-String-123,09-00-68-22

Transformation one liner:

$ awk '{ dec = sprintf("%d", "0x" $1); print dec " "  $0 }' /tmp/so36636125.txt | sort -n -k 1 | cut -f2- -d' '
570e2e5c,1460539517,SOM3-String-123,08-5a-0c-59
570e2e5e,1460539518,SOM3-String-123,08-00-0c-59
570e2e60,1460539519,SOM3-String-123,09-00-68-22
570e2e81,1460539520,SOM3-String-123,08-00-0c-59
570e2e87,1460539521,SOM3-String-123,09-5e-6b-22
570e2e90,1460539522,SOM3-String-123,08-00-0c-59
570e2e95,1460539523,SOM3-String-123,09-00-67-22

$ awk --version
GNU Awk 3.1.7

Explanation:

  • awk adds a new first column as the decimal representation of the hex number
  • sort numerically on first column
  • cut removes the first column
Arun
  • 19,750
  • 10
  • 51
  • 60