127

I have a file (user.csv)like this

ip,hostname,user,group,encryption,aduser,adattr

want to print all column sort by user,

I tried awk -F ":" '{print|"$3 sort -n"}' user.csv , it doesn't work.

vidit
  • 6,293
  • 3
  • 32
  • 50
user2452340
  • 1,355
  • 2
  • 10
  • 5

10 Answers10

235

How about just sort.

sort -t, -nk3 user.csv

where

  • -t, - defines your delimiter as ,.

  • -n - gives you numerical sort. Added since you added it in your attempt. If your user field is text only then you dont need it.

  • -k3 - defines the field (key). user is the third field.

jaypal singh
  • 74,723
  • 23
  • 102
  • 147
  • 2
    How can I use sort 2 columns? for example, I want sort by column 6 first, and sort by column 3 second. – user2452340 Jun 25 '13 at 14:12
  • 2
    This won't work if there are quoted strings containing commas in the CSV (unless the column you want to sort by is earlier than the comma-containing column). You might have to make a pass first with awk (using FPAT="[^,]*|\"[^\"]*\"" and OFS="|" or some other delimiter that you could use with sort) – davemyron May 09 '15 at 22:28
  • 1
    @user2452340 You could do this: `sort -t, -nk3 filename.csv | sort -t, -nk6` - first it will sort by column 3, then will sort that by column 6 so column 6 is sorted correctly all the way and for any rows where column 6 is the same, those will be sorted by column 3. – Matthew Jul 27 '15 at 16:50
  • 3
    @Matthew `sort -t ',' -k3,3n -k6,6n` will be better. `-k3` will use column 3 and the rest of the line. – Kusalananda Jul 26 '16 at 16:15
  • 1
    I just needed the -t, to divide my 2 column file divided by commas, thanks jaypal – Ricardo Rivera Nieves Aug 25 '20 at 01:04
  • I've compared performance and `sort -nk3` is considerably faster than `awk '{print $0|"sort -nk3 "}' ` (14 seconds compared with 66 seconds for a file with 1 million rows and 3 columns). – Tom Kelly Oct 06 '20 at 07:16
26
  1. Use awk to put the user ID in front.
  2. Sort
  3. Use sed to remove the duplicate user ID, assuming user IDs do not contain any spaces.

    awk -F, '{ print $3, $0 }' user.csv | sort | sed 's/^.* //'
    
user3781670
  • 261
  • 3
  • 2
  • This is very useful, especially if you need to parse or combine columns to add a sort field, then retain only the original line. I used awk/split to parse/combine date & time fields for a sort, then remove. – skytaker Nov 29 '17 at 15:46
  • 2
    `sort` already knows how to sort by a particular column, but this technique -- known as the [Schwartzian transform](https://en.wikipedia.org/wiki/Schwartzian_transform) -- is useful when the field you want to sort on is not trivially a well-defined column. – tripleee Mar 04 '19 at 07:37
  • Would the `print $3, $0` print the sorted columns, or would it print the before-sorting columns? – Tan Yu Hau Sean Jul 29 '22 at 11:56
15

Seeing as that the original question was on how to use awk and every single one of the first 7 answers use sort instead, and that this is the top hit on Google, here is how to use awk.

Sample net.csv file with headers:

ip,hostname,user,group,encryption,aduser,adattr
192.168.0.1,gw,router,router,-,-,-
192.168.0.2,server,admin,admin,-,-,-
192.168.0.3,ws-03,user,user,-,-,-
192.168.0.4,ws-04,user,user,-,-,-

And sort.awk:

#!/usr/bin/env -S awk -f
#
# original source:
# https://stackoverflow.com/a/65768883/586229
#
# Usage:
#   awk -f sort.awk [-F<field separator>] [-v h=HAS_HEADER] [-v f=COLUMN_TO_SORT_BY] INPUT_FILE
# Examples:
#   awk -f sort.awk -F, -v h=1 -v f=1 input.csv > output.csv
#   cat input.txt | awk -f sort.awk | tee -a output.txt

# for each line
{
    if (h && NR == 0) {
        print $0
    } else {
        a[NR-h]=$0 ""
        s[NR-h]=$f ""
    }
}

END {
    isort(s, a, NR-h);
    for (i = 1; i <= NR-h; i++) {
        print a[i]
    }
}

# insertion sort of A[1..n]
function isort(S, A, n, i, j) {
    for (i = 2; i <= n; i++) {
        hs = S[j=i]
        ha = A[j=i]
        while (S[j-1] > hs) {
            j--;
            S[j+1] = S[j]
            A[j+1] = A[j]
        }
        S[j] = hs
        A[j] = ha
    }
}

To use it:
See header in the script.

dagelf
  • 1,468
  • 1
  • 14
  • 25
12

You can choose a delimiter, in this case I chose a colon and printed the column number one, sorting by alphabetical order:

awk -F\: '{print $1|"sort -u"}' /etc/passwd
Ketan Maheshwari
  • 2,002
  • 3
  • 25
  • 31
10
awk -F, '{ print $3, $0 }' user.csv | sort -nk2 

and for reverse order

awk -F, '{ print $3, $0 }' user.csv | sort -nrk2 
vsingh
  • 6,365
  • 3
  • 53
  • 57
7

try this -

awk '{print $0|"sort -t',' -nk3 "}' user.csv

OR

sort -t',' -nk3 user.csv
VIPIN KUMAR
  • 3,019
  • 1
  • 23
  • 34
  • Is the `|` to `sort` using a built-in to `awk`? If not, any idea why `-V` - `version sort` - would not work here? Also, if I choose not to use `-t` option, it seems that to select the third column, I need to use `-k4` - odd indeed! – AnthonyK Jan 17 '21 at 11:30
4
awk -F "," '{print $0}' user.csv | sort -nk3 -t ','

This should work

Francesco
  • 897
  • 8
  • 22
3

To exclude the first line (header) from sorting, I split it out into two buffers.

df | awk 'BEGIN{header=""; $body=""} { if(NR==1){header=$0}else{body=body"\n"$0}} END{print header; print body|"sort -nk3"}'
rupert160
  • 1,441
  • 1
  • 17
  • 19
1

With GNU awk:

awk -F ',' '{ a[$3]=$0 } END{ PROCINFO["sorted_in"]="@ind_str_asc"; for(i in a) print a[i] }' file

See 8.1.6 Using Predefined Array Scanning Orders with gawk for more sorting algorithms.

Cyrus
  • 84,225
  • 14
  • 89
  • 153
0

I'm running Linux (Ubuntu) with mawk:

tmp$ awk -W version
mawk 1.3.4 20200120
Copyright 2008-2019,2020, Thomas E. Dickey
Copyright 1991-1996,2014, Michael D. Brennan

random-funcs:       srandom/random
regex-funcs:        internal
compiled limits:
sprintf buffer      8192
maximum-integer     2147483647

mawk (and gawk) has an option to redirect the output of print to a command. From man awk chapter 9. Input and output:

The output of print and printf can be redirected to a file or command by appending > file, >> file or | command to the end of the print statement. Redirection opens file or command only once, subsequent redirections append to the already open stream.

Below you'll find a simplied example how | can be used to pass the wanted records to an external program that makes the hard work. This also nicely encapsulates everything in a single awk file and reduces the command line clutter:

tmp$ cat input.csv 
alpha,num
D,4
B,2
A,1
E,5
F,10
C,3
tmp$ cat sort.awk 
# print header line
/^alpha,num/ {
    print
}

# all other lines are data lines that should be sorted
!/^alpha,num/ {
    print | "sort --field-separator=, --key=2 --numeric-sort"
}
tmp$ awk -f sort.awk input.csv 
alpha,num
A,1
B,2
C,3
D,4
E,5
F,10

See man sort for the details of the sort options:

-t, --field-separator=SEP
       use SEP instead of non-blank to blank transition
-k, --key=KEYDEF
       sort via a key; KEYDEF gives location and type
-n, --numeric-sort
       compare according to string numerical value
user272735
  • 10,473
  • 9
  • 65
  • 96
  • One can correctly argue this answer has no new information. However I found most of the other answers unnecessary terse and I had to dig out the magic of `|` from the mighty manual. So I wrote a note to myself for the next time :) – user272735 Sep 29 '22 at 09:09