I have a file, let's say "bigfile", with tabular data of the following form:

a1 b2 a3 1
b1 a2 c3 0
... and so on.

I want to use the built-in "sort" program on my Linux machine to sort this file by the fourth field (numerically) and then by the first field at the same time. I went through the man pages a couple of times and all I could come up with was:

sort -n -k4,4 -k1,1 bigfile

Is there a way to make "sort" do what I want, or do I have to write my own custom program?

Thank you.

wnoise
Vijay
  • @Orbit, I believe `-k4` merely *starts* a key at column 4. But the end of the key is not specified and therefore the key goes all the way to the end of the line. So, `-k4 -k1` is really something more like `-k4 -k5 -k6 -k7 -k1`, and therefore the `-k1` is kinda meaningless. (Yes, it's really counterintuitive, but basically you should always do `-kX,X` for every field.) – Aaron McDaid Feb 06 '15 at 14:17
  • @AaronMcDaid - Ah, appreciate the response. Thanks kindly! – Brandon Frohbieter Feb 06 '15 at 22:34
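
The point about open-ended keys in the comment above can be checked with a small sketch. The two rows are made up for illustration (five fields, so that something follows field 4), and `LC_ALL=C` is an assumption added here to get plain byte-order comparison:

```shell
# Hypothetical two-row sample; both rows have the same fourth field.
data=$(printf '%s\n' 'b x x 1 a' 'a x x 1 b')

# -k4 starts at field 4 and runs to the end of the line, so the
# fifth field (a vs b) decides the order before -k1 is consulted.
open_ended=$(printf '%s\n' "$data" | LC_ALL=C sort -k4 -k1)

# -k4,4 stops at the end of field 4; the rows tie there, so the
# first field (a vs b) decides the order.
bounded=$(printf '%s\n' "$data" | LC_ALL=C sort -k4,4 -k1,1)

printf 'open-ended:\n%s\nbounded:\n%s\n' "$open_ended" "$bounded"
```

With the open-ended key the row starting with `b` comes first; with bounded keys the row starting with `a` does.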

1 Answer

From the manpage:

POS is F[.C][OPTS], where F is the field number and C the character position in the field; both are origin 1. If neither -t nor -b is in effect, characters in a field are counted from the beginning of the preceding whitespace. OPTS is one or more single-letter ordering options, which override global ordering options for that key. If no key is given, use the entire line as the key.

`sort -k4,4n -k1,1 bigfile` ought to do it.
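
As a quick check of that command, here is a minimal sketch on made-up rows piped in on standard input in place of bigfile. The sample data and the `LC_ALL=C` setting (for byte-order text comparison) are assumptions, not part of the question:

```shell
# Hypothetical sample standing in for "bigfile".
data=$(printf '%s\n' 'b1 x2 y3 1' 'a1 b2 a3 1' 'c1 a2 c3 0' 'b1 a2 c3 0')

# Key 1: field 4 only, numeric (the trailing n applies to this key alone).
# Key 2: field 1 only, compared as text (byte order under LC_ALL=C).
sorted=$(printf '%s\n' "$data" | LC_ALL=C sort -k4,4n -k1,1)

printf '%s\n' "$sorted"
# b1 a2 c3 0
# c1 a2 c3 0
# a1 b2 a3 1
# b1 x2 y3 1
```

Rows are grouped by the fourth field first (0 before 1), and within each group they are ordered by the first field.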

Another option would be `sort -k1,1 bigfile | sort --stable -n -k4,4`. The stable sort means that ties on the fourth field keep their input order, and the first pass of sort has already arranged that input by the first field.
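
A small sketch comparing the two approaches on the same made-up rows (again an assumed sample read from standard input, with `LC_ALL=C` added for predictable text ordering):

```shell
# Hypothetical sample standing in for "bigfile".
data=$(printf '%s\n' 'b1 x2 y3 1' 'a1 b2 a3 1' 'c1 a2 c3 0' 'b1 a2 c3 0')

# One pass with two keys.
one_pass=$(printf '%s\n' "$data" | LC_ALL=C sort -k4,4n -k1,1)

# Two passes: order by field 1 first, then a stable numeric sort on
# field 4 that leaves ties in the order the first pass produced.
two_pass=$(printf '%s\n' "$data" | LC_ALL=C sort -k1,1 | LC_ALL=C sort --stable -n -k4,4)

if [ "$one_pass" = "$two_pass" ]; then
    echo "both approaches give the same order"
fi
```

The one-pass form reads the data once and is probably the simpler choice; the two-pass form mainly illustrates what `--stable` buys you.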

wnoise