numpy structured array sorting by multiple columns

Question

A minimal numpy structured array generator:

import numpy as np

index = np.arange(4)
A = np.stack((np.sin(index), np.cos(index)),axis=1)
B = np.eye(4).astype(int)
C = np.array([1, 0, 1, 0], dtype=bool)
goodies = [(a, b, c, d) for a, b, c, d in zip(index, A, B, C)]
dt = [('index', 'int'), ('two_floats', 'float', 2), 
      ('four_ints', 'int', 4), ('and_a_bool', 'bool')]
s = np.array(goodies, dtype=dt)

generates the minimal numpy structured array:

array([(0, [ 0.        ,  1.        ], [1, 0, 0, 0],  True),
       (1, [ 0.84147098,  0.54030231], [0, 1, 0, 0], False),
       (2, [ 0.90929743, -0.41614684], [0, 0, 1, 0],  True),
       (3, [ 0.14112001, -0.9899925 ], [0, 0, 0, 1], False)],
      dtype=[('index', '<i8'), ('two_floats', '<f8', (2,)), ('four_ints', '<i8', (4,)), ('and_a_bool', '?')])

I want to sort first by and_a_bool descending, then by the second column of two_floats ascending, so that the output would be:

array([(2, [ 0.90929743, -0.41614684], [0, 0, 1, 0],  True),
       (0, [ 0.        ,  1.        ], [1, 0, 0, 0],  True),
       (3, [ 0.14112001, -0.9899925 ], [0, 0, 0, 1], False),
       (1, [ 0.84147098,  0.54030231], [0, 1, 0, 0], False)],
      dtype=[('index', '<i8'), ('two_floats', '<f8', (2,)), ('four_ints', '<i8', (4,)), ('and_a_bool', '?')])

np.lexsort was mentioned in this answer but I don't see how to apply that here.

I'm looking for something using existing numpy methods rather than specialized code. My arrays will not be very large, so I don't have a strong preference between in-place sorting and generating a new array.

`np.sort` takes an `order` parameter that lets you specify which fields to sort, and in what order (in effect a refinement on `lexsort`). To get descending sort, make a new array with negated fields, and use `np.argsort` to get the desired sort order. — hpaulj, May 20 '20 at 07:07
@hpaulj hopefully an answer can be written based on that. I'll take a look as well, thank you! — uhoh, May 20 '20 at 07:43
@hpaulj I could not make that work, can you consider posting a short answer? Thanks! — uhoh, May 21 '20 at 06:08

score 1 · Answer 1 · answered May 21 '20 at 06:26

Make a temp sorting array:

In [133]: temp=np.zeros(s.shape, dtype='bool,float')                                     
In [134]: temp['f0']=~s['and_a_bool']                                                    
In [135]: temp['f1']=s['two_floats'][:,1]                                                
In [136]: temp                                                                           
Out[136]: 
array([(False,  1.        ), ( True,  0.54030231), (False, -0.41614684),
       ( True, -0.9899925 )], dtype=[('f0', '?'), ('f1', '<f8')])

Now argsort (no need to specify order, since I chose the temp fields in the desired order):

In [137]: np.argsort(temp)                                                               
Out[137]: array([2, 0, 3, 1])

and apply that sort to s:

In [138]: s[_137]                                                                        
Out[138]: 
array([(2, [ 0.90929743, -0.41614684], [0, 0, 1, 0],  True),
       (0, [ 0.        ,  1.        ], [1, 0, 0, 0],  True),
       (3, [ 0.14112001, -0.9899925 ], [0, 0, 0, 1], False),
       (1, [ 0.84147098,  0.54030231], [0, 1, 0, 0], False)],
      dtype=[('index', '<i8'), ('two_floats', '<f8', (2,)), ('four_ints', '<i8', (4,)), ('and_a_bool', '?')])
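Since the question asked about np.lexsort, here is a sketch of how it might be applied to the same data. This is an alternative to the temp-array approach above, not the method used in this answer. It assumes the array s from the question; the names order and result are only illustrative. np.lexsort treats the last key as the primary one, and inverting the boolean field makes the True rows sort first.

```python
import numpy as np

# Rebuild the structured array from the question.
index = np.arange(4)
A = np.stack((np.sin(index), np.cos(index)), axis=1)
B = np.eye(4).astype(int)
C = np.array([1, 0, 1, 0], dtype=bool)
dt = [('index', 'int'), ('two_floats', 'float', 2),
      ('four_ints', 'int', 4), ('and_a_bool', 'bool')]
s = np.array(list(zip(index, A, B, C)), dtype=dt)

# lexsort uses the LAST key as the primary sort key.
# Primary: inverted bool, so rows with and_a_bool == True come first.
# Secondary: second column of two_floats, ascending.
order = np.lexsort((s['two_floats'][:, 1], ~s['and_a_bool']))

result = s[order]
print(order)            # [2 0 3 1]
print(result['index'])  # [2 0 3 1]
```

No temporary structured array is needed here, at the cost of having to remember that the keys are listed from least to most significant.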

Okay this works nicely, though I don't understand exactly how right now, your insight into numpy is always astounding! Just the first line already makes no sense to me; when printing `temp` the data does not match the dtype but that seems to be what is making this work. Thank you! — uhoh, May 21 '20 at 06:43
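On the confusion about the first line: as far as I can tell, a comma-separated dtype string such as 'bool,float' is shorthand for a structured dtype with automatically named fields f0 and f1, which is why the printed dtype looks different from what was typed. A small sketch with made-up values (keys and the order variables are hypothetical names, not from the answer) shows that, and also that argsort on such an array compares records field by field:

```python
import numpy as np

# A comma-separated dtype string builds a structured dtype whose fields
# get automatic names: 'f0' for the first type, 'f1' for the second.
keys = np.array([(True, 0.5), (False, 2.0), (False, 1.0)], dtype='bool,float')
field_names = keys.dtype.names

# Structured records compare field by field, so argsort orders by 'f0'
# first and uses 'f1' only to break ties.
order = np.argsort(keys)

# Passing order= spells out the same field priority explicitly.
order_explicit = np.argsort(keys, order=['f0', 'f1'])

print(field_names)     # ('f0', 'f1')
print(order)           # [2 1 0]
print(order_explicit)  # [2 1 0]
```

That is why the temp array puts the inverted boolean in f0 and the float in f1: the field order is the sort priority.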
