40

Is there a better way to get "output_array" from "input_array" and "select_id"?

Can we get rid of range(input_array.shape[0])?

>>> import numpy
>>> input_array = numpy.array([[3, 14], [12, 5], [75, 50]])
>>> select_id = [0, 1, 1]
>>> print(input_array)
[[ 3 14]
 [12  5]
 [75 50]]

>>> output_array = input_array[range(input_array.shape[0]), select_id]
>>> print(output_array)
[ 3  5 50]
Alex Riley
Bystander
  • 1
    It's a sick way of doing it, and definitely not better than what you have, but `np.diagonal(input_array[:, select_id])` will also get you `array([ 3, 5, 50])`. – Jaime Jun 12 '13 at 20:58
  • 1
    Aside from using `arange` instead of `range`, the advanced indexing solution in the question is already the best option. – user2357112 Nov 15 '19 at 06:19

4 Answers

38

You can use numpy.choose, which constructs an array from an index array (in your case select_id) and a set of arrays to choose from (in your case the columns of input_array). In this case you need to transpose input_array first, so that each column becomes one of the choices and the dimensions match select_id. The following shows a small example:

In [101]: input_array
Out[101]: 
array([[ 3, 14],
       [12,  5],
       [75, 50]])

In [102]: input_array.shape
Out[102]: (3, 2)

In [103]: select_id
Out[103]: [0, 1, 1]

In [104]: output_array = np.choose(select_id, input_array.T)

In [105]: output_array
Out[105]: array([ 3,  5, 50])
mg007
  • 1
    instead of outputting those values, how do we modify them in place? – syllogismos Feb 21 '17 at 11:11
  • You can use this http://stackoverflow.com/questions/7761393/how-to-modify-a-2d-numpy-array-at-specific-locations-without-a-loop – Steven Mar 25 '17 at 17:44
  • 10
    Have to comment this does not work for large arrays, as pointed out below by Nathan. It gives "ValueError: Need at least 1 and at most 32 array objects." Anyone know the reason this method is limited to small arrays? – Tony Apr 17 '18 at 18:17
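On the in-place question raised in the comments: as a minimal sketch, the same index pair used in the question can also appear on the left-hand side of an assignment, which writes into the original array. The name `rows` is introduced here only for illustration.

```python
import numpy as np

input_array = np.array([[3, 14], [12, 5], [75, 50]])
select_id = [0, 1, 1]

# One row index per row, paired element-wise with select_id.
rows = np.arange(input_array.shape[0])

# Indexed assignment modifies input_array itself rather than a copy.
# Here each selected element is increased by 100.
input_array[rows, select_id] += 100

print(input_array)
```

Each (row, column) pair is distinct here because every row appears exactly once, so the buffered behaviour of `+=` with repeated indices is not a concern in this case.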
9

(because I can't post this as a comment on the accepted answer)

Note that numpy.choose only works if you have 32 or fewer choices (in this case, the dimension of your array along which you're indexing must be of size 32 or smaller). Additionally, the documentation for numpy.choose says

To reduce the chance of misinterpretation, even though the following "abuse" is nominally supported, choices should neither be, nor be thought of as, a single array, i.e., the outermost sequence-like container should be either a list or a tuple.

The OP asks:

  1. Is there a better way to get the output_array from the input_array and select_id?
    • I would say, the way you originally suggested seems the best out of those presented here. It is easy to understand, scales to large arrays, and is efficient.
  2. Can we get rid of range(input_array.shape[0])?
    • Yes, as shown by other answers, but the accepted one doesn't work as well in general as what the OP already suggests doing.
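To illustrate the points above, here is a small sketch (not the only way to do it) comparing the indexing from the question, written with `np.arange`, against `np.take_along_axis`, which avoids building the row index at all. The names `rows`, `by_index`, `by_take`, `wide`, and `wide_pick` are made up for this example, and `np.take_along_axis` assumes NumPy 1.15 or later.

```python
import numpy as np

input_array = np.array([[3, 14], [12, 5], [75, 50]])
select_id = np.array([0, 1, 1])

# Advanced indexing: pair row i with column select_id[i].
rows = np.arange(input_array.shape[0])
by_index = input_array[rows, select_id]

# take_along_axis needs an index array with the same number of
# dimensions as input_array, so add a trailing axis and drop it again.
by_take = np.take_along_axis(input_array, select_id[:, None], axis=1)[:, 0]

# Neither form depends on the number of columns being small:
# here each row has 100 columns to select from.
wide = np.arange(3 * 100).reshape(3, 100)
wide_pick = wide[np.arange(3), [99, 0, 50]]

print(by_index, by_take, wide_pick)
```

Both forms return the same values for the arrays in the question, and both keep working when the selected axis is much longer than the limit mentioned for numpy.choose.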
Nathan
2

I think enumerate is handy.

[input_array[enum, item] for enum, item in enumerate(select_id)]
y4suyuki
  • 3
    time-saving by in-line loop is always nice. i really need numpy for processing a lot of data though... – Bystander Jun 13 '13 at 19:03
0

How about:

[input_array[x,y] for x,y in zip(range(len(input_array[:,0])),select_id)]
Lee