
To me this sounds like a common use case, but I couldn't find the right function or an existing thread for it yet.

I have two numpy arrays: one is a sequence of triplets, the other is the associated sequence of indices. I want to create a 1-dimensional array of the same sequence length, where each item is picked from its triplet according to the corresponding index.

Example:

mapping = np.array(((25, 120, 240), (18, 177, 240), (0, 0, 0), (10, 120, 285)))
indices = np.array((0, 1, 0, 0))

print("mapping:", mapping)
print("indices:", indices)
print("mapped:", mapping[indices])

Which produces the following output:

mapping: [[ 25 120 240]
 [ 18 177 240]
 [  0   0   0]
 [ 10 120 285]]
indices: [0 1 0 0]
mapped: [[ 25 120 240]
 [ 18 177 240]
 [ 25 120 240]
 [ 25 120 240]]

Of course, this approach treats `indices` as row indices into `mapping`, returning whole rows (here only the 1st and 2nd triplets) instead of picking one element per row. What I was looking for is this:

mapped: [25 177 0 10]

... which is made from the 1st item of the 1st mapping, the 2nd item of the 2nd mapping, and the 1st item of the 3rd and 4th mappings.

Is there a lean way to do it with numpy functionality alone, without external looping and without excessive memory usage for temporary arrays?
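For reference, a plain-Python baseline that does this element-per-row pick with an explicit loop (exactly the kind of external looping the question hopes to avoid):

```python
import numpy as np

mapping = np.array(((25, 120, 240), (18, 177, 240), (0, 0, 0), (10, 120, 285)))
indices = np.array((0, 1, 0, 0))

# For each row i, pick element indices[i] from that row.
mapped = np.array([row[i] for row, i in zip(mapping, indices)])
print(mapped)  # [ 25 177   0  10]
```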

Michael S.
  • there is a typo in the latest row of the `mapping` array: 0 → 10 – nicoco Feb 19 '18 at 08:23
  • Correct, thank you. I had later modified the mapping array example to contain fewer ambiguous values, but forgot to update it in the code of the post, sorry. Updated. – Michael S. Feb 19 '18 at 09:19
  • Why are you worried about `temporary arrays`? That's not what we should focus on when using `numpy`. Let the interpreter deal with those. – hpaulj Feb 19 '18 at 17:58
  • Unfortunately, I am not in the noble position to have unlimited memory, and I need what I have for other programs and variables, not for temporary bulk arrays that grow rapidly with the sequence length. In some tests, my (inefficient) temp arrays became too large even for medium-sized images. In the mid and long term I aim at good performance, which is also impacted by unnecessarily large temporary arrays. – Michael S. Feb 19 '18 at 22:04

1 Answer


I think you are looking for the part of numpy's documentation on integer (advanced) indexing.

In [17]: mapping[np.arange(indices.shape[-1]), indices]
Out[17]: array([ 25, 177,   0,  10])

This creates a temporary array (the `np.arange`), but it is one-dimensional, and I couldn't think of anything better.
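As a side note, a minimal sketch of an alternative that avoids building the index array yourself, assuming a NumPy version that provides `np.take_along_axis` (added in 1.15, i.e. newer than this thread):

```python
import numpy as np

mapping = np.array(((25, 120, 240), (18, 177, 240), (0, 0, 0), (10, 120, 285)))
indices = np.array((0, 1, 0, 0))

# take_along_axis needs indices with the same ndim as mapping,
# so lift indices to shape (4, 1), pick along axis 1, then flatten.
mapped = np.take_along_axis(mapping, indices[:, None], axis=1).ravel()
print(mapped)  # [ 25 177   0  10]
```

Internally this does the same element-per-row selection; it trades the explicit `np.arange` for a reshaped view of `indices`.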

nicoco
    Made some minor edits. Feel free to revert if unhappy. – Paul Panzer Feb 19 '18 at 08:42
  • Great! Exactly what I was looking for. The overhead is acceptable. But something like a broadcasted, length squared temp array would be too much. – Michael S. Feb 19 '18 at 09:44
  • Did so, was just looking for a text link like “correct answer“, not expecting the hook. – Michael S. Feb 19 '18 at 09:47
  • Thank you for updating nicoco's answer. For me, his original answer worked very well, where the temp array is based on the sequence length of the mapping array. How does using `indices.shape` improve the code? And what is meant by "advanced indices must have consistent shapes"? Thx – Michael S. Feb 19 '18 at 11:00
  • Why do you describe that `arange` as useless? It is required for this type of `advanced indexing`. – hpaulj Feb 19 '18 at 17:57
  • Useless is probably not the right word, but OP was worried about memory usage. I'll change my answer. – nicoco Feb 19 '18 at 18:29