Nested array computations in Python using numpy

Question

I am trying to use numpy in Python for my project.

I have a random binary array rndm = [1, 0, 1, 1] and a resource_arr = [[2, 3], 4, 2, [1, 2]]. I want to multiply the two element-wise and then sum the values inside each element. For the sample above, the expected output is 5 0 2 3. I find this hard to solve because of the nested array/list.

So far my code looks like this:

   def fitness_score():
       output = numpy.add(rndm * resource_arr)
       return output

   fitness_score()

I keep getting

ValueError: invalid number of arguments.

I think this is because of the addition I am trying to do. Any help would be appreciated. Thank you!

The error is caused by numpy.add() needing at least two input arguments: https://numpy.org/doc/stable/reference/generated/numpy.add.html — Steven, Jun 01 '20 at 11:54
What you show is 2 lists, and one list has a mix of scalars and lists. List multiplications won't work, and even when converted to arrays, the 2nd is an object array with the same mix. I'd suggest reexamining the source for `resource_arr`; it's a messy object to do math on. — hpaulj, Jun 01 '20 at 14:40
Hello! Thanks for your suggestion. I figured out the fault in my code. The comments below helped me realize that. — Acee, Jun 01 '20 at 14:55
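As a minimal sketch of the point in Steven's comment: numpy.add combines two arrays element-wise, while numpy.sum collapses one array into a total. The arrays a and b below are illustrative values, not the asker's data, and the exact exception type for the one-argument call is assumed to vary between NumPy versions, so both are caught.

```python
import numpy as np

a = np.array([1, 0, 1, 1])
b = np.array([5, 4, 2, 3])

# numpy.add is a binary ufunc: it adds two arrays element-wise.
pairwise = np.add(a, b)

# Calling it with a single argument fails. Depending on the NumPy version
# this surfaces as ValueError or TypeError, so both are caught here.
try:
    np.add(a * b)
    single_arg_failed = False
except (TypeError, ValueError):
    single_arg_failed = True

# To collapse one array into a single total, np.sum is the usual tool.
total = np.sum(a * b)

print(pairwise)           # [6 4 3 4]
print(single_arg_failed)  # True
print(total)              # 10
```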

score 1 · Answer 1 · answered Jun 01 '20 at 12:06

NumPy is all about non-jagged arrays. You can do things with jagged arrays, but doing so efficiently and elegantly isn't trivial.

Almost always, finding a way to map your data structure to a non-nested one, for instance by encoding the information as below, will be more flexible and more performant.

resource_arr = (
    [0, 0, 1, 2, 3, 3],
    [2, 3, 4, 2, 1, 2],
)

That is, an array of integers denoting the 'row' each value belongs to, paired with an equal-sized array of the values themselves.
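One way to get from the nested list in the question to this flat encoding is sketched below. The helper name flatten_ragged is hypothetical, and it assumes only one level of nesting, as in the question.

```python
import numpy as np

def flatten_ragged(nested):
    """Hypothetical helper: return (row_index, values) arrays for a list
    whose items are either scalars or flat lists of scalars."""
    rows, values = [], []
    for row, item in enumerate(nested):
        # Treat a bare scalar as a one-element row.
        members = item if isinstance(item, list) else [item]
        rows.extend([row] * len(members))
        values.extend(members)
    return np.array(rows), np.array(values)

row_index, values = flatten_ragged([[2, 3], 4, 2, [1, 2]])
print(row_index)  # [0 0 1 2 3 3]
print(values)     # [2 3 4 2 1 2]
```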

This may 'feel' wasteful when coming from a C-style way of doing arrays (omg, more memory consumption), but staying away from nested data structures is almost certainly your best bet, both for performance and for how much of the numpy/scipy ecosystem will actually be compatible with your data representation. Whether it really uses more memory is actually rather questionable; every new Python object uses a ton of bytes, so if you have only a few elements per nesting, it is the more memory-efficient solution too.

In this case, that would give you the following efficient solution to your problem:

output = np.bincount(*resource_arr) * rndm
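Spelled out as a self-contained sketch: unpacking resource_arr passes the row indices as the first argument of np.bincount and the values as its weights, so each bin holds the sum of one row. The names row_index and values, and the minlength argument (which keeps the result as long as rndm), are additions for illustration.

```python
import numpy as np

rndm = np.array([1, 0, 1, 1])
row_index = np.array([0, 0, 1, 2, 3, 3])
values = np.array([2, 3, 4, 2, 1, 2])

# Sum the values that share a row index; weights makes bincount add the
# values instead of counting occurrences. The result is a float array.
row_sums = np.bincount(row_index, weights=values, minlength=len(rndm))

# Mask the per-row sums with the binary array.
output = row_sums * rndm
print(output)  # [5. 0. 2. 3.]
```

Note that the result is floating point because bincount converts the weights to floats; cast with output.astype(int) if integers are needed.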

Steven · Accepted Answer · 2021-07-13T14:51:58.380

NumPy expects arrays to be rectangular, like a matrix, and resource_arr is ragged, so it is not a valid one. In your case a plain Python list is more suitable:

import numpy

def sum_nested(l):
    tmp = []

    for element in l:
        if isinstance(element, list):
            # Nested list: replace it by the sum of its items.
            tmp.append(numpy.sum(element))
        else:
            # Plain number: keep it unchanged.
            tmp.append(element)

    return tmp

In this function we check for each element inside l if it is a list. If so, we sum its elements. On the other hand, if the encountered element is just a number, we leave it untouched. Please note that this only works for one level of nesting.

Now, if we run sum_nested([[2, 3], 4, 2, [1, 2]]) we get a list holding the values [5, 4, 2, 3]. All that's left is multiplying this result by the elements of rndm, which can be achieved easily using numpy:

def fitness_score(a, b):
    return numpy.multiply(a, sum_nested(b))
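A self-contained run on the sample data from the question might look like the sketch below. Here sum_nested is condensed into a list comprehension for brevity, which is a restatement of the function above and not a different method.

```python
import numpy

def sum_nested(l):
    # Sum each inner list; leave plain numbers as they are.
    return [numpy.sum(e) if isinstance(e, list) else e for e in l]

def fitness_score(a, b):
    # Element-wise product of the binary mask and the per-element sums.
    return numpy.multiply(a, sum_nested(b))

rndm = [1, 0, 1, 1]
resource_arr = [[2, 3], 4, 2, [1, 2]]

result = fitness_score(rndm, resource_arr)
print(result)  # [5 0 2 3]
```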

edited Jul 13 '21 at 14:51

answered Jun 01 '20 at 12:11

Steven

Hello! The code seems to work, but why am I getting something like [[3 2] [3 2]] for an answer? The value of rndm is [[1], [1]] while sum_nested gives [3, 2]. It seems that the number of elements in rndm determines how many answers I get. – Acee Jun 01 '20 at 12:41
I see where the problem is with my code. It seems like every element in rndm is multiplied with the whole sum_nested result. That is why I am getting an answer like that. Is there a way I could prevent this from happening? – Acee Jun 01 '20 at 12:44
I figured out my code! Seems like a format fault. Thank you so much for your help! – Acee Jun 01 '20 at 12:50
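The output described in these comments is consistent with NumPy broadcasting. Acee did not say what the eventual fix was, so the sketch below is only one plausible reading: it assumes rndm had shape (2, 1) and that flattening it to one dimension is acceptable.

```python
import numpy as np

sums = np.array([3, 2])          # shape (2,), like the sum_nested result
rndm_col = np.array([[1], [1]])  # shape (2, 1), as quoted in the comment

# (2, 1) * (2,) broadcasts to (2, 2): every rndm entry multiplies all sums.
broadcast = rndm_col * sums
print(broadcast.shape)  # (2, 2)

# Flattening rndm to shape (2,) restores one-to-one multiplication.
elementwise = np.ravel(rndm_col) * sums
print(elementwise)  # [3 2]
```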

score 0 · Answer 3 · answered Jun 01 '20 at 11:23

I have not worked much with pandas/numpy, so I'm not sure if this is the most efficient way, but it works (at least for the example you have shown):

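import numpy as np
rndm = [1, 0, 1, 1]
# Recent NumPy versions only accept a ragged list when dtype=object is given.
resource_arr = np.array([[2, 3], 4, 2, [1, 2]], dtype=object)

# Element-wise product on Python objects: 1 * [2, 3] keeps the list, 0 * 4 gives 0.
multiplied_output = np.multiply(rndm, resource_arr)
print(multiplied_output)

# Sum the entries that are still lists; keep plain numbers as they are.
output = []
for elem in multiplied_output:
  output.append(sum(elem) if isinstance(elem, list) else elem)

final_output = np.array(output)
print(final_output)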

answered Jun 01 '20 at 11:23

Sowjanya R Bhat

Hello, thanks for your comment. However, when I tried your code, it still does not do the summation. I still get the same output. – Acee Jun 01 '20 at 11:36
rndm is an array that changes on every run, while resource_arr is also an array. I get the correct answer when I multiply the two arrays, but when I try to compute the output I am looking for, I always get the same result as the plain multiplication. I edited my post so you can see my code. – Acee Jun 01 '20 at 11:43
I figured out my code! Thank you so much for your help! – Acee Jun 01 '20 at 12:51
