
Is there a simple way to substitute two NumPy subarrays with a single one, where the new entries are the result of functions being called on the entries of the two former arrays?

For example:

[1, a, b], [1, c, d] -> [1, f_add(a, b, c, d), f_sub(b, d)]

with:

def f_add(a, b, c, d):
    return a + b + c + d

def f_sub(b, d):
    return b - d

Concretely:

 [1, 0, 50], [1, 3, 1] -> [1, 54, 49]

In addition, the rows [1, 0, 50] and [1, 3, 1] are part of a bigger array (they form the first row in the example below), so they should be substituted in place:

([[1, 0, 50], [2, 0, 50], [1, 3.0, 1.0]],
 [[1, 0, 50], [2, 0, 50], [2, 3.0, 1.0]])


leading to:

([[1, 54, 49], [2, 0, 50]],
 [[1, 0, 50], [2, 0, 50], [2, 3.0, 1.0]])

Thanks!


EDIT:

The functions f_add and f_sub are just examples to illustrate what I want to do and to show that the changed entries are the result of functions being called. In reality I use slightly more complex functions that carry out a (more) meaningful computation.

The second thing is that this substitution should only be carried out on elements whose first entry is the same. So in the first row only [1, 0., 50.] and [1, 3.0, 1.0] merge, while in the second row it would be [2, 0., 50.] and [2, 3.0, 1.0].

In this example, I first wanted to keep the issue of determining which sub-arrays should merge (by comparing their first entries) separate, but I guess it should be included to make the solution as general as possible.

A more complete example than the one above would then be as follows:

 ([[1, 0., 50.], [2, 0., 50], [1, 3.0, 1.0]],
 [[1, 0., 50.], [2, 0., 50.], [2, 3.0, 1.0]])

leading to:

([[1, 54., 49.], [2, 0., 50.]],
 [[1, 0., 50.], [2, 54., 49.]])
vare

3 Answers


You can use a list comprehension to get this result (assuming there are three subelements for each element of the array):

ar = ([[1, 0, 50], [2, 0, 50], [1, 3.0, 1.0]],
      [[1, 0, 50], [2, 0, 50], [2, 3.0, 1.0]])
ar = tuple([[[x[0][0], sum(x[0][1:]) + sum(x[-1][1:]), x[0][-1]-x[-1][-1]], x[1]] for x in ar])
print ar

([[1, 54.0, 49.0], [2, 0, 50]], [[1, 54.0, 49.0], [2, 0, 50]])


EDIT: Perhaps for a more general solution you can define a function f(x) that performs the desired calculation on the elements of a row, and map that function to every row of the array. For instance,

def f(x):
    if (x[0][0] == x[1][0]):
        return [[x[0][0], x[0][1]+x[0][2]+x[1][1]+x[1][2], x[0][2]-x[1][2]], x[2]]
    elif (x[0][0] == x[2][0]):
        return [[x[0][0], x[0][1]+x[0][2]+x[2][1]+x[2][2], x[0][2]-x[2][2]], x[1]]
    elif (x[1][0] == x[2][0]):
        return [x[0], [x[1][0], x[1][1]+x[1][2]+x[2][1]+x[2][2], x[1][2]-x[2][2]]]
    else:
        return x

ar = ([[1, 0, 50], [2, 0, 50], [1, 3.0, 1.0]],
      [[1, 0, 50], [2, 0, 50], [2, 3.0, 1.0]])

print tuple(map(f, ar))

([[1, 54.0, 49.0], [2, 0, 50]], [[1, 0, 50], [2, 54.0, 49.0]])

Stephen B
  • I should probably add that the functions f_add and f_sub are just examples to illustrate what I want to do. In reality I use slightly more complex functions that carry out a meaningful computation. The second thing is that this substitution should only be carried out on elements where the first entry is the same. So in the first row only [1, 0., 50.] and [1, 3.0, 1.0] merge, while in the second it would be [2, 0., 50.] and [2, 3.0, 1.0]. – vare Sep 08 '16 at 12:35
  • Ah, I see. I misinterpreted the problem. Perhaps you can just map a function to every row using `map(f, ar)` and implement the logic within the function? Although there may be a better way to accomplish that (of which I am unaware). – Stephen B Sep 08 '16 at 12:56
  • This looks quite elegant. I guess to make the function more generic for arrays whose length I don't know a priori (and therefore can't use the hardcoded indices), I should use a NumPy mask or the like? – vare Sep 08 '16 at 14:33
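
A minimal sketch of that mask idea (my own addition, not code from the answer above): group the sublists of each row by their first entry using np.unique and a boolean mask, so the sublist positions are no longer hardcoded. merge_row is an illustrative name, and f_add/f_sub are the placeholder functions from the question, so three-entry sublists are still assumed:

import numpy as np

def f_add(a, b, c, d):
    return a + b + c + d

def f_sub(b, d):
    return b - d

def merge_row(row):
    row = np.asarray(row, dtype=float)
    ids = row[:, 0]
    merged = []
    for i in np.unique(ids):
        group = row[ids == i]             # boolean mask picks the matching sublists
        if len(group) == 2:
            (a, b), (c, d) = group[0, 1:], group[1, 1:]
            merged.append([i, f_add(a, b, c, d), f_sub(b, d)])
        else:
            merged.extend(group.tolist())  # unmatched sublists pass through unchanged
    return merged

ar = ([[1, 0, 50], [2, 0, 50], [1, 3.0, 1.0]],
      [[1, 0, 50], [2, 0, 50], [2, 3.0, 1.0]])
print(tuple(merge_row(r) for r in ar))
# ([[1.0, 54.0, 49.0], [2.0, 0.0, 50.0]], [[1.0, 0.0, 50.0], [2.0, 54.0, 49.0]])

Note that np.unique returns the ids sorted, which happens to match the desired output order for this data.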

It sounds like things could get quite complicated if you have lots of functions working on the array. I would consider wrapping each row in a class to manage the function calls a bit more succinctly. For example, you could keep all of the relevant functions within the class:

class Row:
    def __init__(self, row):
        self.row = row

        self.sum1 = None
        self.sub1 = None

        self._add(row)
        self._sub(row)

    def _add(self, items):
        self.sum1 = sum([items[0][1], items[0][2], items[-1][1], items[-1][2]])

    def _sub(self, items):
        self.sub1 = items[0][2] - items[-1][2]

    def update(self):
        self.row = [[self.row[0][0], self.sum1, self.sub1], self.row[1]]

# Data
arr = ([[1, 0, 50], [2, 0, 50], [1, 3.0, 1.0]],
 [[1, 0, 50], [2, 0, 50], [2, 3.0, 1.0]])

# Usage
for row in arr:
    r = Row(row)
    print r.sum1, r.sub1

    r.update()
    print r.row



>>> 54.0 49.0
    [[1, 54.0, 49.0], [2, 0, 50]]
    54.0 49.0
    [[1, 54.0, 49.0], [2, 0, 50]]  # This row doesn't match your example, but you get the idea
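
A possible variant (a sketch of mine, not part of the original answer) that pairs sublists by their first entry instead of always combining the first and last sublist; MatchedRow is an illustrative name, and the arithmetic is the same hardcoded f_add/f_sub logic from the question:

class MatchedRow:
    def __init__(self, row):
        self.row = row

    def update(self):
        # group sublists by their first entry, preserving first-seen order
        groups, order = {}, []
        for item in self.row:
            key = item[0]
            if key not in groups:
                groups[key] = []
                order.append(key)
            groups[key].append(item)
        new_row = []
        for key in order:
            items = groups[key]
            if len(items) == 2:
                a, b = items
                # same arithmetic as f_add / f_sub in the question
                new_row.append([key, a[1] + a[2] + b[1] + b[2], a[2] - b[2]])
            else:
                new_row.extend(items)  # leave unmatched sublists as they are
        self.row = new_row

for row in arr:
    r = MatchedRow(row)
    r.update()
    print(r.row)

# [[1, 54.0, 49.0], [2, 0, 50]]
# [[1, 0, 50], [2, 54.0, 49.0]]

Keeping a dict of groups plus a separate order list preserves the original row order, so the output matches the expected result in the question.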
kezzos

Here's a function to perform the first (innermost) step, assuming the 2 inputs are lists:

def merge(a, b):
    res = [a[0]]          # could also test that b[0] is the same
    abcd = a[1:] + b[1:]  # list join
    # use np.concatenate here if a, b are arrays
    res.append(f_add(*abcd))
    res.append(f_sub(a[2], b[2]))
    return res

def f_add(a, b, c, d):
    return a + b + c + d

def f_sub(b, d):
    return b - d

In [484]: merge([1,0,50],[1,3,1])
Out[484]: [1, 54, 49]

With this mixed use of elements and general functions, there isn't much point in treating these as arrays.

Then write a function to handle a 'row', the list of lists where lists with the same x[0] id are to be merged. The easiest way to collect matching pairs (are there only pairs?) is with a defaultdict.

So I find the pairs and then merge them with the above function.

from collections import defaultdict

def subs(alist):
    # collect the matching ids
    dd = defaultdict(list)
    for i, x in enumerate(alist):
        dd[x[0]].append(x)
    # merge pairs
    for i in dd.keys():
        if len(dd[i]) == 2:
            dd[i] = merge(dd[i][0], dd[i][1])
        elif len(dd[i]) == 1:
            dd[i] = dd[i][0]  # flatten
        else:
            pass  # do nothing with triplets etc.
    return list(dd.values())

In [512]: lll= [[[1, 0, 50], [2, 0, 50], [1, 3.0, 1.0]],
     ...:  [[1, 0, 50], [2, 0, 50], [2, 3.0, 1.0]]]

In [513]: [subs(l) for l in lll]
Out[513]: [[[1, 54.0, 49.0], [2, 0, 50]], 
           [[1, 0, 50], [2, 54.0, 49.0]]]

The lll could be turned into a 3d array:

In [523]: arr=np.array(lll)
In [524]: arr
Out[524]: 
array([[[  1.,   0.,  50.],
        [  2.,   0.,  50.],
        [  1.,   3.,   1.]],

       [[  1.,   0.,  50.],
        [  2.,   0.,  50.],
        [  2.,   3.,   1.]]])

and the ids we want to mix and match are:

In [525]: arr[:,:,0]
Out[525]: 
array([[ 1.,  2.,  1.],
       [ 1.,  2.,  2.]])

A pair to be merged is

In [526]: arr[0,[0,2],:]
Out[526]: 
array([[  1.,   0.,  50.],
       [  1.,   3.,   1.]])

and the 2 mergers:

In [527]: merge(*arr[0,[0,2],:].tolist())
Out[527]: [1.0, 54.0, 49.0]
In [528]: merge(*arr[1,[1,2],:].tolist())
Out[528]: [2.0, 54.0, 49.0]

But identifying these pairs, performing the mergers, and building a new array is no easier with arrays than it was with the lists.

In [532]: np.array([subs(l.tolist()) for l in arr])
Out[532]: 
array([[[  1.,  54.,  49.],
        [  2.,   0.,  50.]],

       [[  1.,   0.,  50.],
        [  2.,  54.,  49.]]])
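
For completeness, here is a sketch of my own (under the same assumptions) that reuses merge() from above and finds the duplicated ids with np.unique; subs_arr is an illustrative name. It still loops over the groups in Python, which is exactly the point made above:

def subs_arr(row):
    # row is an (n, 3) array; merge the pairs of sublists that share an id
    ids, counts = np.unique(row[:, 0], return_counts=True)
    out = []
    for i, c in zip(ids, counts):
        grp = row[row[:, 0] == i]          # sublists with this id
        if c == 2:
            out.append(merge(*grp.tolist()))  # reuse merge() from above
        else:
            out.extend(grp.tolist())
    return out

np.array([subs_arr(row) for row in arr])
# array([[[  1.,  54.,  49.],
#         [  2.,   0.,  50.]],
#
#        [[  1.,   0.,  50.],
#         [  2.,  54.,  49.]]])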
hpaulj