Do we already have a function similar to np.add in awkward arrays?

I need to add two Awkward Arrays. The "+" operator works fine for simple arrays but not for nested arrays. For example:

>>> ak.to_list(c1)

[[], [], [], [], [0.944607075944902]]

>>> ak.to_list(c2)

[[0.9800207661211596], [], [], [], []]

>>> c1+c2

Traceback (most recent call last):
  File "", line 1, in
  File "/afs/cern.ch/work/k/khurana/EXOANALYSIS/CMSSW_11_0_2/src/bbDMNanoAOD/analyzer/dependencies/lib/python3.6/site-packages/numpy/lib/mixins.py", line 21, in func
    return ufunc(self, other)
  File "/afs/cern.ch/work/k/khurana/EXOANALYSIS/CMSSW_11_0_2/src/bbDMNanoAOD/analyzer/dependencies/lib/python3.6/site-packages/awkward1/highlevel.py", line 1380, in array_ufunc
    return awkward1._connect._numpy.array_ufunc(ufunc, method, inputs, kwargs)
  File "/afs/cern.ch/work/k/khurana/EXOANALYSIS/CMSSW_11_0_2/src/bbDMNanoAOD/analyzer/dependencies/lib/python3.6/site-packages/awkward1/_connect/_numpy.py", line 107, in array_ufunc
    out = awkward1._util.broadcast_and_apply(inputs, getfunction, behavior)
  File "/afs/cern.ch/work/k/khurana/EXOANALYSIS/CMSSW_11_0_2/src/bbDMNanoAOD/analyzer/dependencies/lib/python3.6/site-packages/awkward1/_util.py", line 972, in broadcast_and_apply
    out = apply(broadcast_pack(inputs, isscalar), 0)
  File "/afs/cern.ch/work/k/khurana/EXOANALYSIS/CMSSW_11_0_2/src/bbDMNanoAOD/analyzer/dependencies/lib/python3.6/site-packages/awkward1/_util.py", line 745, in apply
    outcontent = apply(nextinputs, depth + 1)
  File "/afs/cern.ch/work/k/khurana/EXOANALYSIS/CMSSW_11_0_2/src/bbDMNanoAOD/analyzer/dependencies/lib/python3.6/site-packages/awkward1/_util.py", line 786, in apply
    nextinputs.append(x.broadcast_tooffsets64(offsets).content)
ValueError: in ListOffsetArray64, cannot broadcast nested list

(https://github.com/scikit-hep/awkward-1.0/blob/0.3.1/src/cpu-kernels/operations.cpp#L778)

The only way I can add them is by using ak.firsts and then replacing each None with 0.

>>> z1=ak.fill_none(ak.firsts(c1),0.)

>>> z2=ak.fill_none(ak.firsts(c2),0.)

>>> z1

<Array [0, 0, 0, 0, 0.945] type='5 * float64'>

>>> z2

<Array [0.98, 0, 0, 0, 0] type='5 * float64'>

>>> z1+z2

<Array [0.98, 0, 0, 0, 0.945] type='5 * float64'>

Can something similar to np.add be devised for ak, even with limited scope or functionality? By limited scope I mean that if it worked only on ak arrays of the same dimension, it would at least serve my present purpose.

Thanks.

Raman Khurana
  • Sorry—I need to do something about my StackOverflow alerts. I'm supposed to get emails if something is tagged with `[awkward-array]` and I wasn't. I'll check up on that. – Jim Pivarski Oct 19 '20 at 23:43

1 Answer

The exception that you saw for

>>> ak.to_list(c1)
[[], [], [], [], [0.944607075944902]]

>>> ak.to_list(c2)
[[0.9800207661211596], [], [], [], []]

>>> c1+c2

is correct: you can't add these two arrays. It's not because Awkward lacks an ak.add function. Such a function would be identical to np.add, which already works on Awkward Arrays with matching structure, such as c1 and itself:

>>> c1 + c1          # this actually calls np.add
<Array [[], [], [], [], [1.89]] type='5 * var * float64'>
>>> np.add(c1, c1)
<Array [[], [], [], [], [1.89]] type='5 * var * float64'>

Adding c1 and c2 doesn't work because the two arrays have a different number of elements at some positions (here, the first and the last). It's like trying to add two NumPy arrays with different shapes. (You can add NumPy arrays with certain different shapes, just as you can add Awkward arrays with certain different shapes, if they broadcast. These don't.)
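
The NumPy side of that analogy can be checked directly. This is a minimal sketch using plain NumPy arrays; the array names are illustrative and do not come from the question. Shapes (3, 1) and (1, 4) broadcast, while lengths 3 and 4 do not.

```python
import numpy as np

# Shapes (3, 1) and (1, 4) are compatible: each size-1 axis is stretched.
column = np.ones((3, 1))
row = np.ones((1, 4))
broadcast_sum = np.add(column, row)
print(broadcast_sum.shape)  # (3, 4)

# Lengths 3 and 4 are not compatible: they differ and neither is 1.
try:
    np.add(np.ones(3), np.ones(4))
    failed = False
except ValueError as err:
    failed = True
    print("cannot broadcast:", err)
```

The second case is the NumPy counterpart of the "cannot broadcast nested list" error in the question.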

If you want an empty list to behave like a list with a zero in it, then you did the right thing: ak.firsts and ak.singletons convert between two ways of representing missing data:

  • as None vs another value
  • as empty lists vs the value in a length-1 list.

In some languages, a missing or potentially missing value is treated as a length-0 or length-1 list, such as Scala's Option type. Thus,

>>> ak.firsts(c1)
<Array [None, None, None, None, 0.945] type='5 * ?float64'>

presumes that you were starting from empty-or-singleton (appears to be true in your examples) and converts it to an option-type array with one level less depth. Then doing an ak.fill_none means that you wanted these missing values (which came from empty lists) to act like zeros for addition, and you got what you wanted.

>>> ak.fill_none(ak.firsts(c1), 0) + ak.fill_none(ak.firsts(c2), 0)
<Array [0.98, 0, 0, 0, 0.945] type='5 * float64'>
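
The two representations of missing data can be modeled with plain Python lists. This is only a sketch of the semantics, not how Awkward Array implements them. The three helpers below are hypothetical stand-ins named after ak.firsts, ak.singletons, and ak.fill_none, and the values are rounded versions of those in the question.

```python
def firsts(lists):
    # First item of each inner list, or None when the list is empty.
    return [inner[0] if inner else None for inner in lists]

def singletons(options):
    # Opposite direction: None becomes [], any other value x becomes [x].
    return [[] if x is None else [x] for x in options]

def fill_none(options, value):
    # Replace every None with the given value.
    return [value if x is None else x for x in options]

c1 = [[], [], [], [], [0.945]]
c2 = [[0.98], [], [], [], []]

z1 = fill_none(firsts(c1), 0.0)
z2 = fill_none(firsts(c2), 0.0)
total = [a + b for a, b in zip(z1, z2)]
print(total)  # [0.98, 0.0, 0.0, 0.0, 0.945]
```

In this model, singletons(firsts(c1)) returns c1 unchanged, but only because every inner list has at most one item.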

One thing that's not clear from your data is whether you always expect the lists to have at most one item—ak.firsts will only pull the first item out of each list. If you had

>>> c1 = ak.Array([[], [], [], [], [0.999, 0.123]])
>>> c2 = ak.Array([[0.98], [], [], [], []])

then

>>> ak.fill_none(ak.firsts(c1), 0) + ak.fill_none(ak.firsts(c2), 0)
<Array [0.98, 0, 0, 0, 0.999] type='5 * float64'>

might not be what you want, since it drops the 0.123. You might actually want to ak.pad_none each list to have at least one element, like this:

>>> ak.pad_none(c1, 1)
<Array [[None], [None], ... [0.999, 0.123]] type='5 * var * ?float64'>
>>> ak.fill_none(ak.pad_none(c1, 1), 0)
<Array [[0], [0], [0], [0], [0.999, 0.123]] type='5 * var * float64'>

This maintains the structure, distinguishing between list lengths for all lengths except for 0 and 1, because empty lists have been converted into [0]. You can't use this for adding unless these longer lists match lengths (back to your original problem), but you can arrange for that, too.

>>> ak.fill_none(ak.pad_none(c1, 2), 0) + ak.fill_none(ak.pad_none(c2, 2), 0)
<Array [[0.98, 0], [0, ... 0], [0.999, 0.123]] type='5 * var * float64'>

It all depends on what structures you have and what structures you want. It wouldn't be a good idea to create a new function that does one of the two things above, especially if it has a name that's dangerously close to a NumPy function's, like np.add, because it works in a different way that would have to be explained for anyone to safely use it. If you want to do a specialized thing, it's safer to have you build it out of simpler primitives (even if you wrap it up as a convenience function in your own work), because then you know what rules it follows.
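
As an illustration of such a personal convenience function, here is a sketch on plain nested lists, used as a stand-in for Awkward Arrays so that it runs without the library. The name add_padded and its fill parameter are hypothetical. Note that the rule differs from ak.pad_none(c1, 2): each pair of rows is padded only to the longer of the two, so rows that are empty in both inputs stay empty.

```python
from itertools import zip_longest

def add_padded(lists1, lists2, fill=0.0):
    """Add two lists of lists row by row, treating missing items as `fill`."""
    if len(lists1) != len(lists2):
        raise ValueError("outer lengths must match")
    return [
        # zip_longest pads the shorter row with `fill` before adding.
        [x + y for x, y in zip_longest(row1, row2, fillvalue=fill)]
        for row1, row2 in zip(lists1, lists2)
    ]

c1 = [[], [], [], [], [0.999, 0.123]]
c2 = [[0.98], [], [], [], []]
result = add_padded(c1, c2)
print(result)  # [[0.98], [], [], [], [0.999, 0.123]]
```

Because the padding rule is spelled out in a few lines, anyone reading the code can see exactly how empty and unequal rows are treated.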

Jim Pivarski
  • Thanks a lot, Jim, for the long and detailed explanation. I kept the example as simple as possible to explain the problem, but I do have more complicated array structures, like the one in your pad_none example. I think pad_none will be the more general solution for me, although it somewhat disturbs the structure of the original array. At the moment I can't think of any way that would harm me, and it seems like a valid and generic solution for this problem. – Raman Khurana Oct 20 '20 at 11:07
  • The original problem was this: I have two arrays of similar structure, like c1 and c2 here. A few elements in c1 and c2 have values, and they are orthogonal: if c1 has a non-None/non-zero entry, then c2 does not, and vice versa. There are also entries where both c1 and c2 are None. The aim is to get a single array by combining these two. I thought something similar to add would be a good solution, but something else might serve the purpose, because I don't really need to add them; I just need to combine or join them in some way that I don't know yet. – Raman Khurana Oct 20 '20 at 11:11
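
The "combine rather than add" idea in the last comment can be sketched as joining the inner lists at each position. The example below is a plain-Python model on nested lists, not Awkward code, and the name combine_rows is hypothetical. In Awkward Array itself, per-row joining of this kind is what ak.concatenate with axis=1 is meant for, assuming an installed version that supports that axis; that should be checked against the documentation for the version in use.

```python
def combine_rows(lists1, lists2):
    # Join the two inner lists at each position; values are kept, not summed.
    if len(lists1) != len(lists2):
        raise ValueError("outer lengths must match")
    return [row1 + row2 for row1, row2 in zip(lists1, lists2)]

c1 = [[], [], [], [], [0.945]]
c2 = [[0.98], [], [], [], []]
combined = combine_rows(c1, c2)
print(combined)  # [[0.98], [], [], [], [0.945]]
```

Unlike the padding approach, this keeps empty rows empty and never invents zeros, which fits the case where at most one of the two arrays has a value at any position.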