There isn't a "high-level" way to do this, meaning one that doesn't depend on knowing the array's layout, but I can walk you through the low-level approach.
Awkward 0.x (obsolete)
Assuming that you have a simple jagged array,
>>> import awkward0
>>> import numpy as np
>>> array = awkward0.fromiter([[1.1, 2.2, 3.3], [], [4.4, 5.5]])
>>> array.layout
layout
[ ()] JaggedArray(starts=layout[0], stops=layout[1], content=layout[2])
[ 0] ndarray(shape=3, dtype=dtype('int64'))
[ 1] ndarray(shape=3, dtype=dtype('int64'))
[ 2] ndarray(shape=5, dtype=dtype('float64'))
You can apply the cumulative sum to the content:
>>> np.cumsum(array.content)
array([ 1.1, 3.3, 6.6, 11. , 16.5])
and wrap that up as a new jagged array:
>>> scan = awkward0.JaggedArray.fromoffsets(array.offsets, np.cumsum(array.content))
>>> scan
<JaggedArray [[1.1 3.3000000000000003 6.6] [] [11.0 16.5]] at 0x7f0621a826a0>
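To see why re-wrapping with the same offsets works, it can help to look at the two flat buffers directly. The sketch below uses plain NumPy rather than Awkward, with the offsets and content values from the example written out by hand; to_lists is a hypothetical helper for illustration, not part of either library.

```python
import numpy as np

# Flat buffers for [[1.1, 2.2, 3.3], [], [4.4, 5.5]], written out by hand.
# List i spans content[offsets[i]:offsets[i + 1]].
offsets = np.array([0, 3, 3, 5])
content = np.array([1.1, 2.2, 3.3, 4.4, 5.5])

def to_lists(offsets, content):
    """Rebuild nested Python lists from an offsets/content pair (hypothetical helper)."""
    return [content[start:stop].tolist()
            for start, stop in zip(offsets[:-1], offsets[1:])]

original = to_lists(offsets, content)

# The cumulative sum only changes the content; the list boundaries are untouched,
# so the running total carries over from one list into the next.
scanned = to_lists(offsets, np.cumsum(content))
```

Because the offsets are reused unchanged, the scanned array has the same list lengths as the original, including the empty list.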
Awkward 1.x
The offsets and content that we directly manipulated in Awkward 0.x are now hidden in a "layout", which separates high-level operations (which don't require knowledge of the exact layout) from low-level operations (which do). This problem doesn't have a high-level solution, and the low-level way is like the above, but it involves extra wrapping and unwrapping.
>>> import awkward as ak
>>> import numpy as np
>>> array = ak.Array([[1.1, 2.2, 3.3], [], [4.4, 5.5]])
>>> array
<Array [[1.1, 2.2, 3.3], [], [4.4, 5.5]] type='3 * var * float64'>
>>> layout = array.layout
>>> layout
<ListOffsetArray64>
<offsets><Index64 i="[0 3 3 5]" offset="0" length="4" at="0x55737ef6f880"/></offsets>
<content><NumpyArray format="d" shape="5" data="1.1 2.2 3.3 4.4 5.5" at="0x55737ef71890"/></content>
</ListOffsetArray64>
As before, you can do a cumulative sum on the content:
>>> np.cumsum(layout.content)
array([ 1.1, 3.3, 6.6, 11. , 16.5])
Here's the structure of how it gets wrapped up:
>>> scan = ak.Array(
...     ak.layout.ListOffsetArray64(
...         layout.offsets,
...         ak.layout.NumpyArray(
...             np.cumsum(layout.content)
...         )
...     )
... )
...
>>> scan
<Array [[1.1, 3.3, 6.6], [], [11, 16.5]] type='3 * var * float64'>
What if you want the scan per-list?
If you want a solution similar to Frank Yellin's, in which each scan starts new in each list, the fact that we did one np.cumsum on the content is a problem. In concrete terms, we have the third list starting with 11, instead of 4.4.
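For reference, the per-list result can be written as a plain Python loop. This is a non-vectorized sketch using only the standard library, meant as a baseline to check other approaches against; the variable names are illustrative.

```python
from itertools import accumulate

lists = [[1.1, 2.2, 3.3], [], [4.4, 5.5]]

# Restart the running sum for every inner list; empty lists stay empty.
per_list_scan = [list(accumulate(inner)) for inner in lists]
```

Here the third list starts again at 4.4, which is the behavior the single np.cumsum over the content does not give.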
A vectorized way to do that is to subtract the first scan element of each list from the whole list and add the first array element back in. In both Awkward 0.x and 1.x, this can be done with slices like array[:, 0] and broadcasting, but empty lists (if you have them) are going to be a problem. Awkward 1.x has enough alternatives to work around that:
>>> ak.firsts(scan)
<Array [1.1, None, 11] type='3 * ?float64'>
>>> scan - ak.firsts(scan)
<Array [[0, 2.2, 5.5], None, [0, 5.5]] type='3 * option[var * float64]'>
>>> scan - ak.firsts(scan) + ak.firsts(array)
<Array [[1.1, 3.3, 6.6], None, [4.4, 9.9]] type='3 * option[var * float64]'>
>>> ak.fill_none(scan - ak.firsts(scan) + ak.firsts(array), [])
<Array [[1.1, 3.3, 6.6], [], [4.4, 9.9]] type='3 * var * float64'>
Most of these don't have equivalents in Awkward 0.x.
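Where those functions aren't available, one possible workaround is to do the per-list correction on the flat buffers with plain NumPy. The following is a sketch, not an Awkward API: it assumes the offsets start at 0 and end at len(content), as in the examples above, and the variable names are illustrative.

```python
import numpy as np

# Flat buffers for [[1.1, 2.2, 3.3], [], [4.4, 5.5]].
# Assumption: offsets[0] == 0 and offsets[-1] == len(content).
offsets = np.array([0, 3, 3, 5])
content = np.array([1.1, 2.2, 3.3, 4.4, 5.5])

# One cumulative sum over everything, as before.
scan = np.cumsum(content)

# Running total accumulated before each list begins (0 for the first list).
before = np.concatenate(([0.0], scan))[offsets[:-1]]

# Spread each list's starting total over its elements and subtract it.
# Empty lists have a count of 0, so they contribute nothing and need no special case.
counts = np.diff(offsets)
per_list_content = scan - np.repeat(before, counts)
```

The per_list_content array can then be wrapped with the original offsets in the same way as the single cumulative sum, for example with JaggedArray.fromoffsets in Awkward 0.x.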