Conditional numpy.cumsum?

Question

I'm very new to python and numpy, so sorry if I misuse some terminology.

I have converted a raster to a 2D numpy array in the hopes of doing calculations on it quickly and efficiently.

I need to get the cumulative sum across a numpy array such that, for each value, I generate the sum of all values that are less than or equal to that, and write that value to a new array. I need to cycle through the entire array that way.
I also need to scale the output between 1 and 100, but that seems
more straightforward.

Attempt to clarify by example:

array([[ 4,  1  ,  3   ,  2] dtype=float32)

I would want the output values (just doing first row by hand) to read:

array([[ 10,  1  ,  6   ,  3], etc.

Any ideas on how to do that?

Thanks in advance!

The near finished script for anyone who's interested:

#Generate Cumulative Thresholds
#5/15/14

import os
import sys
import arcpy
import numpy as np

#Enable overwriting output data
arcpy.env.overwriteOutput=True

#Set working directory
os.chdir("E:/NSF Project/Salamander_Data/Continuous_Rasters/Canadian_GCM/2020/A2A/")

#Set geoprocessing variables
inRaster = "zero_eurycea_cirrigera_CA2A2020.tif"
des = arcpy.Describe(inRaster)
sr = des.SpatialReference
ext = des.Extent
ll = arcpy.Point(ext.XMin,ext.YMin)

#Convert GeoTIFF to numpy array
a = arcpy.RasterToNumPyArray(inRaster)

#Flatten for calculations
a.flatten()

#Find unique values, and record their indices to a separate object
a_unq, a_inv = np.unique(a, return_inverse=True)

#Count occurences of array indices
a_cnt = np.bincount(a_inv)

#Cumulatively sum the unique values multiplied by the number of
#occurences, arrange sums as initial array
b = np.cumsum(a_unq * a_cnt)[a_inv]

#Divide all values by 10 (reverses earlier multiplication done to
#facilitate accurate translation of ASCII scientific notation
#values < 1 to array)
b /= 10

#Rescale values between 1 and 100
maxval = np.amax(b)
b /= maxval
b *= 100

#Restore flattened array to shape of initial array
c = b.reshape(a.shape)

#Convert the array back to raster format
outRaster = arcpy.NumPyArrayToRaster(c,ll,des.meanCellWidth,des.meanCellHeight)

#Set output projection to match input
arcpy.DefineProjection_management(outRaster, sr)

#Save the raster as a TIFF
outRaster.save("C:/Users/mkcarte2/Desktop/TestData/outRaster.tif")

sys.exit()

Could you give an example on e.g. a small 1D or 2D array of what you would expect as the output? As your question is currently, it is not clear what you're asking. — Oliver W., May 14 '14 at 18:53
Welcome to StackOverflow. Your question would be clearer if you provide a simple input and desired output. — shx2, May 14 '14 at 18:54
It's still not particularly clear, since your example data is sorted. What should happen for `array([[4, 1], [3, 2]])`? — user2357112, May 14 '14 at 19:06
There is an extra square bracket in your example. Is your input actually 1D or 2D? — Warren Weckesser, May 14 '14 at 19:25
@Vergentorix How does `[4, 1, 3, 2]` become `[10, 1, 6, 3]`? I don't see the pattern there. — M4rtini, May 14 '14 at 19:25
@WarrenWeckesser It's a raster so i assume the real data would be 2D — M4rtini, May 14 '14 at 19:26
Correct, it's 2D. The pattern is: for each value, sum all values over entire array that are less than or equal it. Write those values to a new array. — Vergentorix, May 14 '14 at 19:28
something like `[sum(x for x in arr if x <= a) for a in arr]` ? — njzk2, May 14 '14 at 20:05

Jaime · Accepted Answer · 2014-05-14T21:04:49.927

11

Depending on how you want to handle repeats, this could work:

In [40]: a
Out[40]: array([4, 4, 2, 1, 0, 3, 3, 1, 0, 2])

In [41]: a_unq, a_inv = np.unique(a, return_inverse=True)

In [42]: a_cnt = np.bincount(a_inv)

In [44]: np.cumsum(a_unq * a_cnt)[a_inv]
Out[44]: array([20, 20,  6,  2,  0, 12, 12,  2,  0,  6], dtype=int64)

Where of course a is your array flattened, that you would then have to reshape to the original shape.

And of course once numpy 1.9 is out, you can condense lines 41 and 42 above into the single, faster:

a_unq, a_inv, a_cnt = np.unique(a, return_inverse=True, return_counts=True)

edited May 14 '14 at 21:04

answered May 14 '14 at 20:02

Jaime

65,696
17
124
159

1

Heh, I *just* worked out exactly the same answer. Figures @Jaime, the master of `np.unique`, would get there first. – Warren Weckesser May 14 '14 at 20:07
I have a feeling that this a little cleaner and more efficient than my master-piece :). – Akavall May 14 '14 at 20:14
Very nice code. I'm still working out whether I can successfully run it. Thanks for your help! – Vergentorix May 15 '14 at 21:03
Can you post your exact code, and the full traceback of the error somewhere? I have just rerun the code above and haven't seen any issue, can you double check there is no typo or missing parenthesis in your code? – Jaime May 16 '14 at 15:39

Akavall · Answer 2 · 2014-05-14T20:06:07.513

1

Edit:

This is ugly, but I think it finally works:

import numpy as np

def cond_cum_sum(my_array):
    my_list = []
    prev = -np.inf
    prev_sum = 0
    for ele in my_array:
        if prev != ele:
            prev_sum += ele
        my_list.append(prev_sum)
        prev = ele
    return np.array(my_list)

a = np.array([[4,2,2,3],
              [9,0,5,2]], dtype=np.float32)

flat_a = a.flatten()
flat_a.sort() 

temp = np.argsort(a.ravel())   

cum_sums = cond_cum_sum(flat_a)

result_1 = np.zeros(len(flat_a))
result_1[temp] = cum_sums

result = result_1.reshape(a.shape)

Result:

>>> result
array([[  9.,   2.,   2.,   5.],
       [ 23.,   0.,  14.,   2.]])

edited May 14 '14 at 20:06

answered May 14 '14 at 19:10

Akavall

82,592
51
207
251

Thanks for the input. I believe my example was misleading, since I arbitrarily ranked the data. The actual data would not be ranked. – Vergentorix May 14 '14 at 19:15
@Akavall Thought along the same lines, but i think this fails if any values are equal. – M4rtini May 14 '14 at 19:42
@M4rtini, Yes it does not work if any values are equal. If I can't come up with something fast I'll delete my answer. – Akavall May 14 '14 at 19:47
shouldn't it be `[11, 4, 4, 7]`? or did I missed the part where only unique values must be summed? – njzk2 May 14 '14 at 20:09
@njzk2, Where did OP say that only unique values must be summed? My understanding was that each value would be only added once. – Akavall May 14 '14 at 20:13
@Akavall: my formulation is wrong, but my meaning is the same as yours. I don't see where each value must be added only once. – njzk2 May 14 '14 at 20:15
@njzk2, we only add if the next value is larger, so `[0,2,2,3]` would be `[0,2,2,5]`, we don't add the second `2` because it is not larger than the previous value. – Akavall May 14 '14 at 20:17
@Akavall: I read `all values that are less than or equal to`. One way or the other, my feeling is that this question lacks precision and leaves to much room for interpretation – njzk2 May 14 '14 at 21:37
@njzk2, I totally agree about lack of precision, which is a shame because this is an interesting question no matter how duplicates are handled. – Akavall May 14 '14 at 21:41
Sorry for the lack of precision. Essentially, what I'm trying to do is sum all values across the entire raster (which has been converted to a 2D array) that are less than or equal to the value/pixel in question. Whether or not they are unique values. I hope that clarifies it. I then need to scale all the values between 1 and 100, but that's a different question. – Vergentorix May 15 '14 at 21:05
@Vergentorix, I still don't think that I am clear. Can you provide a small complete example of your input and desired output? Providing a small example is generally a good idea; it helps people understand your question better. – Akavall May 15 '14 at 21:13

score 1 · Answer 3 · answered May 14 '14 at 20:16

With less numpy and more python:

a = np.array([[4,2,2,3],
              [9,0,5,2]], dtype=np.float32)

np.array([[sum(x for x in arr if x <= subarr) for subarr in arr] for arr in a])
# array([[ 11.,   4.,   4.,   7.],
#        [ 16.,   0.,   7.,   2.]])

If the sum only considers items once no matter how much they appear, then,

np.array([[sum(set(x for x in arr if x <= subarr)) for subarr in arr] for arr in a]) 
# array([[  9.,   2.,   2.,   5.],
#        [ 16.,   0.,   7.,   2.]])

My understanding (maybe incorrect) was that OP was interested in `cumsum` over the entire array, not the rows separately. — Akavall, May 14 '14 at 20:23

Lee · Answer 4 · 2014-05-14T21:50:22.570

1

How about this:

a=np.array([ 4,  1  ,  3   ,  2])

np.array([np.sum(a[a<=x])for x in a])

Gives

array([10,  1,  6,  3])

For a 2D array (assuming you want the sum of the whole array, not just the row):

a=np.array([[ 4,  1  ,  3   ,  2],[ 5,  1  ,  3   ,  2]])

np.array([[np.sum(a[a<=x])for x in a[y,:]]for y in range(a.shape[0])])

Gvies

array([[16,  2, 12,  6],
       [21,  2, 12,  6]])

edited May 14 '14 at 21:50

answered May 14 '14 at 21:22

Lee

29,398
28
117
170

Conditional numpy.cumsum?

4 Answers4

Linked