Python Sum lookup dynamic array table with df column

Question

I wish to create a table like the below:

ValueX	ValueY	Sum
0.203627201	3.803627201	0.026721649
3.803627201	7.403627201	0.007943925
7.403627201	11.0036272	0.012526994
11.0036272	14.6036272	0.005983166

ValueX & Y essentially come from binning the column below and I wish to sum all the corresponding values when between these values (I have provided the excel formula below for extra information)

I'm trying to find a way that pandas or numpy can create the sum without needing a loop, but if I need to loop, happy to hear some thoughts for speed. In some datasets the [ValueX] array could be >30 in length (in the example I have provided there is only 4 but is generally 15-20) and likewise the time stamps in the main data could be +15000 on the bigger sets, thus basic looping will take time.

So my solution will need to be array based, or looped obviously, based on the length of the lookup table and the length of the main data.

=SUMIFS([Value_toSum],[ValuesWhereXYcomefrom],">=" & ValueX(A1),[ValuesWhereXYcomefrom],"<" & ValueY(B1))

time	Value0	Value1	Value2	ValuesWhereXYcomefrom	Value_toSum
1	41.43	6.579482077			0.00531021
2	41.650002	6.756817908	46.72466411	6.915187703	0.001200456
3	41.700001	6.221966706	11.64727001	1.871959552	0.000959257
4	41.740002	6.230847055	46.92753343	7.531485368	0.006228989
5	42	6.637399856	8.031374656	1.210018204	0.010238095
6	42.43	7.484894608	16.24547568	2.170434793	-0.007777563
7	42.099998	7.595291765	38.73871244	5.100358702	0.003562993
8	42.25	7.567457423	37.07538953	4.899319211	0.01088755
9	42.709999	8.234795546	64.27986403	7.805884636	0.005151042
10	42.93	8.369526407	24.72700129	2.954408659	-0.003028209
11	42.799999	8.146653099	61.52243361	7.55186613	0
12	42.799999	8.060386469	4.069816054	0.504915747	0.007943925
13	43.139999	8.35486004	24.02202482	2.875215708	0.005099722
14	43.360001	8.344303896	65.69597232	7.87315193	-0.003228782
15	43.220001	8.354012136	59.0376027	7.066975932	-0.005090259
16	43	8.619463036	52.3941269	6.078583629	0.010232535
17	43.439999	9.080868488	42.93689619	4.728280808	0.002302072
18	43.540001	8.867340897	57.81235164	6.51969427	0.003445062
19	43.689999	8.181615293	7.747858006	0.946983906	0.002288899
20	43.790001	7.969474934	11.37454783	1.427264396	0.005023909
21	44.009998	7.692703917	70.51485899	9.166459512	-0.001817723
22	43.93	7.862134791	1.600944501	0.203627201	0.001365855
23	43.990002	7.823998385	19.07940961	2.438575351	0.002273198
24	44.09	7.809313794	82.27449155	10.53543163	-0.007711499
25	43.75	8.554266599	38.11043116	4.455137179	-0.002971451
26	43.619999	8.606167168	90.18746868	10.47940005	-0.011462632
27	43.119999	9.179483965	24.77124597	2.6985445	0.005102064
28	43.34	8.759390665	71.79460753	8.196301578	0.000461491
29	43.360001	8.729639563	16.1530103	1.850363945	-0.002767505

I Should also add I have the ability to create the "Bins", my question is simply around how to best handle the SumIFs python function. — Mikal Fischer, Apr 05 '21 at 01:44

score 0 · Answer 1 · answered Apr 05 '21 at 04:10

0

Assuming that the data is given as a pandas DataFrame df, and that bins is a list of bin boundaries, e.g.:

bins = [0.203627201, 3.803627201, 7.403627201, 11.0036272, 14.6036272]

then the following should work:

df.groupby(pd.cut(df["ValuesWhereXYcomefrom"], bins))["Value_toSum"].sum()

answered Apr 05 '21 at 04:10

bb1

7,174
2
8
23

worked great! Even learnt a new library function (cut), thank you! If I wanted to sum when values greater than 0: df.groupby(pd.cut(df["ValuesWhereXYcomefrom"], bins))["Value_toSum"].sum(df["Value_toSum"]>0) .....? – Mikal Fischer Apr 05 '21 at 04:28
Use `df[df[”Value_toSum”] > 0]` instead of `df`. – bb1 Apr 05 '21 at 04:38

Python Sum lookup dynamic array table with df column

1 Answers1

Linked