0

I wish to create a table like the below:

ValueX ValueY Sum
0.203627201 3.803627201 0.026721649
3.803627201 7.403627201 0.007943925
7.403627201 11.0036272 0.012526994
11.0036272 14.6036272 0.005983166

ValueX & Y essentially come from binning the column below and I wish to sum all the corresponding values when between these values (I have provided the excel formula below for extra information)

I'm trying to find a way that pandas or numpy can create the sum without needing a loop, but if I need to loop, happy to hear some thoughts for speed. In some datasets the [ValueX] array could be >30 in length (in the example I have provided there is only 4 but is generally 15-20) and likewise the time stamps in the main data could be +15000 on the bigger sets, thus basic looping will take time.

So my solution will need to be array based, or looped obviously, based on the length of the lookup table and the length of the main data.

=SUMIFS([Value_toSum],[ValuesWhereXYcomefrom],">=" & ValueX(A1),[ValuesWhereXYcomefrom],"<" & ValueY(B1))

time Value0 Value1 Value2 ValuesWhereXYcomefrom Value_toSum
1 41.43 6.579482077 0.00531021
2 41.650002 6.756817908 46.72466411 6.915187703 0.001200456
3 41.700001 6.221966706 11.64727001 1.871959552 0.000959257
4 41.740002 6.230847055 46.92753343 7.531485368 0.006228989
5 42 6.637399856 8.031374656 1.210018204 0.010238095
6 42.43 7.484894608 16.24547568 2.170434793 -0.007777563
7 42.099998 7.595291765 38.73871244 5.100358702 0.003562993
8 42.25 7.567457423 37.07538953 4.899319211 0.01088755
9 42.709999 8.234795546 64.27986403 7.805884636 0.005151042
10 42.93 8.369526407 24.72700129 2.954408659 -0.003028209
11 42.799999 8.146653099 61.52243361 7.55186613 0
12 42.799999 8.060386469 4.069816054 0.504915747 0.007943925
13 43.139999 8.35486004 24.02202482 2.875215708 0.005099722
14 43.360001 8.344303896 65.69597232 7.87315193 -0.003228782
15 43.220001 8.354012136 59.0376027 7.066975932 -0.005090259
16 43 8.619463036 52.3941269 6.078583629 0.010232535
17 43.439999 9.080868488 42.93689619 4.728280808 0.002302072
18 43.540001 8.867340897 57.81235164 6.51969427 0.003445062
19 43.689999 8.181615293 7.747858006 0.946983906 0.002288899
20 43.790001 7.969474934 11.37454783 1.427264396 0.005023909
21 44.009998 7.692703917 70.51485899 9.166459512 -0.001817723
22 43.93 7.862134791 1.600944501 0.203627201 0.001365855
23 43.990002 7.823998385 19.07940961 2.438575351 0.002273198
24 44.09 7.809313794 82.27449155 10.53543163 -0.007711499
25 43.75 8.554266599 38.11043116 4.455137179 -0.002971451
26 43.619999 8.606167168 90.18746868 10.47940005 -0.011462632
27 43.119999 9.179483965 24.77124597 2.6985445 0.005102064
28 43.34 8.759390665 71.79460753 8.196301578 0.000461491
29 43.360001 8.729639563 16.1530103 1.850363945 -0.002767505
  • I Should also add I have the ability to create the "Bins", my question is simply around how to best handle the SumIFs python function. – Mikal Fischer Apr 05 '21 at 01:44

1 Answers1

0

Assuming that the data is given as a pandas DataFrame df, and that bins is a list of bin boundaries, e.g.:

bins = [0.203627201, 3.803627201, 7.403627201, 11.0036272, 14.6036272]

then the following should work:

df.groupby(pd.cut(df["ValuesWhereXYcomefrom"], bins))["Value_toSum"].sum()
bb1
  • 7,174
  • 2
  • 8
  • 23
  • worked great! Even learnt a new library function (cut), thank you! If I wanted to sum when values greater than 0: df.groupby(pd.cut(df["ValuesWhereXYcomefrom"], bins))["Value_toSum"].sum(df["Value_toSum"]>0) .....? – Mikal Fischer Apr 05 '21 at 04:28
  • Use `df[df[”Value_toSum”] > 0]` instead of `df`. – bb1 Apr 05 '21 at 04:38