1

I have tabular data (described by col1, col2,...) associated to a Timestamp and a custom made Group. Example:

           Group  Col1  Col2  Col3
Time
2020-05-18     A    10    20    30
2020-05-18     B    10    20    30
2020-05-18     C    10    20    30
2020-05-19     A    10    20    30
2020-05-19     B    10    20    30
2020-05-19     C    10    20    30
2020-05-20     A    10    20    30
2020-05-20     B    10    20    30
2020-05-20     C    10    20    30
2020-05-21     A    10    20    30
2020-05-21     B    10    20    30
2020-05-21     C    10    20    30

This dataframe can be generated by:

dataframe = pd.DataFrame({"Time": ["2020-05-18", "2020-05-18", "2020-05-18", "2020-05-19", "2020-05-19", "2020-05-19", "2020-05-20", "2020-05-20", "2020-05-20", "2020-05-21", "2020-05-21", "2020-05-21"], "Group": ["A","B","C","A","B","C","A","B","C","A","B","C"], "Col1": 12*[10], "Col2": 12*[20], "Col3": 12*[30]})
dataframe["Time"] = pd.to_datetime(dataframe["Time"])
dataframe = dataframe.set_index("Time")

I would like to create time bins (like with resample function), average values WITHIN the same group, and THEN sum them up within the same time bin. But if I write dataframe.resample("2D").sum(), values will be direclty summed up over all the groups without first being averaged in the same group. Like this:

            Col1  Col2  Col3
Time
2020-05-18    60   120   180
2020-05-20    60   120   180

But what I need is to average values belonging to the same group BEFORE summing them up. In this case the desired output would be:

            Col1  Col2  Col3
Time
2020-05-18    30   60   90
2020-05-20    30   60   90

I tried to apply groupby after resampling (e.g. dataframe.resample("2D").grouby("Group")), but it raises a TypeError ("'TimeGrouper' object is not callable")

What could be a way to solve this problem? Thanks in advance.


EDIT:

Output of dataframe.groupby(["Time","Group"]).mean() :

                  Col1  Col2  Col3
Time       Group
2020-05-18 A        10    20    30
           B        10    20    30
           C        10    20    30
2020-05-19 A        10    20    30
           B        10    20    30
           C        10    20    30
2020-05-20 A        10    20    30
           B        10    20    30
           C        10    20    30
2020-05-21 A        10    20    30
           B        10    20    30
           C        10    20    30
Marvin
  • 37
  • 5
  • [Please don't post images of code/data (or links to them)](http://meta.stackoverflow.com/questions/285551/why-may-i-not-upload-images-of-code-on-so-when-asking-a-question) – jezrael May 18 '20 at 06:34
  • Can you add output after `gruopby` and before `resample` from sample data? – jezrael May 18 '20 at 07:49

1 Answers1

2

Use Grouper with aggregate mean:

print (dataframe.groupby(['Group',pd.Grouper(freq='2D', key='Time')]).mean())
                  Col1  Col2  Col3
Group Time                        
A     2020-05-18    10    20    30
      2020-05-20    10    20    30
B     2020-05-18    10    20    30
      2020-05-20    10    20    30
C     2020-05-18    10    20    30
      2020-05-20    10    20    30

And then sum by second level Time:

df = (
    dataframe
    .groupby(['Group', pd.Grouper(freq='2D', key='Time')])
    .mean()
    .sum(level=1)
)

print (df)
            Col1  Col2  Col3
Time                        
2020-05-18    30    60    90
2020-05-20    30    60    90
Little Bobby Tables
  • 4,466
  • 4
  • 29
  • 46
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • By applying this code, I get the first result (rows [60, 120, 180]), which is the wrong one. The desired output is the last one. The problem with this code is that the first instruction `df.groupby(["Time","Group"]).mean()` averages according to Group in that precise time, but I need to average within a given time bin. So for example, if Group "A" occurs at 18/05/2020 and also at 19/05/2020, it needs to be averaged together (since my time bin is 2 days). But this first instruction will average the Group "A" in two days separately. – Marvin May 18 '20 at 07:22
  • @Marvin - Hard test if data re in pictures - [Please don't post images of code/data (or links to them)](http://meta.stackoverflow.com/questions/285551/why-may-i-not-upload-images-of-code-on-so-when-asking-a-question) – jezrael May 18 '20 at 07:27
  • I added now the code to my question to generate the sample dataframe. – Marvin May 18 '20 at 07:42
  • 2
    Thanks, it works! After my last question update, since I set "Time" as index, the parameter "key" in your solution should just be changed into "level", so it works as you described. – Marvin May 18 '20 at 08:31