1

I have a Dataframe of raw data:

df
Out: 
              Date_time   10a   10b   10c   40a   40b   40c  100a  100b  100c
120 2019-02-04 16:00:00  26.7  26.9   NaN  26.7   NaN   NaN  24.9   NaN   NaN
121 2019-02-04 17:00:00  23.4  24.0  23.5  24.3  24.1  24.0  25.1  24.8  25.1
122 2019-02-04 18:00:00  23.1  24.0  23.3  24.3  24.1  24.0  25.1  24.8  25.1
123 2019-02-04 19:00:00  22.8  23.8  22.9  24.3  24.1  24.0  25.1  24.8  25.1
124 2019-02-04 20:00:00   NaN  23.5  22.6  24.3  24.1  24.0  25.1  24.8  25.1

I wish to create a DataFrame containing the 'Date_time' column and several columns of data means. In this instance there will be 3 means for each row, one each for 10, 40, and 100, calculating the mean values for a, b, and c for each of these numbered intervals.

means
Out: 
              Date_time       10cm       40cm  100cm
120 2019-02-04 16:00:00  26.800000  26.700000   24.9
121 2019-02-04 17:00:00  23.633333  24.133333   25.0
122 2019-02-04 18:00:00  23.466667  24.133333   25.0
123 2019-02-04 19:00:00  23.166667  24.133333   25.0
124 2019-02-04 20:00:00  23.050000  24.133333   25.0

I have tried the following (taken from this answer):

means = df['Date_time'].copy()

means['10cm'] = df.loc[:, '10a':'10c'].mean(axis=1)

But this results in all the mean values being clumped together in one cell at the bottom of the 'Date_time' column with '10cm' being given as the cell's index.

means
Out: 
120                                   2019-02-04 16:00:00
121                                   2019-02-04 17:00:00
122                                   2019-02-04 18:00:00
123                                   2019-02-04 19:00:00
124                                   2019-02-04 20:00:00
10cm    120    26.800000
121    23.633333
122    23.46...
Name: Date_time, dtype: object

I believe that this is something to do with means being a Series object rather that a DataFrame object when I copy across the 'Date_time' column, but I'm not sure. Any pointers would be greatly appreciated!

enter image description here

georussell
  • 302
  • 1
  • 10

1 Answers1

2

It was the Series issue. Turns out writing out the question helped me realise the issue! My solution was altering the initial creation of means using to_frame():

means = df['Date_time'].copy().to_frame()

I'll leave the question up in case anyone else is having a similar issue, to save them having to spend time writing it all up!

georussell
  • 302
  • 1
  • 10