13

I have a dataframe in pandas of the following form:

      timestamps         light
7   2004-02-28 00:58:45 150.88
26  2004-02-28 00:59:45 143.52
34  2004-02-28 01:00:45 150.88
42  2004-02-28 01:01:15 150.88
59  2004-02-28 01:02:15 150.88

Here note that the index is not the timestamps column. But I want to resample (or bin the data somehow) to reflect the average value of the light column per minute , hour, day etc.. I have looked into the resample method that pandas offers and it requires the dataframe to have a datatime index for the method to work (unless I've misunderstood this).

  1. So my first question is, can I re-index the dataframe to have timestamps as the index (note that not each row has a unique timestamp and for each timestamp, there are about 30 rows with the same timestamp,each representing a sensor).

  2. If not, is there some other way to possibly achieve another dataframe which has the average value of light per hour , per day , per month etc..?

Any help would be appreciated.

Nikhil
  • 545
  • 1
  • 7
  • 18

3 Answers3

15

For pandas version 0.19.0 and newer you can use the on keyword:

df.resample('H', on='timestamps').mean()

Result:

                      light
timestamps                 
2004-02-28 00:00:00  147.20
2004-02-28 01:00:00  150.88
Stef
  • 28,728
  • 2
  • 24
  • 52
11

You are right - need DatetimeIndex, TimedeltaIndex or PeriodIndex else error:

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'

So you have to first reset_index and set_index if original index is important:

print (df.reset_index().set_index('timestamps'))
                     index   light
timestamps                        
2004-02-28 00:58:45      7  150.88
2004-02-28 00:59:45     26  143.52
2004-02-28 01:00:45     34  150.88
2004-02-28 01:01:15     42  150.88
2004-02-28 01:02:15     59  150.88

if not only set_index:

print (df.set_index('timestamps'))
                      light
timestamps                 
2004-02-28 00:58:45  150.88
2004-02-28 00:59:45  143.52
2004-02-28 01:00:45  150.88
2004-02-28 01:01:15  150.88
2004-02-28 01:02:15  150.88

and then resample:

print (df.reset_index().set_index('timestamps').resample('1D').mean())
            index    light
timestamps                
2004-02-28   33.6  149.408
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
2

Here is an approach to resample.

You can use the following method to sample at T interval.

If original data was in every minute, your new resampled data will be at the 2 min interval. You can use 3T, 4T.... any T value that fits your need.

df_2T = df.resample('2T', on = 'timestamp').mean()

For hourly df_hourly = df.resample('60T', on = 'timestamp').mean()

For daily df_daily = df.resample('1440T', on = 'timestamp').mean()

Note: One day has 60*24 = 1440 min

Arjjun
  • 1,203
  • 16
  • 15