1

I have a pandas dataframe containing n time series in the same Datetime column, each one associated to a different Id, with a corresponding value associated. I would like to pivot the table and reindex to the nearest timestamp. Notice that there can be cases where a timestamp is missing, as in Id-3, in this case the value would need to become NaN.

Datetime       Id Value
5-26-17 8:00   1  2.3
5-26-17 8:30   1  4.5
5-26-17 9:00   1  7
5-26-17 9:30   1  8.1
5-26-17 10:00  1  7.9
5-26-17 10:30  1  3.4
5-26-17 11:00  1  2.1
5-26-17 11:30  1  1.8
5-26-17 12:00  1  0.4
5-26-17 8:02   2  2.6
5-26-17 8:32   2  4.8
5-26-17 9:02   2  7.3
5-26-17 9:32   2  8.4
5-26-17 10:02  2  8.2
5-26-17 10:32  2  3.7
5-26-17 11:02  2  2.4
5-26-17 11:32  2  2.1
5-26-17 12:02  2  0.7
5-26-17 8:30   3  4.5
5-26-17 9:00   3  7
5-26-17 9:30   3  8.1
5-26-17 10:00  3  7.9
5-26-17 10:30  3  3.4
5-26-17 11:00  3  2.1
5-26-17 11:30  3  1.8
5-26-17 12:00  3  0.4

Expected results:

Datetime      Id-1 Id-2 Id-3
5-26-17 8:00  2.3  2.6  NaN
5-26-17 8:30  4.5  4.8  4.5
5-26-17 9:00  7    7.3  7
5-26-17 9:30  8.1  8.4  8.1
5-26-17 10:00 7.9  8.2  7.9
5-26-17 10:30 3.4  3.7  3.4
5-26-17 11:00 2.1  2.4  2.1
5-26-17 11:30 1.8  2.1  1.8
5-26-17 12:00 0.4  0.7  0.4

How would you do this?

Joey
  • 89
  • 7

1 Answers1

1

I believe need convert column to datetimes and floor by 30 minutes by floor, last pivot and add_prefix:

df['Datetime'] = pd.to_datetime(df['Datetime']).dt.floor('30T')
df = df.pivot('Datetime','Id','Value').add_prefix('Id-')
print (df)
Id                   Id-1  Id-2  Id-3
Datetime                             
2017-05-26 08:00:00   2.3   2.6   NaN
2017-05-26 08:30:00   4.5   4.8   4.5
2017-05-26 09:00:00   7.0   7.3   7.0
2017-05-26 09:30:00   8.1   8.4   8.1
2017-05-26 10:00:00   7.9   8.2   7.9
2017-05-26 10:30:00   3.4   3.7   3.4
2017-05-26 11:00:00   2.1   2.4   2.1
2017-05-26 11:30:00   1.8   2.1   1.8
2017-05-26 12:00:00   0.4   0.7   0.4

Another solution is use resample with mean:

df['Datetime'] = pd.to_datetime(df['Datetime'])

df = (df.set_index('Datetime')
        .groupby('Id')
        .resample('30T')['Value']
        .mean().unstack(0)
        .add_prefix('Id-'))

print (df)
Id                   Id-1  Id-2  Id-3
Datetime                             
2017-05-26 08:00:00   2.3   2.6   NaN
2017-05-26 08:30:00   4.5   4.8   4.5
2017-05-26 09:00:00   7.0   7.3   7.0
2017-05-26 09:30:00   8.1   8.4   8.1
2017-05-26 10:00:00   7.9   8.2   7.9
2017-05-26 10:30:00   3.4   3.7   3.4
2017-05-26 11:00:00   2.1   2.4   2.1
2017-05-26 11:30:00   1.8   2.1   1.8
2017-05-26 12:00:00   0.4   0.7   0.4
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252