1

I wonder if anyone could please help me with this issue: I have a pandas data frame (generated from a text file) which should have a structure similar to this one:

import pandas as pd

data = {'Objtype'   : ['bias', 'bias', 'flat', 'flat', 'StdStar', 'flat', 'Arc', 'Target1', 'Arc', 'Flat', 'Flat', 'Flat', 'bias', 'bias'], 
        'UT'        :  pd.date_range("23:00", "00:05", freq="5min").values,
        'Position'  : ['P0', 'P0', 'P0', 'P0', 'P1', 'P1','P1', 'P2','P2','P2', 'P0', 'P0', 'P0', 'P0']}

df   = pd.DataFrame(data=data)

I would like to do some operations taking in consideration the time of the observation so I change the UT column from a string format to a numpy datetime64:

df['UT'] = pd.to_datetime(df['UT'])

Which gives me something like this:

   Objtype Position                  UT
0     bias       P0 2016-08-31 23:45:00
1     bias       P0 2016-08-31 23:50:00
2     flat       P0 2016-08-31 23:55:00
3     flat       P0 2016-08-31 00:00:00
4  StdStar       P1 2016-08-31 00:05:00
5     flat       P1 2016-08-31 00:10:00
6      Arc       P1 2016-08-31 00:15:00
7  Target1       P1 2016-08-31 00:20:00

However, in here there are two issues:

First) the year/month/day is assigned to the current one.

Second) the day has not changed from 23:59 -> 00:00. Rather it has gone backwards.

If we know the true date at the first data frame index row and we know that all the entries are sequentially (and they always go from sunset to sunrise). How could we correct for these issues?

Delosari
  • 677
  • 2
  • 17
  • 29
  • Looks like you are flat-fielding your observations. Cool. Try this vector, which you can add to your data frame: a = pd.date_range('08/31/2016 23:45:00', periods=72, freq='5min') – Rohit Aug 31 '16 at 23:11
  • Thank you for the comment. As a matter of fact I do ;) https://www.youtube.com/watch?v=ZsQPx5ow1JY – Delosari Aug 31 '16 at 23:25

1 Answers1

1

To find the time delta between 2 rows:

df.UT - df.UT.shift()
Out[48]: 
0                 NaT
1            00:05:00
2            00:05:00
3   -1 days +00:05:00
4            00:05:00
5            00:05:00
6            00:05:00
7            00:05:00
Name: UT, dtype: timedelta64[ns]

To find when time goes backwards:

df.UT - df.UT.shift() < pd.Timedelta(0)
Out[49]: 
0    False
1    False
2    False
3     True
4    False
5    False
6    False
7    False
Name: UT, dtype: bool

To have an additional 1 day for each row going backward:

((df.UT - df.UT.shift() < pd.Timedelta(0))*pd.Timedelta(1, 'D'))
Out[50]: 
0   0 days
1   0 days
2   0 days
3   1 days
4   0 days
5   0 days
6   0 days
7   0 days
Name: UT, dtype: timedelta64[ns]

To broadcast forward the additional days down the series, use the cumsum pattern:

((df.UT - df.UT.shift() < pd.Timedelta(0))*pd.Timedelta(1, 'D')).cumsum()
Out[53]: 
0   0 days
1   0 days
2   0 days
3   1 days
4   1 days
5   1 days
6   1 days
7   1 days
Name: UT, dtype: timedelta64[ns]

Add this correction vector back to your original UT column:

df.UT + ((df.UT - df.UT.shift() < pd.Timedelta(0))*pd.Timedelta(1, 'D')).cumsum()
Out[51]: 
0   2016-08-31 23:45:00
1   2016-08-31 23:50:00
2   2016-08-31 23:55:00
3   2016-09-01 00:00:00
4   2016-09-01 00:05:00
5   2016-09-01 00:10:00
6   2016-09-01 00:15:00
7   2016-09-01 00:20:00
Name: UT, dtype: datetime64[ns]
Zeugma
  • 31,231
  • 9
  • 69
  • 81