Shifting values in datetimeindex of pandas dataframe

Question

I have a df with a DateTimeIndex of 30 minute intervals over a long period (> 1 year), so >17520 rows. For reasons related to daylight savings, two of the index values are repeated in the index and two values are missing. So the duplicated values are:

In[1]: df[df.index.duplicated('first')] 
Out[2]: 
                            a           b        c 
timestamp                                                                   
2012-10-07 01:00:00           NaN        NaN      NaN     
2012-10-07 01:30:00           NaN        NaN      NaN     
2013-10-06 01:00:00           NaN        NaN      NaN      
2013-10-06 01:30:00           NaN        NaN      NaN

I want to change these to the missing values, 1 hour later:

In[3]: df[df.index.duplicated('first')].shift(1,freq="H")
Out[4]: 
                           a            b        c
timestamp                                                                   
2012-10-07 02:00:00           NaN        NaN      NaN     
2012-10-07 02:30:00           NaN        NaN      NaN     
2013-10-06 02:00:00           NaN        NaN      NaN        
2013-10-06 02:30:00           NaN        NaN      NaN

But this doesn't change the index:

df[df.index.duplicated('first')] = df[df.index.duplicated('first')].shift(1,freq="H")

What would?

jezrael · Accepted Answer · 2017-04-09T07:03:12.370

I think you need map duplicated index with rename by dict:

print (df)
                     a   b   c
timestamp                     
2013-10-06 01:00:00  1 NaN NaN
2013-10-06 01:30:00  2 NaN NaN
2013-10-06 01:00:00  3 NaN NaN
2013-10-06 01:30:00  4 NaN NaN
2012-10-08 01:30:00  5 NaN NaN
2013-10-10 01:00:00  6 NaN NaN


df1 = df[df.index.duplicated('first')]
d = dict(zip(df1.index, df1.shift(1,freq="H").index))
print (d)
{Timestamp('2013-10-06 01:00:00'): Timestamp('2013-10-06 02:00:00'), 
 Timestamp('2013-10-06 01:30:00'): Timestamp('2013-10-06 02:30:00')}

df = df.rename(index=d)
print (df)
                     a   b   c
timestamp                     
2013-10-06 02:00:00  1 NaN NaN
2013-10-06 02:30:00  2 NaN NaN
2013-10-06 02:00:00  3 NaN NaN
2013-10-06 02:30:00  4 NaN NaN
2012-10-08 01:30:00  5 NaN NaN
2013-10-10 01:00:00  6 NaN NaN

Similar solution:

idx = df.index[df.index.duplicated('first')]
d = dict(zip(idx, idx.to_series().shift(freq="H").index))
print (d)
{Timestamp('2013-10-06 01:00:00'): Timestamp('2013-10-06 02:00:00'), 
 Timestamp('2013-10-06 01:30:00'): Timestamp('2013-10-06 02:30:00')}

df = df.rename(index=d)
print (df)
                     a   b   c
timestamp                     
2013-10-06 02:00:00  1 NaN NaN
2013-10-06 02:30:00  2 NaN NaN
2013-10-06 02:00:00  3 NaN NaN
2013-10-06 02:30:00  4 NaN NaN
2012-10-08 01:30:00  5 NaN NaN
2013-10-10 01:00:00  6 NaN NaN
2013-10-06 02:30:00   8 NaN NaN
2012-10-08 01:30:00   9 NaN NaN
2013-10-10 01:00:00  10 NaN NaN

idx = df.index[df.index.duplicated('first')]
s = idx.to_series().shift(freq="H")
#swap index with values in Series
d = pd.Series(s.index.values, index = s.values).to_dict()
print (d)
{Timestamp('2013-10-06 01:00:00'): Timestamp('2013-10-06 02:00:00'), 
 Timestamp('2013-10-06 01:30:00'): Timestamp('2013-10-06 02:30:00')}

df = df.rename(index=d)
print (df)
                     a   b   c
timestamp                     
2013-10-06 02:00:00  1 NaN NaN
2013-10-06 02:30:00  2 NaN NaN
2013-10-06 02:00:00  3 NaN NaN
2013-10-06 02:30:00  4 NaN NaN
2012-10-08 01:30:00  5 NaN NaN
2013-10-10 01:00:00  6 NaN NaN

EDIT1:

You need add timedeltas created by cumcount with to_timedelta to original index.

delta = pd.to_timedelta(df.groupby(level=0).cumcount(), unit='H')
print (delta)
timestamp
2013-10-06 01:00:00   00:00:00
2013-10-06 01:30:00   00:00:00
2013-10-06 01:00:00   01:00:00
2013-10-06 01:30:00   01:00:00
2012-10-08 01:30:00   00:00:00
2013-10-10 01:00:00   00:00:00
dtype: timedelta64[ns]

df.index = df.index + delta
print (df)
                     a   b   c
2013-10-06 01:00:00  1 NaN NaN
2013-10-06 01:30:00  2 NaN NaN
2013-10-06 02:00:00  3 NaN NaN
2013-10-06 02:30:00  4 NaN NaN
2012-10-08 01:30:00  5 NaN NaN
2013-10-10 01:00:00  6 NaN NaN

No. The first suggestion gives df1 containing ONLY the changed timestamps (not the rest of the year). The second suggestion shifts every timestamp in the df, not just the duplicated ones. — doctorer, Apr 07 '17 at 06:17
Thanks. That's almost there, but not quite. This has renamed both instances of the duplicated values, so I now have duplicates of `'2012-10-07 02:00:00'` etc. I want to rename only the _second_ instance of each timestamp — doctorer, Apr 07 '17 at 06:35
So there are two occurrences of `'2012-10-07 01:00:00'` in the df, the first one is correct, the second should be changed to `'2012-10-07 02:00:00'` If I change them both, I have 2 occurences of the 2am timestamp, and none of the 1am. — doctorer, Apr 09 '17 at 06:50

Shifting values in datetimeindex of pandas dataframe

1 Answers1