0

In a Python pandas DataFrame, I would like to update the index value of a single row (preferably in place, as the DataFrame is quite large).

The index is DatetimeIndex and the DataFrame may contain several columns.

For instance:

In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'DATA': [1, 2, 3]},
                          index=[pd.Timestamp(2011, 10, 1, 0, 0, 0),
                                 pd.Timestamp(2011, 10, 1, 2, 0, 0),
                                 pd.Timestamp(2011, 10, 1, 3, 0, 0)])
In [3]: df
Out[3]: 
                     DATA
2011-10-01 00:00:00     1
2011-10-01 02:00:00     2
2011-10-01 03:00:00     3

The desired output is:

                     DATA
2011-10-01 01:00:00     1   <---- Index changed !!!
2011-10-01 02:00:00     2
2011-10-01 03:00:00     3

Is there a simple (and cheap) way to do this for large DataFrames?

Assume the position of the row is known (for instance, it is the nth row that needs to be changed).

Pedia

2 Answers

2

One possible solution uses Series.replace, but the index first needs to be converted with Index.to_series:

df.index = (df.index
              .to_series()
              .replace({pd.Timestamp('2011-10-01'): pd.Timestamp('2011-10-01 01:00:00')}))
print (df)
                     DATA
2011-10-01 01:00:00     1
2011-10-01 02:00:00     2
2011-10-01 03:00:00     3

Another solution with Index.where (new in 0.19.0):

df.index = df.index.where(df.index != pd.Timestamp('2011-10-01'),
                          [pd.Timestamp('2011-10-01 01:00:00')])

print (df)
                     DATA
2011-10-01 01:00:00     1
2011-10-01 02:00:00     2
2011-10-01 03:00:00     3
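
Since the question states that the row position is known, the same Index.where call can be driven by a positional mask instead of a comparison against the old timestamp. The following is only a sketch under that assumption; the names `i` and `keep` are illustrative, and it still allocates a full-length mask and a new index, so it avoids the label comparison but not the reassignment.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"DATA": [1, 2, 3]},
    index=pd.to_datetime(
        ["2011-10-01 00:00:00", "2011-10-01 02:00:00", "2011-10-01 03:00:00"]
    ),
)

i = 0  # assumed: the known position of the row to change

# True everywhere except at position i, so only that entry is replaced.
keep = np.arange(len(df)) != i
df.index = df.index.where(keep, pd.Timestamp("2011-10-01 01:00:00"))

print(df)
```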

A solution that appends a new row, removes the old one with drop, and finally calls sort_index:

df.loc[pd.Timestamp('2011-10-01 01:00:00')] = df.loc['2011-10-01 00:00:00', 'DATA']
df.drop(pd.Timestamp('2011-10-01 00:00:00'), inplace=True)
df.sort_index(inplace=True)
print (df)
                     DATA
2011-10-01 01:00:00     1
2011-10-01 02:00:00     2
2011-10-01 03:00:00     3

Another solution, if you need to replace by value rather than by position:

df.index.set_value(df.index, pd.Timestamp(2011,10,1,0,0,0), pd.Timestamp(2011,10,1,1,0,0))
print (df)
                     DATA
2011-10-01 01:00:00     1
2011-10-01 02:00:00     2
2011-10-01 03:00:00     3

A last solution, taken from the comments, writes into the index's underlying numpy array:

i = 0
df.index.values[i] = pd.Timestamp('2011-10-01 01:00:00')
print (df)          
                     DATA
2011-10-01 01:00:00     1
2011-10-01 02:00:00     2
2011-10-01 03:00:00     3
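
pandas documents Index objects as immutable, so writing into df.index.values may leave cached lookups out of date, and on some pandas versions the array may be read-only. A more conservative variant of the same positional idea is sketched below. It copies the values, changes one element, and assigns a fresh DatetimeIndex. It assumes a timezone-naive index, and the names `i` and `new_values` are illustrative. The copy costs O(n) memory, but no label search is performed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"DATA": [1, 2, 3]},
    index=pd.to_datetime(
        ["2011-10-01 00:00:00", "2011-10-01 02:00:00", "2011-10-01 03:00:00"]
    ),
)

i = 0  # assumed: the known position of the row to change

# Work on a copy so the existing (immutable) Index object is never mutated.
new_values = df.index.to_numpy(copy=True)
new_values[i] = np.datetime64("2011-10-01T01:00:00")

# Assigning a fresh index rebuilds any lookup structures from scratch.
df.index = pd.DatetimeIndex(new_values, name=df.index.name)

print(df)
```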
jezrael
  • These do work, but they seem to perform some kind of search and they reassign the whole index. I know the position of the row to be changed, so can I use this information to avoid searching through a really big index vector? – Pedia Oct 31 '16 at 11:34
  • I added another solution, please check it. – jezrael Oct 31 '16 at 11:40
  • How about this one: df.index.values[i] = pd.Timestamp('2011-10-01 01:00:00')? It works for me, but I don't know if it has any side effects. If you don't think so, please add it to your answer. – Pedia Nov 01 '16 at 09:05
  • I am testing it, please give me some time. – jezrael Nov 01 '16 at 09:09
  • It is indeed faster, but it still includes a search somehow. A search is redundant if I know the exact position. (This is about the df.set_value() method.) – Pedia Nov 01 '16 at 09:10
  • Thank you for accepting! I added your solution to the answer; it is really interesting. – jezrael Nov 01 '16 at 09:21
2

A fast way is a direct lookup: if you already know which index entry you want to operate on, you can set its value with Index.set_value:

df.index.set_value(df.index, df.index[0], pd.Timestamp(2011,10,1,1,0,0))
#                  <-index-> <-row num->  <---value to be inserted--->

This is an in-place operation, so you don't need to assign the result back.
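
To my knowledge, Index.set_value was deprecated in pandas 1.0 and is no longer available in pandas 2.x, so the call above may fail on a current installation. One label-based alternative that uses only public API is DataFrame.rename with a mapping from the old label to the new one. This is a sketch, not a drop-in equivalent: it maps over the whole index and returns a new DataFrame rather than touching a single position in place. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"DATA": [1, 2, 3]},
    index=pd.to_datetime(
        ["2011-10-01 00:00:00", "2011-10-01 02:00:00", "2011-10-01 03:00:00"]
    ),
)

old_label = df.index[0]  # label currently stored at position 0
new_label = pd.Timestamp(2011, 10, 1, 1, 0, 0)

# rename applies the mapping to every index label; unmatched labels are kept.
df = df.rename(index={old_label: new_label})

print(df)
```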

Nickil Maveli
  • Is there better documentation for the method "Index.set_value" than the one on the pandas docs page? All it says is: use this if you know what you're doing! Is it performing a binary search, so that it needs a sorted array, for example? – Pedia Oct 31 '16 at 11:50
  • I've commented to provide further explanation. I know the doc page lacks an illustration of its usage. But if you go through answers regarding `DF.set_value`, you should get a fair idea, as it works very similarly. – Nickil Maveli Oct 31 '16 at 11:55