2

I have a dataframe df1 indexed by datetime with entries every minutes for weeks sample:

           SAMPLE_TIME       Bottom     Top      Out     state                                                                    
0  2015-07-15 16:41:56      48.625   55.812   43.875        1       
1  2015-07-15 16:42:55      48.750   55.812   43.875        1     
2  2015-07-15 16:43:55      48.937   55.812   43.875        1       
3  2015-07-15 16:44:56      49.125   55.812   43.812        1      
4  2015-07-15 16:45:55      49.312   55.812   43.812        1     

I want to find the day with the lowest Avg(TempBottom,TempTop), then get the entire day data by minute so i can plot that day, i tried:

df2 = df1.groupby(pd.TimeGrouper('D')).agg(min) \
.sort(['TempTop','TempBottom'], ascending=[True,True])

Which gives me the lowest temperature days ordered. sample:

SAMPLE_TIME       Bottom     Top      Out     state                                                                    
2015-10-17       19.994   25.840   21.875        0       
2015-08-29       26.182   28.777   25.937        0       
2015-11-19       19.244   33.027   28.937        0        
2015-11-07       19.744   33.527   28.125        0           

then i though that all i need is to take the index of the first entry from df2:

 df1[df2.index[1]]

But i am getting an error:

KeyError: Timestamp('2015-08-29 00:00:00')
Anton Protopopov
  • 30,354
  • 12
  • 88
  • 93
InsaneBot
  • 2,422
  • 2
  • 19
  • 31

2 Answers2

3

From docs:

Warning

The following selection will raise a KeyError; otherwise this selection methodology would be inconsistent with other selection methods in pandas (as this is not a slice, nor does it resolve to one)

dft['2013-1-15 12:30:00']

To select a single row, use .loc

In [71]: dft.loc['2013-1-15 12:30:00']
Out[71]: 
A    0.193284
Name: 2013-01-15 12:30:00, dtype: float64

So you need to use loc method in your case:

In [103]: df1.loc[df2.index[0]]
Out[103]: 
           SAMPLE_TIME  TempBottom  TempTop  TempOut  State  Bypass
2015-07-15    16:41:56      48.625   55.812   43.875      1       1
2015-07-15    16:42:55      48.750   55.812   43.875      1       1
2015-07-15    16:43:55      48.937   55.812   43.875      1       1
2015-07-15    16:44:56      49.125   55.812   43.812      1       1
2015-07-15    16:45:55      49.312   55.812   43.812      1       1

EDIT

When you pass the single argument it's trying to access with the label. However when you pass the interval it's using as slice. You could do the trick to pass the value + 1 day:

In [276]: df2.index[0]
Out[276]: Timestamp('2015-07-15 00:00:00', offset='D')

In [277]: df2.index[0] + 1
Out[277]: Timestamp('2015-07-16 00:00:00', offset='D')

In [278]: df1.loc[df2.index[0]: df2.index[0] + 1]
Out[278]: 
                     TempBottom  TempTop  TempOut  State  Bypass
SAMPLE_TIME                                                     
2015-07-15 16:41:56      48.625   55.812   43.875      1       1
2015-07-15 16:42:55      48.750   55.812   43.875      1       1
2015-07-15 16:43:55      48.937   55.812   43.875      1       1
2015-07-15 16:44:56      49.125   55.812   43.812      1       1
2015-07-15 16:45:55      49.312   55.812   43.812      1       1

EDIT2

Or you could convert date of Timestamp to str:

In [355]: df2.index[0]
Out[355]: Timestamp('2015-07-15 00:00:00', offset='D')

In [356]: df2.index[0].date()
Out[356]: datetime.date(2015, 7, 15)

In [357]: str(df2.index[0].date())
Out[357]: '2015-07-15'

In [359]: df1[str(df2.index[0].date())]
Out[359]: 
                     TempBottom  TempTop  TempOut  State  Bypass
2015-07-15 16:41:56      48.625   55.812   43.875      1       1
2015-07-15 16:42:55      48.750   55.812   43.875      1       1
2015-07-15 16:43:55      48.937   55.812   43.875      1       1
2015-07-15 16:44:56      49.125   55.812   43.812      1       1
2015-07-15 16:45:55      49.312   55.812   43.812      1       1
Community
  • 1
  • 1
Anton Protopopov
  • 30,354
  • 12
  • 88
  • 93
  • I tried your solution but i get this now instead `KeyError: 'the label [2015-11-04 00:00:00] is not in the [index]' ` – InsaneBot Dec 10 '15 at 18:06
  • However this is working `df1.loc['2015-11-04']` the problem seems in the extra precision (time) that is included in the index – InsaneBot Dec 10 '15 at 18:08
  • Are you need to store your index in `df1` as datetime or you could convert it to date? If you could convert it to date then you could use `df1.loc[df2.index[0].date()]` – Anton Protopopov Dec 10 '15 at 18:28
  • I think we are using different versions of python: `df2.index[0] + 1` gives this error `ValueError: Cannot add integral value to Timestamp without offset.` – InsaneBot Dec 11 '15 at 14:01
  • while doing: `df1.loc[df2.index[0].date()]` gives this error `KeyError: 'the label [2015-11-04] is not in the [index]' ` – InsaneBot Dec 11 '15 at 14:02
  • I finally found somewhat a modified way to do this using `df1.loc[df2.index[4].strftime('%Y-%m-%d')]`(see my answer), but i feel it's very hackish – InsaneBot Dec 11 '15 at 14:03
2

So here is the thinking process that i did, in combination with @Anton Protopopov answer :

In [1]: df1.ix[df2]
# call trace
ValueError: Cannot index with multidimensional key

In [2]: df1.ix[df2.index]
out[2]:
SAMPLE_TIME       Bottom     Top      Out     state                                                                    
2015-10-17          NaN      NaN      NaN      NaN        
2015-08-29          NaN      NaN      NaN      NaN         
2015-11-19          NaN      NaN      NaN      NaN        
2015-11-07          NaN      NaN      NaN      NaN         

In [3]: df1.ix[df2.index[4:5]]
Out[3]: 
SAMPLE_TIME       Bottom     Top      Out     state                                                                    
2015-11-04           NaN      NaN      NaN      NaN     

In [33]: df1.loc[df2.index[4:5]]
KeyError: "None of [DatetimeIndex(['2015-11-04'], dtype='datetime64[ns]', name=u'SAMPLE_TIME', freq=None, tz=None)] are in the [index]"

Finally i gave up on ix and decided to make loc work, as Anton recommended i try:

In [4]: df1.loc[df2.index[0].date()]
KeyError: 'the label [2015-11-04] is not in the [index]'

Got me thinking that loc only accepts strings which finally worked:

In [5]: df1.loc[df2.index[4].strftime('%Y-%m-%d')]
Out[5]: 
SAMPLE_TIME              Bottom     Top      Out     state                                                                    
2015-11-04 00:00:22      56.256   56.300   43.750        0     
2015-11-04 00:01:22      56.256   56.300   43.812        0      
2015-11-04 00:02:22      56.256   56.300   43.812        0       
2015-11-04 00:03:22      56.256   56.300   43.812        0     
Community
  • 1
  • 1
InsaneBot
  • 2,422
  • 2
  • 19
  • 31
  • You could do `str` function to convert `date` to `str` with `str(df2.index[4].date())` which I think more straightforward then `strftime`. Btw which `python` and `pandas` versions are you using? I'm using `3.4.3` and `0.17.1`. – Anton Protopopov Dec 11 '15 at 16:16