.loc to filter database by date

Question

I have learning python over the past few weeks and I have an issue with the .loc function.

I have a dataframe (BAC) comprised of daily equity prices and dates as an index (which seems to be a datetime object). I want to filter only the dates in 2008 and do a rolling mean (30days) of the 'Close' column.

Here is my code and output :

BAC.info

OUT : <class 'pandas.core.frame.DataFrame'> DatetimeIndex: 2769 entries, 2015-12-31 to 2005-01-03 Data columns (total 5 columns):
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2769 entries, 2015-12-31 to 2005-01-03
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Open    2769 non-null   float64
 1   High    2769 non-null   float64
 2   Low     2769 non-null   float64
 3   Close   2769 non-null   float64
 4   Volume  2768 non-null   float64
dtypes: float64(5)
memory usage: 209.8 KB

and :

BAC['Close'].loc['2008-01-01':'2009-01-01'].rolling(window=30).mean()

OUT: Series([], Name: Close, dtype: float64)

The code : BAC.loc['2008-01-01':'2009-01-01', 'Close' ].rolling(window=30).mean() yields the same result.

So, I don't get a mistake, but I think there is an issue with the format. The course I was following uses .ix, which has now been deprecated and I understood that .loc can do more or less the same thing (or .iloc if filter by column or row number).

After that, I tried :

BAC.loc['2008', 'Close' ].rolling(window=30).mean()

OUT : 
2008-12-31          NaN
2008-12-30          NaN
2008-12-29          NaN
2008-12-26          NaN
2008-12-24          NaN
                ...    
2008-01-08    35.948233
2008-01-07    35.858433
2008-01-04    35.775933
2008-01-03    35.705700
2008-01-02    35.656400
Name: Close, Length: 253, dtype: float64

So it works, but starts the rolling window at the END of 2008 and not the beginning...why is that so ?

Any help would be greatly appreciated. thanks !

score 0 · Answer 1 · answered Dec 18 '20 at 19:31

Your output from BAC.info shows that your index is reversed. Try to do a BAC.sort_index(inplace=True).

Other than that, I can't see any error in your code. This is what I get:

>>> import pandas as pd
>>> import numpy as np
>>> BAC = pd.date_range(start='2008-01-01', end='2009-12-31').to_frame()
>>> BAC['Close'] = np.random.randint(0, 70, BAC.shape[0])

>>> BAC['Close'].loc['2008-01-01':'2009-01-01'].rolling(window=30).mean()
2008-01-01          NaN
2008-01-02          NaN
2008-01-03          NaN
2008-01-04          NaN
2008-01-05          NaN
                ...    
2008-12-28    39.566667
2008-12-29    40.633333
2008-12-30    39.533333
2008-12-31    39.266667
2009-01-01    39.866667
Freq: D, Name: Close, Length: 367, dtype: float64
>>> BAC.loc['2008', 'Close'].rolling(window=30).mean()
2008-01-01          NaN
2008-01-02          NaN
2008-01-03          NaN
2008-01-04          NaN
2008-01-05          NaN
                ...    
2008-12-27    38.833333
2008-12-28    39.566667
2008-12-29    40.633333
2008-12-30    39.533333
2008-12-31    39.266667
Freq: D, Name: Close, Length: 366, dtype: float64

.loc to filter database by date

1 Answers1