I'm trying to build my own stocks screener with pandas. I fetch data with the DataReader, from Morningstar. Here is my data structure :
- 'Close', 'Volume', 'Low', 'High' etc are columns;
- 'Date' are rows. I could be wrong, but I have chosen to index with 'Symbol' (AAPL, WMLP, etc), and 'Date' are a sublevel of index.
When I add columns to the right, with some stock metrics ('Daily Return', 'Rolling Mean 20', etc), I have a major problem : instead of writing a Nan value for calculation that cannot be made because there is no previous data to use, python uses the last data of the previous stock. See below the Daily Return of WMLP on 2018-07-02 :
df.reset_index(inplace=True)
df.set_index(['Symbol', 'Date'])
Close High Low Open Volume \
Symbol Date
AAPL 2018-07-02 187.1800 187.300 183.4200 183.8200 17731343
2018-07-03 183.9200 187.950 183.5400 187.7900 13954806
2018-07-04 183.9200 183.920 183.9200 183.9200 0
2018-07-05 185.4000 186.410 184.2800 185.2600 16604247
2018-07-06 187.9700 188.434 185.2000 185.4200 17485245
WMLP 2018-07-02 1.3700 1.520 1.3300 1.3700 9139
2018-07-03 1.4000 1.690 1.3800 1.6900 8075
2018-07-04 1.4000 1.400 1.4000 1.4000 0
2018-07-05 1.6000 1.600 1.4124 1.4124 29383
2018-07-06 1.7095 1.770 1.5200 1.6000 42599
Daily Return
Symbol Date
AAPL 2018-07-02 NaN
2018-07-03 -0.017416
2018-07-04 0.000000
2018-07-05 0.008047
2018-07-06 0.013862
WMLP 2018-07-02 -0.992712
2018-07-03 0.021898
2018-07-04 0.000000
2018-07-05 0.142857
2018-07-06 0.068437
I want the Daily Return of WMLP on 2018-07-02 to be NaN, without having to manually force it.
Is it a problem with my indexing method ? Or would you suggest a smarter data structure to avoid the problem ? Note that I understand that panels are deprecated, and I have not learned xarray yet.
EDIT Per you request, please find below the other bits of code that came in the beginning :
start = '2018-07-02'
end = '2018-07-07'
WMLP = web.DataReader('WMLP', 'morningstar', start, end)
AAPL = web.DataReader('AAPL', 'morningstar', start, end)
frames = [AAPL, WMLP]
df = pd.concat(frames)
df['Daily Return'] = df['Close'].pct_change(1)
EDIT : Following the response from Peter (which works), I tried to put the groupby
just one time in the beginning - in order to make the later code more simple and readable :
df = df.groupby('Symbol')
df['Daily Return'] = df['Close'].pct_change(1)
df['Rolling 20'] = df['Close'].rolling(20).mean()
However this produces "TypeError: 'DataFrameGroupBy' object does not support item assignment"
I am grateful for the groupby
solution, but would welcome other ideas