This bug took forever to isolate, and, now that I have, I don't understand why it happens.

I am working in JupyterLab. I want to import a dataset, set two of its columns as a MultiIndex, filter by one of the index levels, and then add a new column to the resulting dataset.

The following works in a single cell without a problem:

import pandas as pd

stockdata = pd.read_csv('data/stockdata.csv',parse_dates=['timestamp'])
stockdata.sort_values(['ticker','timestamp'],inplace=True)
stockdata.set_index(['ticker','timestamp'],inplace=True)
idx = pd.IndexSlice
stockdata=stockdata.loc[idx[:,pd.Timestamp(2000,7,31):pd.Timestamp(2021,10,29)],:]
stockdata['return'] = 0

If I split the code into two Jupyter cells so I can take a look at the imported data before proceeding, and run both cells sequentially, I get identical results:

# Cell 1
stockdata = pd.read_csv('data/stockdata.csv',parse_dates=['timestamp'])
stockdata.head()

# Cell 2
stockdata.sort_values(['ticker','timestamp'],inplace=True)
stockdata.set_index(['ticker','timestamp'],inplace=True)
idx = pd.IndexSlice
stockdata=stockdata.loc[idx[:,pd.Timestamp(2000,7,31):pd.Timestamp(2021,10,29)],:]
stockdata['return'] = 0

But if I replace stockdata.head() with just stockdata in the first cell (a quick way to see both the head and the tail of the DataFrame while still getting nice Jupyter formatting for column names), I get the dreaded SettingWithCopyWarning on the last line: stockdata['return'] = 0

What is it about displaying data by just typing the dataset name that causes this?
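
For anyone who wants to try this without the CSV, here is a minimal sketch of the same steps. The small synthetic frame and the close column are hypothetical stand-ins for data/stockdata.csv, and the .copy() call is an addition that is not in the code above: taking an explicit copy after filtering is the usual way to mark the result as independent of the original frame, so the later assignment should not warn. It sidesteps the warning rather than explaining why displaying the frame triggers it.

```python
import pandas as pd

# Hypothetical stand-in for data/stockdata.csv (column "close" is made up).
raw = pd.DataFrame({
    "ticker": ["BBB", "BBB", "BBB", "AAA", "AAA", "AAA"],
    "timestamp": pd.to_datetime(
        ["2000-06-30", "2000-07-31", "2000-08-31"] * 2
    ),
    "close": [4.0, 5.0, 6.0, 1.0, 2.0, 3.0],
})

# Same steps as in the question: sort, then build the MultiIndex.
stockdata = raw.sort_values(["ticker", "timestamp"])
stockdata = stockdata.set_index(["ticker", "timestamp"])

# Filter on the timestamp level; .copy() (not in the original code)
# makes the filtered frame an independent object.
idx = pd.IndexSlice
start, end = pd.Timestamp(2000, 7, 31), pd.Timestamp(2021, 10, 29)
stockdata = stockdata.loc[idx[:, start:end], :].copy()

# Adding a column to the explicit copy.
stockdata["return"] = 0
```

The date-range slice on the second index level relies on the index being sorted, which the sort_values call guarantees here.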

Vadim