The following piece of code works as expected, with no warnings. I create a dataframe, create two sub-dataframes from it using `.loc`, give them the same index and then assign to a column of one of them.
```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(20, 4),
                  index=pd.Index(range(20)),
                  columns=['one', 'two', 'three', 'four'])
d1 = df.loc[[2, 4, 6], :]
d2 = df.loc[[3, 5, 7], :]
idx = pd.Index(list('abc'), name='foo')
d1.index = idx
d2.index = idx
d1['one'] = d1['one'] - d2['two']
```
However, if I do exactly the same thing except with a multi-indexed dataframe, I get a `SettingWithCopyWarning`.
```python
import numpy as np
import pandas as pd

arrays = [
    np.array(["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"]),
    np.array(["one", "two", "one", "two", "one", "two", "one", "two"]),
]
df = pd.DataFrame(np.random.randn(8, 4), index=arrays,
                  columns=['one', 'two', 'three', 'four'])
d1 = df.loc[(['bar', 'qux', 'foo'], 'one'), :]
d2 = df.loc[(['bar', 'qux', 'foo'], 'two'), :]
idx = pd.Index(list('abc'), name='foo')
d1.index = idx
d2.index = idx
d1['one'] = d1['one'] - d2['two']
```
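To check which of the two snippets warns on a given pandas install, a small harness along these lines could be used. It is only a sketch: the helper names `emits_setting_with_copy`, `flat_case` and `multiindex_case` are hypothetical, and the warning is matched by class name so the script still runs on pandas versions where copy-on-write is the default and `SettingWithCopyWarning` may not be emitted at all. Each builder returns `df` as well, so the parent frame stays alive during the assignment just as the module-level `df` does in the snippets above.

```python
import warnings

import numpy as np
import pandas as pd


def flat_case():
    # Same construction as the first snippet (plain integer index).
    df = pd.DataFrame(np.random.randn(20, 4),
                      index=pd.Index(range(20)),
                      columns=['one', 'two', 'three', 'four'])
    return df, df.loc[[2, 4, 6], :], df.loc[[3, 5, 7], :]


def multiindex_case():
    # Same construction as the second snippet (two-level MultiIndex).
    arrays = [
        np.array(["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"]),
        np.array(["one", "two", "one", "two", "one", "two", "one", "two"]),
    ]
    df = pd.DataFrame(np.random.randn(8, 4), index=arrays,
                      columns=['one', 'two', 'three', 'four'])
    return (df,
            df.loc[(['bar', 'qux', 'foo'], 'one'), :],
            df.loc[(['bar', 'qux', 'foo'], 'two'), :])


def emits_setting_with_copy(build):
    """Run the reindex-and-assign steps and report whether a
    SettingWithCopyWarning was raised during the column assignment."""
    # Keep df referenced for the whole check, mirroring the module-level df.
    df, d1, d2 = build()
    idx = pd.Index(list('abc'), name='foo')
    d1.index = idx
    d2.index = idx
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        d1['one'] = d1['one'] - d2['two']
    return any(w.category.__name__ == "SettingWithCopyWarning" for w in caught)


print("pandas", pd.__version__)
print("flat index warns:", emits_setting_with_copy(flat_case))
print("MultiIndex warns:", emits_setting_with_copy(multiindex_case))
```

Matching on `w.category.__name__` rather than importing the warning class avoids depending on where (or whether) a particular pandas version defines it.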
I know that I can avoid this warning by calling `.copy()` when creating `d1` and `d2`, but I struggle to understand why this is necessary in the second case but not in the first. The chained indexing is equally present in both cases, isn't it? Also, the operation works in both cases (i.e. `d1` is modified but `df` is not). So, what's the difference?
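For reference, the `.copy()` workaround for the multi-indexed case might look like the sketch below. It also checks the two observations above: `d1['one']` ends up holding the differences, and `df` itself is left untouched. The names `df_before`, `expected` and `setting_with_copy` are illustrative additions, not part of the original code.

```python
import warnings

import numpy as np
import pandas as pd

arrays = [
    np.array(["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"]),
    np.array(["one", "two", "one", "two", "one", "two", "one", "two"]),
]
df = pd.DataFrame(np.random.randn(8, 4), index=arrays,
                  columns=['one', 'two', 'three', 'four'])
df_before = df.copy()  # snapshot used to confirm df is not modified

# Explicit copies, so d1 and d2 are independent of df.
d1 = df.loc[(['bar', 'qux', 'foo'], 'one'), :].copy()
d2 = df.loc[(['bar', 'qux', 'foo'], 'two'), :].copy()

# Values d1['one'] should hold after the assignment (positional, since
# both selections use the same first-level list in the same order).
expected = (df.loc[(['bar', 'qux', 'foo'], 'one'), 'one'].to_numpy()
            - df.loc[(['bar', 'qux', 'foo'], 'two'), 'two'].to_numpy())

idx = pd.Index(list('abc'), name='foo')
d1.index = idx
d2.index = idx

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    d1['one'] = d1['one'] - d2['two']

setting_with_copy = [w for w in caught
                     if w.category.__name__ == "SettingWithCopyWarning"]
print("SettingWithCopyWarning emitted:", bool(setting_with_copy))
print("df unchanged:", df.equals(df_before))
print("d1['one'] updated:", np.allclose(d1['one'].to_numpy(), expected))
```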