The following piece of code works as expected, with no warnings. I create a dataframe, create two sub-dataframes from it using .loc, give them the same index and then assign to a column of one of them.

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(20, 4),
                  index=pd.Index(range(20)),
                  columns=['one', 'two', 'three', 'four'])

d1 = df.loc[[2, 4, 6], :]
d2 = df.loc[[3, 5, 7], :]

idx = pd.Index(list('abc'), name='foo')
d1.index = idx
d2.index = idx

d1['one'] = d1['one'] - d2['two']

However, if I do exactly the same thing except with a multi-indexed dataframe, I get a SettingWithCopyWarning.

import numpy as np
import pandas as pd

arrays = [
    np.array(["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"]),
    np.array(["one", "two", "one", "two", "one", "two", "one", "two"]),
]
df = pd.DataFrame(np.random.randn(8, 4), index=arrays, columns=['one', 'two', 'three', 'four'])

d1 = df.loc[(['bar', 'qux', 'foo'], 'one'), :]
d2 = df.loc[(['bar', 'qux', 'foo'], 'two'), :]

idx = pd.Index(list('abc'), name='foo')
d1.index = idx
d2.index = idx

d1['one'] = d1['one'] - d2['two']

I know that I can avoid this warning by calling .copy() when creating d1 and d2, but I struggle to understand why this is necessary in the second case but not in the first. The chained indexing is equally present in both cases, isn't it? Also, the operation works in both cases (i.e. d1 is modified but df is not). So, what's the difference?
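For reference, here is a minimal sketch of the .copy() workaround mentioned above, applied to the multi-indexed case. The `original` snapshot and the warning check by class name are additions for illustration only; matching on the name is an assumption made so the check does not depend on where a given pandas version defines the warning.

```python
import warnings

import numpy as np
import pandas as pd

arrays = [
    np.array(["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"]),
    np.array(["one", "two", "one", "two", "one", "two", "one", "two"]),
]
df = pd.DataFrame(np.random.randn(8, 4), index=arrays,
                  columns=["one", "two", "three", "four"])
original = df.copy()  # snapshot, used only to confirm df is left untouched

# Copy explicitly so d1 and d2 are independent of df.
d1 = df.loc[(["bar", "qux", "foo"], "one"), :].copy()
d2 = df.loc[(["bar", "qux", "foo"], "two"), :].copy()

idx = pd.Index(list("abc"), name="foo")
d1.index = idx
d2.index = idx

# Row-wise difference computed up front, to compare against after assignment.
expected = d1["one"].to_numpy() - d2["two"].to_numpy()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    d1["one"] = d1["one"] - d2["two"]

# Collect only warnings whose class is named SettingWithCopyWarning.
copy_warnings = [w for w in caught
                 if w.category.__name__ == "SettingWithCopyWarning"]
print("SettingWithCopyWarning count:", len(copy_warnings))
print("df unchanged:", df.equals(original))
```

With the explicit copies, the assignment touches only d1, and df keeps its original values.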

schtandard

2 Answers

I believe this falls into the internals of pandas. The decision to return a copy depends on several factors (dtype homogeneity,

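What you can do is check whether pandas has flagged the object as derived from another frame, using the private _is_copy attribute, and force a copy if it has:

def ensure_copy(df):
    # _is_copy is None when the frame is not flagged; otherwise it
    # typically holds a weak reference to the parent object.
    if df._is_copy is not None:
        return df.copy()
    return df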

d1 = ensure_copy(df.loc[(['bar', 'qux', 'foo'], 'one'), :])
d2 = ensure_copy(df.loc[(['bar', 'qux', 'foo'], 'two'), :])

idx = pd.Index(list('abc'), name='foo')
d1.index = idx
d2.index = idx

d1['one'] = d1['one'] - d2['two']

Note that _is_copy is an internal pandas attribute, not a public one, so there is no guarantee that it will remain available in the future.
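To see which selections pandas flags, the attribute can be inspected directly. The following is only a sketch: `is_flagged` is a hypothetical helper name, and the `getattr` fallback is an assumption made because a private attribute may be absent in some pandas versions. The printed results for the two selections can differ between versions, so no output is claimed here.

```python
import numpy as np
import pandas as pd


def is_flagged(frame):
    """Return True if pandas has marked `frame` as derived from another object."""
    # `_is_copy` is private and may not exist in every pandas version,
    # hence the getattr fallback to None.
    return getattr(frame, "_is_copy", None) is not None


columns = ["one", "two", "three", "four"]

# Flat integer index, as in the first example of the question.
flat = pd.DataFrame(np.random.randn(20, 4), index=pd.Index(range(20)),
                    columns=columns)

# Two-level MultiIndex, as in the second example of the question.
arrays = [
    np.array(["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"]),
    np.array(["one", "two", "one", "two", "one", "two", "one", "two"]),
]
multi = pd.DataFrame(np.random.randn(8, 4), index=arrays, columns=columns)

flat_sub = flat.loc[[2, 4, 6], :]
multi_sub = multi.loc[(["bar", "qux", "foo"], "one"), :]

print("flat selection flagged:      ", is_flagged(flat_sub))
print("MultiIndex selection flagged:", is_flagged(multi_sub))
print("explicit copy flagged:       ", is_flagged(multi_sub.copy()))
```

An explicit .copy() always yields an unflagged frame, which is why it silences the warning regardless of how the selection was made.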

mozway
  • So you're saying that `pandas` creates a copy in the first case but a view in the second? But shouldn't `df` also be affected by the assignment in the second case, then? (It is not.) – schtandard Jan 24 '23 at 11:27

You can use set_index, which returns a new DataFrame, to avoid the warning:

import numpy as np
import pandas as pd

arrays = [
    np.array(["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"]),
    np.array(["one", "two", "one", "two", "one", "two", "one", "two"]),
]
df = pd.DataFrame(np.random.randn(8, 4), index=arrays, columns=['one', 'two', 'three', 'four'])

d1 = df.loc[(['bar', 'qux', 'foo'], 'one'), :]
d2 = df.loc[(['bar', 'qux', 'foo'], 'two'), :]

idx = pd.Index(list('abc'), name='foo')
d1 = d1.set_index(idx)  # <- HERE
d2 = d2.set_index(idx)  # <- HERE

d1['one'] = d1['one'] - d2['two']

Corralien
  • I was about to add this to my answer ;) – mozway Jan 24 '23 at 10:48
  • Can you explain why? – jezrael Jan 24 '23 at 10:57
  • No, I can't, because I don't know. As @mozway said, it's difficult to understand why it doesn't work. If `set_index` works, it's probably because it's the "official" way to set a row index, whereas it doesn't matter for the column index because each column is independent of the others. I think this would make a good issue for the pandas GitHub repository. – Corralien Jan 24 '23 at 11:06
  • Isn't that just another way of forcing a copy? If I use `d1.set_index(idx, inplace=True)`, the warning stays. – schtandard Jan 24 '23 at 11:09
  • @schtandard You are probably right, but the `inplace` parameter is proposed for deprecation. From https://github.com/pandas-dev/pandas/issues/16529: "The parameter inplace=False should be deprecated across the board in preparation for pandas 2, which will not support that input (**we will always return a copy**). That would give people time to stop using it." – Corralien Jan 24 '23 at 12:37
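
The point raised in these comments, that `set_index` without `inplace=True` hands back a new object rather than relabelling the selection, can be checked with a short sketch. The name `relabelled` is introduced here for illustration only.

```python
import numpy as np
import pandas as pd

arrays = [
    np.array(["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"]),
    np.array(["one", "two", "one", "two", "one", "two", "one", "two"]),
]
df = pd.DataFrame(np.random.randn(8, 4), index=arrays,
                  columns=["one", "two", "three", "four"])

d1 = df.loc[(["bar", "qux", "foo"], "one"), :]
idx = pd.Index(list("abc"), name="foo")

# Without inplace=True, set_index returns a new DataFrame and leaves d1 alone.
relabelled = d1.set_index(idx)

print("same object:", relabelled is d1)
print("new index:", list(relabelled.index))
print("index levels still on d1:", d1.index.nlevels)
```

Because `relabelled` is a separate object, later assignments to it go to that new frame rather than to the selection taken from df, which is consistent with the suggestion that `set_index` amounts to another way of forcing a copy.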