
When running the following code:

import pandas as pd
df = pd.DataFrame({"A": [1,2,3],"B": [2,4,8]})
df2 = df[df["A"] < 3]
df2["C"] = 100

I get the following warning:

SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

But this is exactly the behavior I want (the real table is very big and I don't want to make copies of it). Why do I get a warning? Why is it risky?

df

   A  B
0  1  2
1  2  4
2  3  8

df2

   A  B    C
0  1  2  100
1  2  4  100
Henry Ecker
ohad

1 Answer


Why does this happen?

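Because df2 is a copy of a slice of df. Selecting rows with a boolean mask such as df["A"] < 3 always returns a new DataFrame with its own data, so the copy you wanted to avoid has already been made, and df2["C"] = 100 changes only df2, never df.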
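A minimal sketch to check this on the toy frame from the question. It uses np.shares_memory to test whether column A of df2 overlaps df's memory, then adds the column and looks at both frames. The warning is silenced only to keep the demo output readable; older pandas versions emit SettingWithCopyWarning on that line.

```python
import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [2, 4, 8]})
df2 = df[df["A"] < 3]  # boolean-mask selection builds a new DataFrame

# np.shares_memory reports whether two arrays overlap in memory.
shares = np.shares_memory(df["A"].to_numpy(), df2["A"].to_numpy())
print("df2 shares memory with df:", shares)

with warnings.catch_warnings():
    # Silenced only for this demo; do not hide the warning in real code.
    warnings.simplefilter("ignore")
    df2["C"] = 100

print("'C' in df:", "C" in df.columns)
print("'C' in df2:", "C" in df2.columns)
```

Both checks should show that df is untouched: the selection already owns separate memory, and the new column exists only on df2.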

Why is it risky?

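The warning tells you that df2 and df are different objects, so changes to df2 will not show up in df. It was introduced because it is not always obvious whether an indexing operation returns a view or a copy, and code that means to update the original frame can end up silently updating a temporary copy instead.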
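The classic case the warning is meant to catch is chained assignment, where the copy is never even stored in a variable. A small sketch using the question's frame; pandas warns about the assignment line (the exact warning class depends on the version), and it is silenced here only so the result is easy to see:

```python
import warnings

import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [2, 4, 8]})

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # demo only: hide the version-specific warning
    # Reads like "set B to 0 where A < 3", but df[...] returns a temporary
    # copy, and ["B"] = 0 modifies that copy, which is then discarded.
    df[df["A"] < 3]["B"] = 0

print(df["B"].tolist())  # [2, 4, 8]: df itself was never modified
```

Nothing fails and nothing changes in df, which is exactly the kind of silent bug the warning tries to surface.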

Take the example code from the docs:

def do_something(df, value):
    foo = df[['bar', 'baz']]  # Is foo a view? A copy? Nobody knows!
    # ... many lines here ...
    foo['quux'] = value       # We don't know whether this will modify df or not!
    return foo
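One way to remove the ambiguity in that example is to copy explicitly, so the intent is visible in the code itself. A minimal sketch; the input frame (columns bar and baz) and the value 0 are hypothetical, chosen only for illustration:

```python
import pandas as pd


def do_something(df: pd.DataFrame, value) -> pd.DataFrame:
    # The explicit .copy() makes it obvious that foo is independent of df.
    foo = df[["bar", "baz"]].copy()
    foo["quux"] = value  # only foo changes; the caller's df is untouched
    return foo


# Hypothetical input frame for illustration.
df = pd.DataFrame({"bar": [1, 2], "baz": [3, 4]})
result = do_something(df, 0)
print(list(result.columns))  # ['bar', 'baz', 'quux']
print(list(df.columns))      # ['bar', 'baz']
```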

How do I solve it?

Either by explicitly copying the slice:

df2 = df[df['A'] < 3].copy()
df2['C'] = 100
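A quick sketch to confirm that the explicit copy no longer triggers the warning. It records every warning raised while building df2 and filters them by class name, because SettingWithCopyWarning may not exist in newer pandas versions:

```python
import warnings

import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [2, 4, 8]})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning raised below
    df2 = df[df["A"] < 3].copy()     # explicit copy: df2 owns its data
    df2["C"] = 100

# Compare by name so the check also runs where the class was removed.
setting_with_copy = [
    w for w in caught if w.category.__name__ == "SettingWithCopyWarning"
]
print(len(setting_with_copy))  # 0
print(df2)
```

Note that .copy() on top of a boolean selection briefly makes a second copy of the selected rows; the intermediate one is discarded right away.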

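or, if what you actually want is to modify df itself, by assigning through loc on the original frame (rows that don't match get NaN in the new column):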

df.loc[df['A'] < 3, 'C'] = 100
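A short sketch of what the loc version does to the question's frame. Because the row mask and the column label go into a single .loc call on df, pandas writes into df directly instead of into a temporary copy:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [2, 4, 8]})

# One .loc call with (row mask, column label) writes straight into df.
df.loc[df["A"] < 3, "C"] = 100

# "C" did not exist before, so rows outside the mask are filled with NaN,
# which also makes the new column float rather than int.
print(df)
```

Column C ends up as 100 for the first two rows and NaN for the third, so this is a different result from the df2 approach: it enlarges df instead of producing a smaller frame.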
user3471881