I would like to insert a column into an existing DataFrame, ideally without copying the existing data. Whatever I try, later assignments to the resulting DataFrame generate a SettingWithCopyWarning if the inserted data contains null values.

import pandas as pd

df = pd.DataFrame(data={'a': [1]})
df = df.assign(b=pd.Series(data=pd.NaT, index=df.index))
df['a'].iloc[0] = 5

Replacing assign with either of

df['b'] = pd.Series(data=pd.NaT, index=df.index)
df.insert(column='b', loc=0, value=pd.NaT)

results in the same warning.

Strangely enough, if the inserted value is not null (e.g., replacing pd.NaT with 0), no warning is generated. Is that a bug?

Brad Solomon
Konstantin

1 Answer

Your issue seems to be with df['a'].iloc[0] = 5, where you're using chained assignment. Try this instead:

df.at[0, 'a'] = 5
# Or: df.loc[0, 'a'] = 5, but `.at` is preferred when assigning a scalar
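
For completeness, here is a minimal sketch that combines the setup from the question with the single-step assignment above. The name `value_after` is only illustrative, and whether the chained form warns at all may depend on your pandas version.

```python
import pandas as pd

# Setup from the question: insert an all-null column 'b'
df = pd.DataFrame(data={'a': [1]})
df = df.assign(b=pd.Series(data=pd.NaT, index=df.index))

# One indexing operation on the DataFrame itself (row label 0, column 'a'),
# instead of the chained df['a'].iloc[0] = 5
df.at[0, 'a'] = 5

# Illustrative check: read the cell back
value_after = df.at[0, 'a']
```

Because the assignment goes through a single indexer on `df`, there is no intermediate object that might be a copy.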
Brad Solomon
  • Strangely enough, the warning is not generated if I replace the null value with a non-null value. That is why I assumed that this chained assignment should work fine. Also, I would like to use position-based selection for rows and label-based selection for columns, which does not work with `at`, `loc`, `iat`, or `iloc`, AFAIK, and `ix` is deprecated. – Konstantin Nov 18 '17 at 22:48
  • @Konstantin, for mixed (position + label) indexing use: `df.loc[df.index[position], 'column_name']` – MaxU - stand with Ukraine Nov 18 '17 at 22:50
  • Yeah, regarding `.ix` and combining position/label-based indexing: it might seem inconvenient, but `.ix` was deprecated _specifically for that reason_. The developers wanted to avoid ambiguity where possible--for example, if your index was [1, 0] and you used `.ix`, should this refer to a position or label? – Brad Solomon Nov 18 '17 at 22:50
  • @MaxU: Yes, I was aware of that possibility but found it ugly. `df[column_name].iloc[position]` is much more intuitive. But it seems that `df.loc[df.index[position], 'column_name']` is the only way to avoid the warning. – Konstantin Nov 18 '17 at 22:52
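
To illustrate the points raised in the comments, here is a rough sketch using a hypothetical frame whose index is [1, 0], so that labels and positions disagree. It shows the `df.loc[df.index[position], ...]` idiom and, as an alternative not mentioned in the thread, `df.iloc` combined with `df.columns.get_loc`. The values and the names `by_label` and `by_position` are made up for illustration, and both idioms assume unique index labels and unique column names.

```python
import pandas as pd

# Hypothetical frame: index labels [1, 0] deliberately differ from positions
df = pd.DataFrame({'a': [10, 20]}, index=[1, 0])

# Label 0 and position 0 refer to different rows, which is the ambiguity .ix had
by_label = df.loc[0, 'a']      # row whose label is 0 (the second row)
by_position = df.iloc[0, 0]    # first row, first column

# Mixed indexing, variant 1: translate the row position to a label, then use .loc
df.loc[df.index[0], 'a'] = 5

# Mixed indexing, variant 2: translate the column label to a position, then use .iloc
df.iloc[1, df.columns.get_loc('a')] = 7
```

Both variants assign through a single indexer on `df`, so neither relies on chained assignment.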