Python slicing does not raise KeyError even when the column is missing

Question

I have a pandas dataframe with 10 columns. If I try to access a column that is not present, it still returns NaN for it. I was expecting a KeyError. Why isn't pandas detecting the missing column?

In the example below, vendor_id is a valid column in the dataframe. The other column is absent from the dataset.

final_feature.ix[:,['vendor_id','this column is absent']]
Out[1017]: 
  vendor_id  this column is absent
0    434236                    NaN

type(final_feature)
Out[1016]: pandas.core.frame.DataFrame

EDIT 1: Verified that the dataframe contains no null values:

print (final_feature1.isnull().values.any())

score 1 · Answer 1 · answered May 11 '17 at 09:10


For me, selecting the subset with plain [] indexing works as expected and raises the KeyError:

final_feature[['vendor_id','this column is absent']]

KeyError: "['this column is absent'] not in index"
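A minimal, self-contained reproduction of this, using a hypothetical one-row stand-in for the question's `final_feature` frame (the real data isn't shown). The sketch catches the `KeyError` so its message can be inspected:

```python
import pandas as pd

# Hypothetical stand-in for the question's final_feature frame
final_feature = pd.DataFrame({"vendor_id": [434236]})

message = ""
try:
    # Plain [] with a list of labels: every label must already be a column
    final_feature[["vendor_id", "this column is absent"]]
except KeyError as exc:
    message = str(exc)  # names the missing label(s)
    print("KeyError:", message)
```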

Also, ix is deprecated in the latest version of pandas (0.20.1), check here.

jezrael

822,522
95
1,334
1,252

Shouldn't it say "['this column is absent'] is not in columns"? – ForeverLearner May 11 '17 at 09:13

A dataframe's columns are an index (along axis 1). – IanS May 11 '17 at 09:13
Maybe, but I think it is a general error for missing labels in both the index and the columns: the index on axis 0 is the classic index, and the index on axis 1 is called columns. – jezrael May 11 '17 at 09:14
@IanS you are correct. The default (axis = 0) is the traditional index, but columns are also an index, just along axis = 1 in a dataframe. – ForeverLearner May 11 '17 at 10:07
@jezrael Thanks. – ForeverLearner May 11 '17 at 10:07
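To make the point in these comments concrete, here is a small sketch (with a hypothetical one-column frame) showing that `df.columns` is itself a pandas `Index` and is the second entry of `df.axes`. That is why the error message says "not in index". A plain `in` test against it is also a cheap way to check a name before selecting:

```python
import pandas as pd

# Hypothetical one-column frame for illustration
df = pd.DataFrame({"vendor_id": [434236]})

# Both axes are Index objects: axes[0] holds row labels, axes[1] holds column labels
print(isinstance(df.index, pd.Index), isinstance(df.columns, pd.Index))  # True True
print(df.axes[1].equals(df.columns))  # True

# Membership tests look up labels in the column Index
print("vendor_id" in df.columns)              # True
print("this column is absent" in df.columns)  # False
```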

EdChum · Accepted Answer · 2017-05-11T09:17:12.287

This is expected behaviour and is due to the "setting with enlargement" feature:

In [15]:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(5,3), columns=list('abc'))
df.ix[:,['a','d']]

Out[15]:
          a   d
0 -1.164349 NaN
1  0.400116 NaN
2 -0.599496 NaN
3  0.186837 NaN
4  0.385656 NaN

If you try df['d'] or df[['a','d']] instead, you will get a KeyError.

Effectively, what you're doing is reindexing. When using ix, it doesn't matter that the column doesn't exist; you just get a column of NaNs.
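On pandas 1.0 or later, `.ix` no longer exists, but the same "give me these columns, creating any that are missing" request can be made explicitly with `DataFrame.reindex`. The following is a sketch on a random frame like the one in this answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 3), columns=list("abc"))

# reindex keeps existing column 'a' and creates the missing 'd' filled with NaN,
# which is the reindexing behaviour described above, without raising KeyError
result = df.reindex(columns=["a", "d"])
print(result)
```

Because `reindex` states the intent explicitly, a NaN column produced this way is clearly deliberate rather than a symptom of a typo.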

The same behaviour is observed using loc:

In [24]:
df.loc[:,['a','d']]

Out[24]:
          a   d
0 -1.164349 NaN
1  0.400116 NaN
2 -0.599496 NaN
3  0.186837 NaN
4  0.385656 NaN
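This `loc` behaviour has since changed. Assuming pandas 1.0 or later (where `.ix` has also been removed), passing `.loc` a list that contains a missing label raises a `KeyError` instead of returning a NaN column. The following quick check runs on the same kind of random frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 3), columns=list("abc"))

try:
    df.loc[:, ["a", "d"]]
    loc_raised = False
except KeyError as exc:
    # pandas >= 1.0: missing labels in a list passed to .loc raise
    loc_raised = True
    print("KeyError:", exc)
```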

When you don't use ix or loc and instead write df['d'], you are indexing a specific column (or, with a list, specific columns). There is no expectation of enlargement here unless you are assigning to a new column, e.g. df['d'] = some_new_vals
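The following is a short sketch of that distinction on a hypothetical random frame. Reading a missing column with `[]` raises, while assigning to it, with `[]` or `.loc`, is setting with enlargement and adds the column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 3), columns=list("abc"))

# Reading a column that doesn't exist raises KeyError
try:
    df["d"]
    read_raised = False
except KeyError:
    read_raised = True

# Assigning to a new column label enlarges the frame instead
df["d"] = np.arange(5)
print(df.columns.tolist())  # ['a', 'b', 'c', 'd']

# .loc assignment with a new column label enlarges the same way
df.loc[:, "e"] = 0.0
print(df.columns.tolist())  # ['a', 'b', 'c', 'd', 'e']
```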

To guard against this, you can validate your list against the existing columns using isin:

In [26]:
valid_cols = df.columns.isin(['a','d'])
df.ix[:, valid_cols]

Out[26]:
          a
0 -1.164349
1  0.400116
2 -0.599496
3  0.186837
4  0.385656

Now you will only see columns that exist, and any misspelled column names are simply left out instead of appearing as NaN columns.
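On pandas 1.0 or later, where `.ix` is gone, the same boolean mask can be passed to `.loc`. Because `isin` silently drops unknown names, a misspelled name would otherwise just vanish, so it may also help to report them. `Index.difference` is one way to do that. The following sketch uses `wanted` as a hypothetical list of requested columns:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 3), columns=list("abc"))
wanted = ["a", "d"]  # hypothetical requested columns; 'd' plays the typo

# Boolean mask over df.columns: True where the column name is in `wanted`
valid_cols = df.columns.isin(wanted)
subset = df.loc[:, valid_cols]

# Requested names that are not real columns (where a misspelling would show up)
missing = pd.Index(wanted).difference(df.columns)
if len(missing) > 0:
    print("Unknown columns:", list(missing))
```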

Thank you so much. Do you suggest removing all instances of .ix from the code? A spelling mistake is how I ran into this issue. – ForeverLearner, May 11 '17 at 09:50
It will keep working until some future version: since version 0.20.1 it has been marked for deprecation, but it still works. The [docs](http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#whatsnew-0200-api-breaking-deprecate-ix) show how to achieve the same behaviour. The NaN-filling behaviour itself will still happen, as I've demonstrated with `loc`, but using `isin` against your existing columns will protect against it. – EdChum, May 11 '17 at 09:53
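The deprecation notes linked above describe replacing `.ix` with either `.loc` (labels) or `.iloc` (positions). Below is a sketch of both patterns on a hypothetical frame with string row labels, assuming pandas 1.0 or later, where `.ix` has been removed entirely:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with non-integer row labels
df = pd.DataFrame(np.random.randn(5, 3), columns=list("abc"), index=list("vwxyz"))

# Old mixed positional/label style (no longer available):
#   df.ix[[0, 2], "a"]

# Label-based replacement: convert the row positions to labels first
by_label = df.loc[df.index[[0, 2]], "a"]

# Position-based replacement: convert the column label to a position first
by_position = df.iloc[[0, 2], df.columns.get_loc("a")]
```

Both select the same two values; which one reads better depends on whether the surrounding code thinks in labels or in positions.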
