"ValueError: A given column is not a column of the dataframe" when trying to convert categorical feature into numerical

Question

I am using a CSV file from a Udemy course for practice. To keep things simple, I only want to use the Age and Country columns. Here is the code:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.compose import ColumnTransformer as ct
from sklearn.model_selection import train_test_split as tts

data = pd.read_csv("advertising.csv")

X = data[["Age","Country"]]
y = data[["Clicked on Ad"]]


from sklearn.preprocessing import OneHotEncoder
cat = X["Country"]
one_hot = OneHotEncoder()
transformer = ct([("one_hot", one_hot, cat)],remainder="passthrough")
transformed_X = transformer.fit_transform(X)

print(transformed_X)

I get this error:

runfile('C:/Users/--/.spyder-py3/untitled0.py', wdir='C:/Users/--/.spyder-py3')
Traceback (most recent call last):

  File "C:\Anaconda\lib\site-packages\pandas\core\indexes\base.py", line 2895, in get_loc
    return self._engine.get_loc(casted_key)

  File "pandas\_libs\index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc

  File "pandas\_libs\index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc

  File "pandas\_libs\hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item

  File "pandas\_libs\hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item

KeyError: 'Tunisia'


The above exception was the direct cause of the following exception:

Traceback (most recent call last):

  File "C:\Anaconda\lib\site-packages\sklearn\utils\__init__.py", line 447, in _get_column_indices
    col_idx = all_columns.get_loc(col)

  File "C:\Anaconda\lib\site-packages\pandas\core\indexes\base.py", line 2897, in get_loc
    raise KeyError(key) from err

KeyError: 'Tunisia'


The above exception was the direct cause of the following exception:

Traceback (most recent call last):

  File "C:\Users\--\.spyder-py3\untitled0.py", line 17, in <module>
    transformed_X = transformer.fit_transform(X)

  File "C:\Anaconda\lib\site-packages\sklearn\compose\_column_transformer.py", line 529, in fit_transform
    self._validate_remainder(X)

  File "C:\Anaconda\lib\site-packages\sklearn\compose\_column_transformer.py", line 327, in _validate_remainder
    cols.extend(_get_column_indices(X, columns))

  File "C:\Anaconda\lib\site-packages\sklearn\utils\__init__.py", line 454, in _get_column_indices
    raise ValueError(

ValueError: A given column is not a column of the dataframe

"Tunisia" is the first value in the "Country" column.

What might have caused the problem?

Thank you in advance.

afsharov · Accepted Answer · 2021-05-19T14:58:04.497


The problem occurs because you are not specifying the column to transform correctly. In this line:

transformer = ct([("one_hot", one_hot, cat)],remainder="passthrough")

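cat should be the name or the index of the column you want to transform. However, you set cat = X["Country"], which is the column's data (a pandas Series), not its name. ColumnTransformer then treats each value in that Series as a column name and tries to look it up in X. There is no column called 'Tunisia', so you get KeyError: 'Tunisia', which is then re-raised as the ValueError.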
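A few lines of pandas show the difference between passing a column's data and passing its name. This is a minimal sketch on a small hypothetical DataFrame. Only 'Tunisia' comes from the question; the other values are made up because advertising.csv is not shown.

```python
import pandas as pd

# Hypothetical stand-in for the question's X (only the two columns it uses)
X = pd.DataFrame({"Age": [35, 31, 26], "Country": ["Tunisia", "Nauru", "San Marino"]})

cat = X["Country"]           # what the question passes to ColumnTransformer
print(type(cat).__name__)    # a Series holding the column's data, not a list of names
print(list(cat))             # iterating yields the values, starting with 'Tunisia'

# Each value is then looked up among X's column labels, and 'Tunisia' is not one of them
print("Tunisia" in X.columns)
print("Country" in X.columns)
```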

To fix this, use one of the following:

# option 1
cat = ['Country']

# option 2
cat = [1]

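and it should work fine. Option 1 selects the column by name. Option 2 selects it by position: Country is the second column of X, so its index is 1.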
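Below is a runnable sketch of the corrected transformer that tries both options. It assumes pandas and scikit-learn are installed. Because the CSV is not available, it uses a few hypothetical rows in place of advertising.csv. The sparse_threshold=0 argument is an addition that is not in the question; it makes the result a dense NumPy array so that the two options are easy to compare.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical rows standing in for advertising.csv (only the columns used for X)
data = pd.DataFrame({
    "Age": [35, 31, 26, 29],
    "Country": ["Tunisia", "Nauru", "Tunisia", "Italy"],
})
X = data[["Age", "Country"]]

results = []
# Option 1 selects "Country" by name, option 2 by position (index 1 in X)
for cat in (["Country"], [1]):
    transformer = ColumnTransformer(
        [("one_hot", OneHotEncoder(), cat)],
        remainder="passthrough",  # Age is kept as-is, after the one-hot columns
        sparse_threshold=0,       # always return a dense array
    )
    transformed_X = transformer.fit_transform(X)
    results.append(transformed_X)
    print(transformed_X)
```

Both selections refer to the same column, so they should produce identical arrays. The one-hot columns come first, ordered by category (Italy, Nauru, Tunisia), and the passthrough Age column follows them.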

edited May 19 '21 at 14:58

answered May 19 '21 at 14:49



Pass the index instead. Updated the answer. – afsharov May 19 '21 at 14:54
Thanks! After changing to cat = [1] it worked. Why didn't cat = "Country" work by the way? – cagatay.e.sahin May 19 '21 at 14:57

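It didn't work because OneHotEncoder expects 2-D input. If you give ColumnTransformer a single column as a scalar (cat='Country'), it passes that column to the transformer as a 1-D array-like (a vector). If you give it a list or other array-like (['Country'] or [1]), it passes a 2-D array-like. That is why you have to wrap the name or index in a list. I have included both options in the answer. If the transformer expected 1-D input (i.e. a vector), `cat='Country'` would work. – afsharov May 19 '21 at 15:01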
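The sketch below illustrates the 1-D versus 2-D distinction from this comment. It first compares pandas' own scalar and list selection, then fits the question's transformer both ways. It assumes a recent scikit-learn. The two-row DataFrame and the helper fit_with are hypothetical. The exact error message for the scalar case differs between scikit-learn versions, so the helper reports only the exception type.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical two-row stand-in for the question's X
X = pd.DataFrame({"Age": [35, 31], "Country": ["Tunisia", "Nauru"]})

# The same scalar-vs-list distinction in plain pandas
print(X["Country"].ndim)    # scalar selection gives a 1-D Series
print(X[["Country"]].ndim)  # list selection gives a 2-D one-column DataFrame

def fit_with(columns):
    """Hypothetical helper: fit the question's transformer and report the outcome."""
    transformer = ColumnTransformer(
        [("one_hot", OneHotEncoder(), columns)],
        remainder="passthrough",
        sparse_threshold=0,  # dense output so .shape is easy to read
    )
    try:
        return transformer.fit_transform(X).shape
    except ValueError as exc:
        return type(exc).__name__

print(fit_with("Country"))    # scalar: OneHotEncoder receives 1-D data and rejects it
print(fit_with(["Country"]))  # list: OneHotEncoder receives 2-D data and works
```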