I am trying to build a K-Nearest Neighbors (KNN) classifier that will help me classify distances.

The original dataframe has the columns TotalDistance and Label.

To use KNN I have to encode the distances from TotalDistance, so I did the following:

import pandas as pd
from sklearn import preprocessing

data = pd.read_excel('/content/training set only distance.xlsx')
target = pd.read_excel('/content/testing set only distance.xlsx')

label_enc = preprocessing.LabelEncoder()
encoded_x = label_enc.fit_transform(data['TotalDistance'])
encoded_y = label_enc.fit_transform(target['TotalDistance'])

I wanted to bring the encoded numbers back into the original dataframes, so I did the following:

data['encoded'] = encoded_x
target['encoded'] = encoded_y

X = data['label', 'encoded']
y = target['label', 'encoded']

This is giving me the following error on X:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3801             try:
-> 3802                 return self._engine.get_loc(casted_key)
   3803             except KeyError as err:

4 frames
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: ('label', 'encoded')

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3802                 return self._engine.get_loc(casted_key)
   3803             except KeyError as err:
-> 3804                 raise KeyError(key) from err
   3805             except TypeError:
   3806                 # If we have a listlike key, _check_indexing_error will raise

KeyError: ('label', 'encoded')

From my research, it seems to be complaining that I am using an incorrect index. However, when I display the columns of the dataframes, the column encoded exists.

Index(['TotalDistance', 'Label', 'encoded'], dtype='object')
Index(['TotalDistance', 'Label', 'encoded'], dtype='object')

I also tried assigning the columns to X and y as:

X = data[['label', 'encoded']]
y = target[['label', 'encoded']]

But this also gives me an error.

What am I doing wrong?
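
Going only by the column listing above, one likely cause is the capitalization: the frames have a column named Label, while the code asks for label, and pandas column lookups are case-sensitive. The sketch below reproduces the situation on a small made-up frame (the values are invented for illustration; only the column names are taken from the question). It checks which requested names are missing and then selects the columns with a list inside the brackets.

```python
import pandas as pd

# Made-up stand-in for the real spreadsheet; only the column names
# mirror the ones printed in the question.
data = pd.DataFrame({
    "TotalDistance": [1.2, 3.4, 5.6],
    "Label": ["near", "mid", "far"],
    "encoded": [0, 1, 2],
})

wanted = ["label", "encoded"]

# Column lookups are case-sensitive, so "label" does not match "Label".
missing = [name for name in wanted if name not in data.columns]
print(missing)

# data['label', 'encoded'] looks up the single tuple key
# ('label', 'encoded'), which is why the KeyError shows a tuple.
# A list inside the brackets selects several columns instead.
X = data[["Label", "encoded"]]
print(list(X.columns))
```

If this is the cause, the double-bracket form from the second attempt should work once label is spelled Label.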
