I am trying to build a K-Nearest Neighbors (KNN) classifier that will help me classify distances.
The original dataframe has two columns, TotalDistance and Label.
To use KNN I have to encode the distances from TotalDistance, so I did the following:
import pandas as pd
from sklearn import preprocessing
data = pd.read_excel('/content/training set only distance.xlsx')
target = pd.read_excel('/content/testing set only distance.xlsx')
label_enc = preprocessing.LabelEncoder()
encoded_x = label_enc.fit_transform(data['TotalDistance'])
encoded_y = label_enc.fit_transform(target['TotalDistance'])
I wanted to add the encoded numbers back to the original dataframes, so I did the following:
data['encoded'] = encoded_x
target['encoded'] = encoded_y
X = data['label', 'encoded']
y = target['label', 'encoded']
This is giving me the following error on X:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3801 try:
-> 3802 return self._engine.get_loc(casted_key)
3803 except KeyError as err:
4 frames
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: ('label', 'encoded')
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3802 return self._engine.get_loc(casted_key)
3803 except KeyError as err:
-> 3804 raise KeyError(key) from err
3805 except TypeError:
3806 # If we have a listlike key, _check_indexing_error will raise
KeyError: ('label', 'encoded')
From my research, it seems to be complaining that I am using an incorrect index. However, when I display the columns of both dataframes, the column encoded exists:
Index(['TotalDistance', 'Label', 'encoded'], dtype='object')
Index(['TotalDistance', 'Label', 'encoded'], dtype='object')
I also tried assigning the columns to X and y as:
X = data[['label', 'encoded']]
y = target[['label', 'encoded']]
But this also gives me an error.
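In case it helps, here is a minimal sketch that should reproduce both failures without the Excel files. The values in the dataframe are made up and try_select is a hypothetical helper I added only to capture the error text; the column names are copied from the Index output above. It also lists which of the names I request are actually present in data.columns.

```python
import pandas as pd

# Made-up stand-in for the Excel data (assumption); the column names
# are copied from the Index printed above.
data = pd.DataFrame({
    "TotalDistance": [1.2, 3.4, 5.6],
    "Label": ["near", "mid", "far"],
    "encoded": [0, 1, 2],
})


def try_select(df, key):
    """Hypothetical helper: return df[key], or the KeyError text on failure."""
    try:
        return df[key]
    except KeyError as err:
        return f"KeyError: {err}"


# data['label', 'encoded'] passes ONE tuple key, so pandas looks for a
# single column literally named ('label', 'encoded').
tuple_result = try_select(data, ("label", "encoded"))

# data[['label', 'encoded']] passes a list of column names.
list_result = try_select(data, ["label", "encoded"])

# Which of the requested names are not in the dataframe's columns?
wanted = ["label", "encoded"]
missing = [name for name in wanted if name not in data.columns]

print(tuple_result)
print(list_result)
print("missing columns:", missing)
```

The last line compares the requested names against data.columns using exact string matching, which is how pandas looks up column labels.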
What am I doing wrong?