This should be an easy task, but I can't figure it out.
I have a huge data frame and want to run a KNN on it, but I get the following error:
Error: factor predictors must have at most 32 levels
So far so good. My idea was to aggregate the column so that it has fewer factor levels.
str(only_savings_medium$MaterialGroupCode)
Factor w/ 40 levels "1A","1B","1C",..: 11 11 11 15 15 15 15 15 15 15 ...
There are 40 levels of codes of the form "1A", "1B", ..., "2B", "2D", ..., "3A", ..., "3D", "4B", "4C", ..., "5A", ..., "5Z". Basically, I want to check whether a code starts with 1, 2, 3, 4 or 5 and store that digit in a new column. All codes of the form 1(any letter) would be assigned to 1, all codes of the form 2(any letter) to 2, and so on. In the end, there should be a new column with only 5 levels, each one grouping all of the original levels that share its leading digit. I'm not sure how best to explain this and hope the problem is clear.
Edit: I'll try to expand my explanation. Here is a part of the dataframe:
As you can see, there is a column with different Material Group Codes. It has 40 levels. What I need is a new column in this dataframe that has only 5 levels (1, 2, 3, 4 or 5). Taking my screenshot as an example, the new column would contain the following values: 2,2,2,2,2,1,1,1,1,1,1,3,3,3,3,3, ..., 3. Basically, every code from 1A to 1Z gets assigned to level 1 of the new column, every code from 2A to 2Z gets assigned to level 2, and so on.
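To make the desired result concrete, here is a minimal sketch on made-up toy data. The data frame `df` and the new column name `MaterialGroup` are placeholders I chose for illustration, not part of my real data, and it assumes every code starts with a single digit. It only shows the output I am after and may not be the best way to get there:

```r
# Toy data standing in for only_savings_medium (the values are made up)
df <- data.frame(
  MaterialGroupCode = factor(c("2B", "2D", "1A", "1C", "3A", "3D", "4B", "5Z"))
)

# Keep only the first character of each code (the leading digit)
# and store it as a new factor column
df$MaterialGroup <- factor(substr(as.character(df$MaterialGroupCode), 1, 1))

# The new column should have at most 5 levels
str(df$MaterialGroup)
```

Something like `levels(x) <- substr(levels(x), 1, 1)` on a copy of the factor might also work, since assigning duplicate level names appears to merge those levels, but I am not sure which approach is preferable on a large dataframe.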