I work in Python and have a problem with a categorical variable, "city".
I'm building a predictive model on a large dataset of over 1 million rows and more than 100 features. One of the features is "city", which takes about 33,000 distinct values.
I use models such as XGBoost, which need categorical variables converted to numeric ones. Creating dummy variables for "city" adds roughly one new column per city, so the feature count explodes. Neither XGBoost nor my 20 GB of RAM can handle this.
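To illustrate the blow-up, here is a minimal sketch on synthetic data. The DataFrame, the numeric column `x`, and the scaled-down sizes are made up for illustration and are not the actual pipeline. The memory figure at the end is a rough estimate that assumes a dense dummy matrix at one byte per cell.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data; sizes are scaled down and hypothetical.
rng = np.random.default_rng(0)
n_rows, n_cities = 10_000, 500
df = pd.DataFrame({
    "city": rng.integers(0, n_cities, size=n_rows).astype(str),
    "x": rng.normal(size=n_rows),  # placeholder for the other features
})

# One-hot encoding creates one column per distinct city value.
dummies = pd.get_dummies(df, columns=["city"])
n_city_cols = dummies.shape[1] - 1  # subtract the untouched "x" column
print(f"{df['city'].nunique()} cities -> {n_city_cols} dummy columns")

# Rough dense footprint at the real scale, assuming 1 byte per cell.
real_rows, real_cities = 1_000_000, 33_000
dense_gb = real_rows * real_cities / 1e9
print(f"Estimated dense size at full scale: {dense_gb:.0f} GB")
```

Under that assumption the dummy columns alone would need more memory than the 20 GB available, which matches what I see in practice.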
Is there any way to handle this variable other than One Hot Encoding or dummy variables? With One Hot Encoding the model ends up with too many features, performance suffers, and I run out of memory.