the goal:
Hey guys, I'm trying to create a classification model in Python to predict when a bike-share station will have too much relative inflow or outflow per hour.
what we're working with:
The first 5 rows of my dataframe (over 200,000 rows in all) look like this, and I've assigned values 0, 1, 2 in the 'flux' column - 0 if no significant action, 1 if too much inflow, 2 if too much outflow.
And I'm thinking of using the station_name (over 300 stations), hour of day, and day of week as the predictor variables to classify 'flux'.
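For reference, here's a rough sketch of how those three predictors could be turned into model inputs. The toy rows, the station names, and the column names `hour` and `day_of_week` are made up for illustration (only `station_name` and `flux` are the real column names), and one-hot encoding is just one reasonable assumption about how to handle them:

```python
import pandas as pd

# Toy stand-in for the real dataframe; all values here are made up.
df = pd.DataFrame({
    "station_name": ["Station A", "Station B", "Station A", "Station C"],
    "hour": [8, 8, 17, 12],          # hypothetical column name
    "day_of_week": [0, 0, 4, 6],     # hypothetical column name, 0 = Monday
    "flux": [2, 0, 1, 0],
})

predictors = ["station_name", "hour", "day_of_week"]

# Treat all three predictors as categorical and one-hot encode them,
# so that hour and day of week are not forced onto a numeric scale.
X = pd.get_dummies(df[predictors], columns=predictors)
y = df["flux"]

print(X.shape)
print(list(X.columns))
```

With 300+ stations, 24 hours, and 7 days, this would give a wide but very sparse matrix, which may matter for models like KNN.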
the model choice:
What should I go with? Naive Bayes? KNN? Random Forest? Anything else that would be a good fit? GBMs? SVMs?
FYI: the baseline accuracy from always predicting 0 is already pretty high at 92.8%. Unfortunately, the accuracy of logistic regression and a decision tree is right on par with that and barely improves on it. And KNN just takes forever....
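To make the baseline number concrete, here's a minimal sketch of what "always predict 0" looks like on made-up labels with roughly the same imbalance (the label counts below are invented, not the real data). It also prints per-class recall, since accuracy alone hides how the rare classes are doing:

```python
from collections import Counter

# Made-up labels with roughly the imbalance described (mostly class 0).
y_true = [0] * 93 + [1] * 4 + [2] * 3

# Majority-class baseline: always predict the most common label.
majority_label, majority_count = Counter(y_true).most_common(1)[0]
y_pred = [majority_label] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(label):
    """Fraction of rows with this true label that were predicted as it."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    total = sum(1 for t in y_true if t == label)
    return hits / total


print(f"baseline accuracy: {accuracy:.3f}")
for label in sorted(set(y_true)):
    print(f"recall for class {label}: {recall(label):.2f}")
```

The baseline scores high on accuracy while never flagging a single inflow or outflow hour, so a model that only matches it on accuracy could still be doing better or worse on classes 1 and 2.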
Any recommendations from those more experienced with machine learning on how to approach a classification question like this?