I am trying to create a web application for predicting airline delays. I have trained my model offline on my computer, and am now building a Flask app to make predictions based on user input. For simplicity, let's say my model has 3 categorical variables: UNIQUE_CARRIER, ORIGIN and DEST. While training, I create dummy variables for all 3 using pandas:
df = pd.concat([df, pd.get_dummies(df['UNIQUE_CARRIER'], drop_first=True, prefix="UNIQUE_CARRIER")], axis=1)
df = pd.concat([df, pd.get_dummies(df['ORIGIN'], drop_first=True, prefix="ORIGIN")], axis=1)
df = pd.concat([df, pd.get_dummies(df['DEST'], drop_first=True, prefix="DEST")], axis=1)
df.drop(['UNIQUE_CARRIER', 'ORIGIN', 'DEST'], axis=1, inplace=True)
My feature vector is now 297 elements long: 3 x 99 dummy columns, assuming there are 100 unique carriers and 100 airports in my data, since drop_first=True removes one level per variable. I saved my model using pickle, and now want to predict from user input, which arrives as 3 values (origin, destination, carrier).
Obviously I cannot call pd.get_dummies on each user input, because each of the three fields would contain only one unique value. What is the most efficient way to convert the user input into the feature vector for my model?
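To make the problem concrete, here is a minimal sketch with toy data. The carrier and airport codes, the `CATEGORICAL` list, and the `feature_columns` name are made up for illustration. It shows that get_dummies on a single row yields no columns at all, and then tries one possible workaround, aligning the encoded row to the column list saved from training. I am not sure this is the most efficient option, and it assumes the training column list is stored next to the pickled model.

```python
import pandas as pd

CATEGORICAL = ["UNIQUE_CARRIER", "ORIGIN", "DEST"]

# Toy training data; the codes below are placeholders, not my real data.
train = pd.DataFrame({
    "UNIQUE_CARRIER": ["AA", "DL", "UA", "AA"],
    "ORIGIN": ["JFK", "LAX", "ORD", "LAX"],
    "DEST": ["LAX", "ORD", "JFK", "JFK"],
})

# Encode all three variables in one call (equivalent here to the
# concat/drop sequence used in training).
X_train = pd.get_dummies(train, columns=CATEGORICAL, drop_first=True)

# Assumption: this list is saved alongside the pickled model.
feature_columns = list(X_train.columns)

# A single user request.
user = pd.DataFrame([{"UNIQUE_CARRIER": "DL", "ORIGIN": "ORD", "DEST": "JFK"}])

# The problem: each field has exactly one level, and drop_first removes it,
# so no dummy columns are left.
naive = pd.get_dummies(user, columns=CATEGORICAL, drop_first=True)

# Possible workaround: encode without drop_first, then align to the training
# columns. Missing columns are filled with 0; extra columns are discarded.
encoded = pd.get_dummies(user, columns=CATEGORICAL)
X_user = encoded.reindex(columns=feature_columns, fill_value=0).astype(int)

print(naive.shape)
print(X_user)
```

One caveat with this sketch: a carrier or airport that never appeared in training is silently dropped by the reindex, so it ends up encoded as all zeros, which is indistinguishable from the dropped baseline level.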