I have downloaded a trained model from Azure Machine Learning. It was trained with Automated ML, using the Time Series forecasting preset.
When I run predictions, I get these warnings:
```
NumericalizeTransformer: Column AircraftModel contains categories not present at fit: {('42',)}. These categories will be set to NA prior to encoding.
.format(col, new_cats))
Column Operator contains categories not present at fit: {('US Airlines',)}. These categories will be set to NA prior to encoding.
.format(col, new_cats))
```
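The warnings indicate that the encoder only knows the category values it saw during training, so any other value is replaced with NA. One way to check which inference values are new is to compare them with the training data. The sketch below assumes the training set is still available as a pandas DataFrame; `find_unseen_categories` is a hypothetical helper and the `train` rows are placeholders, not real data. Because it compares raw values, it also exposes dtype mismatches, e.g. an integer `42` in training versus the string `'42'` at inference.

```python
import pandas as pd


def find_unseen_categories(train_df, query_df, columns):
    """Return {column: set of values} for values in query_df absent from train_df."""
    unseen = {}
    for col in columns:
        # Values are compared exactly as stored, so int 42 and str '42' differ
        seen = set(train_df[col].dropna().unique())
        new = set(query_df[col].dropna().unique()) - seen
        if new:
            unseen[col] = new
    return unseen


# Hypothetical training data; replace with the data the AutoML run was trained on.
train = pd.DataFrame({
    "AircraftModel": [42, 72],
    "Operator": ["Nordic Air", "Nordic Air"],
})
query = pd.DataFrame({
    "AircraftModel": ["42"],
    "Operator": ["US Airlines"],
})
result = find_unseen_categories(train, query, ["AircraftModel", "Operator"])
print(result)
```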
My code for running the forecast is:
```python
import json

import joblib
import numpy as np
import pandas as pd

model = None


def load_model():
    global model
    model_path = 'model.pkl'
    model = joblib.load(model_path)


def run_forecast(data):
    try:
        # Separate the target column from the feature columns
        y_query = data.pop('y_query').values
        #y_query.fill(np.nan)
        result = model.forecast(data, y_query)
    except Exception as e:
        result = str(e)
        return json.dumps({"error": result})
    # The first element of the returned tuple holds the predictions
    forecast_as_list = result[0].tolist()
    return forecast_as_list


input_sample = pd.DataFrame(data=[{'AircraftId': 'ATR-0001', 'FromDate': '2016-09-01T00:00:00.000Z', 'AircraftModel': '42', 'Operator': 'US Airlines', 'Country': 'Denmark', 'MonthOfYear': 9, 'y_query': 1.0}])
load_model()
forecast = run_forecast(input_sample)
```
I get a result back, but it is quite poor, and I suspect that the feature columns being set to NA are the culprit.
Should I do some manual pre-processing before running inference on the model?
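If the warnings come from a representation mismatch rather than genuinely new values (for example, `AircraftModel` stored as the number 42 during training but sent as the string `'42'` at inference time), a small pre-processing step before `model.forecast` might help. The sketch below is a hypothetical helper (`align_to_training` is not part of Azure ML) that assumes the training DataFrame is still available and that the listed columns have no missing values. Values that truly never occurred in training, such as an operator the model never saw, would still be set to NA; presumably only retraining with data that contains them would change that.

```python
import pandas as pd


def align_to_training(query_df, train_df, columns):
    """Return a copy of query_df whose columns match the training representation.

    Assumes the listed columns have no missing values in query_df.
    """
    out = query_df.copy()
    for col in columns:
        train_dtype = train_df[col].dtype
        if pd.api.types.is_numeric_dtype(train_dtype):
            # e.g. the string '42' becomes the integer 42 if training stored integers
            out[col] = pd.to_numeric(out[col]).astype(train_dtype)
        else:
            # Store as strings and strip stray whitespace ('US Airlines ' -> 'US Airlines')
            out[col] = out[col].astype(str).str.strip()
    return out


# Hypothetical training data; in practice load the data the AutoML run was trained on.
train = pd.DataFrame({
    "AircraftModel": [42, 72],
    "Operator": ["US Airlines", "Nordic Air"],
})
query = pd.DataFrame({
    "AircraftModel": ["42"],
    "Operator": ["US Airlines "],
})
aligned = align_to_training(query, train, ["AircraftModel", "Operator"])
print(aligned.dtypes)
```

The aligned frame (minus `y_query`) would then be passed to `model.forecast` in place of the raw input.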