I'm building multiple Prophet models. The data for each model is passed to a pandas_udf function, which trains the model and stores the results with MLflow:
@pandas_udf(result_schema, PandasUDFType.GROUPED_MAP)
def forecast(data):
    ......
    with mlflow.start_run() as run:
        ......
Then I call this UDF, which trains a model for each KPI:
df.groupBy('KPI').apply(forecast)
The idea is that, for each KPI, a model is trained with multiple hyperparameter combinations and the best parameters for each model are stored in MLflow. I would like to use Hyperopt to make the search more efficient.
In this case, where should I place the objective function? Since the data for each model is passed to the UDF, I thought of defining the objective as an inner function within the UDF so that it can use that data on each run. Does this make sense?
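For illustration, the inner-function idea could look roughly like the sketch below. It is only a sketch under assumptions: `make_objective`, `tune_one_kpi`, the `y` column and the `smoothing` hyperparameter are hypothetical names, and a simple exponential-smoothing score stands in for fitting and evaluating Prophet.

```python
# Sketch only: `make_objective`, `tune_one_kpi`, the "y" column and the
# `smoothing` hyperparameter are hypothetical stand-ins, not from my real code.

def make_objective(data):
    """Return a Hyperopt-style objective that closes over one KPI's data."""
    y = [float(v) for v in data["y"]]

    def objective(params):
        # Stand-in for "fit Prophet with these params and score it":
        # one-step-ahead exponential smoothing, scored by mean squared error.
        alpha = params["smoothing"]
        level, sq_err = y[0], 0.0
        for value in y[1:]:
            sq_err += (value - level) ** 2
            level = alpha * value + (1 - alpha) * level
        # Hyperopt minimises "loss"; "ok" is the value of hyperopt.STATUS_OK.
        return {"loss": sq_err / (len(y) - 1), "status": "ok"}

    return objective


def tune_one_kpi(data, max_evals=20):
    """Search that could be called from inside the grouped-map UDF for one KPI."""
    # Imported lazily so the import runs where the UDF body executes.
    from hyperopt import Trials, fmin, hp, tpe

    space = {"smoothing": hp.uniform("smoothing", 0.01, 0.99)}
    # Plain Trials (not SparkTrials): the search runs locally inside the task.
    best = fmin(fn=make_objective(data), space=space, algo=tpe.suggest,
                max_evals=max_evals, trials=Trials())
    return best  # could then be logged with mlflow.log_params(best)
```

Inside `forecast(data)`, the call would then be something like `best = tune_one_kpi(data)` within the `mlflow.start_run()` block, so each KPI gets its own search over its own data.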