I'm very new to mlflow and hyperopt, and I'm currently having some issues with SparkTrials. I'm running the following code in a Jupyter notebook using Anaconda:
import mlflow
from hyperopt import hp, fmin, tpe, rand, SparkTrials, STATUS_OK, STATUS_FAIL, space_eval
# replicate input_pd dataframe to workers in Spark cluster
inputs = sc.broadcast(input_pd)
# configure hyperopt settings to distribute to all executors on workers
spark_trials = SparkTrials()
# select optimization algorithm
algo = tpe.suggest
# perform hyperparameter tuning (logging iterations to mlflow)
argmin = fmin(
    fn=evaluate_model,
    space=search_space,
    algo=algo,
    max_evals=100,
    trials=spark_trials
)
# release the broadcast dataset
inputs.unpersist()
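For completeness, the names used above (sc, input_pd, evaluate_model, search_space) are defined earlier in the notebook. A simplified sketch of those definitions, with a toy dataset and model standing in for my real ones, looks roughly like this:
import pandas as pd
from pyspark.sql import SparkSession
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from hyperopt import hp, STATUS_OK
# Spark session for the notebook; sc is its SparkContext
spark = SparkSession.builder.master("local[*]").appName("hyperopt-tuning").getOrCreate()
sc = spark.sparkContext
# Small stand-in for my real input_pd pandas DataFrame
input_pd = pd.DataFrame({
    "x1": [0.1, 0.4, 0.35, 0.8, 0.9, 0.2],
    "x2": [1.0, 0.7, 0.5, 0.2, 0.1, 0.9],
    "label": [0, 0, 0, 1, 1, 1],
})
# Hyperparameter search space passed to fmin
search_space = {"C": hp.loguniform("C", -4, 2)}
def evaluate_model(params):
    # Train on the broadcast data and return the loss hyperopt should minimize
    data = inputs.value
    model = LogisticRegression(C=params["C"])
    score = cross_val_score(model, data[["x1", "x2"]], data["label"], cv=3).mean()
    return {"loss": -score, "status": STATUS_OK}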
But I get the following error:
Py4JError: An error occurred while calling o233.maxNumConcurrentTasks. Trace:
py4j.Py4JException: Method maxNumConcurrentTasks([]) does not exist
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
at py4j.Gateway.invoke(Gateway.java:274)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Unknown Source)