I built a model in PySpark that is available on all worker nodes, and I would like to generate some model-related plots in the notebook. The model exists in the Spark context but not in the notebook context, so I cannot plot it from the notebook. How do I make the model available in the notebook context?
For more context, here is a shell of my PySpark code:
df = spark.read.parquet('path_to_my_data')

import h2o
from h2o.estimators import H2OGradientBoostingEstimator

h2o.init()
# h2o.H2OFrame takes pandas/list/dict input, so convert the Spark DataFrame first
df_h2o = h2o.H2OFrame(df.toPandas())
model = H2OGradientBoostingEstimator(ntrees=100)
model.train(x=['var1', 'var2', 'var3'],
            y='yvar',
            training_frame=df_h2o)

# --- NOW I WANT TO MAKE PLOTS FOR THE MODEL ---
# This does not return a plot to the notebook
model.varimp_plot()
Of course, the code model.varimp_plot() does not return a plot because the plot is being returned on the worker nodes as opposed to the notebook. I have seen a lot of people do something like this:
%%local
import matplotlib.pyplot as plt
a = [1,2,3,4,5]
plt.plot(a)
and I tried something like this for my problem:
%%local
import matplotlib.pyplot as plt
model.varimp_plot()
but it does not work since model is only defined in the Spark context (not the notebook context). The error I receive is NameError: name 'model' is not defined.
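The NameError is consistent with the two cells running in separate Python namespaces. Here is a minimal sketch of that situation, using two plain dictionaries as stand-ins for the remote Spark session and the local notebook kernel. The names remote_ns and local_ns are hypothetical, and this is only an analogy, not how the notebook is actually implemented:

```python
# Two independent namespaces stand in for the remote Spark session
# and the local notebook kernel (an analogy, not the real mechanism).
remote_ns = {}
local_ns = {}

# "Train" the model in the remote namespace only.
exec("model = 'trained-model-placeholder'", remote_ns)

# Looking the name up in the local namespace fails, as in the %%local cell.
try:
    exec("model", local_ns)
    message = None
except NameError as err:
    message = str(err)

print(message)
```

The name exists in one namespace and not in the other, so anything defined in the Spark session has to be transferred explicitly before a %%local cell can use it.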
My question: How do I access model in the notebook context? Put another way, how do I access model in the %%local block of code?
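For reference, one possible direction is to move the plot's data rather than the model object itself. The sketch below assumes the notebook uses sparkmagic (where %%local comes from), whose -o option on %%spark copies a Spark DataFrame into the local context as a pandas DataFrame. The names varimp_sdf, top_importances, and plot_importances are hypothetical, and the sample rows are made up for illustration:

```python
# Remote cell (assumption: sparkmagic; runs in the Spark session):
#   %%spark -o varimp_sdf
#   varimp_sdf = spark.createDataFrame(model.varimp(use_pandas=True))
#
# Local cell (%%local): varimp_sdf is assumed to arrive as a pandas DataFrame,
# so rows = varimp_sdf[["variable", "scaled_importance"]].values.tolist()

def top_importances(rows, n=10):
    """Return the n most important (variable, importance) pairs, largest first."""
    return sorted(rows, key=lambda r: r[1], reverse=True)[:n]

def plot_importances(rows, n=10):
    """Draw a horizontal bar chart in the local kernel (requires matplotlib)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting
    top = top_importances(rows, n)
    names = [name for name, _ in top]
    values = [value for _, value in top]
    fig, ax = plt.subplots()
    ax.barh(names[::-1], values[::-1])  # reverse so the largest bar is on top
    ax.set_xlabel("scaled importance")
    return fig

# Made-up rows standing in for the transferred variable-importance table.
sample_rows = [["var2", 0.4], ["var1", 1.0], ["var3", 0.1]]
ranked = top_importances(sample_rows, n=2)
print(ranked)
```

This only reproduces the variable-importance plot. Other model plots would need their underlying data exported in a similar way.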