
I built a model in PySpark that is available on all worker nodes, and I would like to generate some model-related plots in the notebook. The model exists in the Spark context, but it is not available in the notebook context, so I cannot make plots from it. How do I make the model available in the notebook context?

For more context, here is a shell of my PySpark code:

import h2o
from h2o.estimators import H2OGradientBoostingEstimator
df = spark.read.parquet('path_to_my_data')
h2o.init()
df_h2o = h2o.H2OFrame(df)
model = H2OGradientBoostingEstimator(ntrees=100)
model.train(x=['var1', 'var2', 'var3'],
            y='yvar',
            training_frame=df_h2o)
# --- NOW I WANT TO MAKE PLOTS FOR THE MODEL ---
# This does not return a plot to the notebook
model.varimp_plot()

Of course, model.varimp_plot() does not display a plot, because the plot is rendered in the remote Spark session rather than in the notebook. I have seen many people do something like this:

%%local 
import matplotlib.pyplot as plt
a = [1,2,3,4,5]
plt.plot(a)

and I tried something like this for my problem:

%%local
import matplotlib.pyplot as plt
model.varimp_plot()

but it does not work, because model is only defined in the Spark context, not in the notebook context. The error I receive is NameError: name 'model' is not defined.

My question: How do I access model in the notebook context? Put another way, how do I access model in the %%local block of code?
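One possible workaround, sketched here under the assumption that the notebook uses sparkmagic (the package that provides %%local and the remote PySpark session): cells without a magic run in the remote Livy session, while %%local cells run in the notebook's own Python kernel, so Python objects such as model cannot be referenced across the two. Small amounts of data can be moved, though. In a regular cell, model.varimp() returns the variable importances as a list of (variable, relative_importance, scaled_importance, percentage) tuples, which can be registered as a temporary view, e.g. spark.createDataFrame(model.varimp(), ['variable', 'relative_importance', 'scaled_importance', 'percentage']).createOrReplaceTempView('varimp'). In the next cell, %%sql -q -o varimp_local followed by SELECT * FROM varimp copies the query result into the local context as a pandas DataFrame. The view name varimp, the variable varimp_local and the helper plot_varimp below are hypothetical; the local plotting step could look like this:

```python
import matplotlib.pyplot as plt


def plot_varimp(variables, scaled_importance, top_n=10):
    """Draw a horizontal bar chart of variable importances.

    `variables` and `scaled_importance` are parallel sequences, e.g. the
    'variable' and 'scaled_importance' columns of the pandas DataFrame that
    `%%sql -o` creates in the local context (assumed column names).
    """
    # Keep the top_n most important variables, largest first.
    pairs = sorted(
        zip(variables, scaled_importance),
        key=lambda pair: float(pair[1]),
        reverse=True,
    )[:top_n]
    # barh draws the first item at the bottom, so reverse to put the
    # most important variable at the top of the chart.
    names = [name for name, _ in reversed(pairs)]
    values = [float(value) for _, value in reversed(pairs)]

    fig, ax = plt.subplots()
    ax.barh(names, values)
    ax.set_xlabel("Scaled importance")
    ax.set_title("Variable importance")
    fig.tight_layout()
    return fig, ax
```

In a %%local cell, plot_varimp(varimp_local['variable'], varimp_local['scaled_importance']) should then render the chart inline. This only recreates the plot from a small table of numbers; it does not make the H2O model object itself available in the local context.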

gm1991
