I am trying to save a PySpark TrainValidationSplitModel produced by a grid search over the regularization parameter (regParam) of a logistic regression, and the save call raises the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-104-8e6b86f1e92c> in <module>
      1 # Save model, or upload if already saved
      2 if not os.path.isdir(drive_path + 'lr_2_model'):
----> 3     lr_2_model.save(drive_path + 'lr_2_model')
      4 else:
      5     lr_2_model = TrainValidationSplitModel.load(drive_path + 'lr_2_model')

5 frames
/content/spark-3.3.0-bin-hadoop3/python/pyspark/ml/tuning.py in meta_estimator_transfer_param_maps_to_java(pyEstimator, pyParamMaps)
    324                         break
    325                 if javaParam is None:
--> 326                     raise ValueError("Resolve param in estimatorParamMaps failed: " + str(pyParam))
    327                 if isinstance(pyValue, Params) and hasattr(pyValue, "_to_java"):
    328                     javaValue = cast(JavaParams, pyValue)._to_java()

ValueError: Resolve param in estimatorParamMaps failed: LogisticRegression_87f4bc317e0b__regParam

This is the code that caused the error. This code worked with a previous LogisticRegression PySpark model where I tuned the maxIter parameter.

# Save model, or upload if already saved
if not os.path.isdir(drive_path + 'lr_2_model'):
    lr_2_model.save(drive_path + 'lr_2_model')
else:
    lr_2_model = TrainValidationSplitModel.load(drive_path + 'lr_2_model')

This is the code where I defined lr_2_model. grid_search is a custom function I wrote; I doubt it is the cause, since it has worked with other models:

%%time
# Run grid search (the %%time cell magic has to be the first line of the cell)
if not os.path.isdir(drive_path + 'lr_2_model'):
    lr_2_model = grid_search(stages_with_classifier=lr_2_stages, 
                             train_df=train_df_preprocessed, 
                             model_grid=lr_2_grid, 
                             parallelism=5)

And this is the code where I defined lr_2_grid, lr_2_stages, and lr_2.

lr_2 = LogisticRegression(
    featuresCol='scaled_features',
    labelCol='Anomalous',
    weightCol='Weight',
    standardization=False)

lr_2_stages = stages + [lr_2]

# Specify parameter grid
lr_2_grid = ParamGridBuilder()\
            .addGrid(lr_1.regParam, list(np.linspace(0.001, 0.1, 5)))\
            .build()
David Scholz
rjpost20

1 Answer

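I solved it. My ParamGridBuilder referenced a previous LogisticRegression model: I had written .addGrid(lr_1.regParam, list(np.linspace(0.001, 0.1, 5))) when it should have been .addGrid(lr_2.regParam, list(np.linspace(0.001, 0.1, 5))). The regParam in the grid belonged to lr_1, which is not a stage of the estimator being tuned, so save() could not resolve it in estimatorParamMaps.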

facepalm
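
For anyone hitting the same error, here is a minimal sketch of a sanity check that would have caught this before saving. The helper name find_foreign_params is hypothetical (it is not part of PySpark); it only assumes that each grid param has a .parent attribute holding the uid of the estimator that owns it, and that each stage has a .uid, as PySpark's Param and pipeline stages do. The FakeStage and FakeParam classes and their uids are made-up stand-ins so the example runs without a Spark session.

```python
from dataclasses import dataclass


def find_foreign_params(param_maps, stages):
    """Return the grid params whose parent uid matches none of the stages.

    Hypothetical helper, not a PySpark API. It reads only `param.parent`,
    `param.name`, and `stage.uid`.
    """
    stage_uids = {stage.uid for stage in stages}
    foreign = set()
    for param_map in param_maps:
        for param in param_map:
            if param.parent not in stage_uids:
                # Same "<uid>__<name>" form that appears in the ValueError.
                foreign.add(f"{param.parent}__{param.name}")
    return sorted(foreign)


# Stand-ins for PySpark objects; the uids below are invented for illustration.
@dataclass(frozen=True)
class FakeStage:
    uid: str


@dataclass(frozen=True)
class FakeParam:
    parent: str
    name: str


lr_1 = FakeStage(uid="LogisticRegression_aaa")
lr_2 = FakeStage(uid="LogisticRegression_bbb")

# Grid built from the wrong estimator (the mistake above) and the right one.
bad_grid = [{FakeParam(lr_1.uid, "regParam"): v} for v in (0.001, 0.1)]
good_grid = [{FakeParam(lr_2.uid, "regParam"): v} for v in (0.001, 0.1)]

print(find_foreign_params(bad_grid, [lr_2]))   # ['LogisticRegression_aaa__regParam']
print(find_foreign_params(good_grid, [lr_2]))  # []
```

With the real objects from the question, the call would presumably look like find_foreign_params(lr_2_grid, lr_2_stages); this sketch does not descend into nested pipelines. Note that after correcting the grid, the search has to be re-run before saving, because the existing lr_2_model still carries the old estimatorParamMaps.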

rjpost20