I have broadcast the training dataset to all partitions. Now I want to send 10 different models/hyperparameter settings to 10 different partitions and train them independently. How do I share this model/hyperparameter information?

Is this the right approach?

broadcast_train = spark.sparkContext.broadcast(df_train)  # df_train must be a local object (e.g. pandas), not a Spark DataFrame

models = [LogisticRegression(), Ridge(), Lasso(), ...]  # 10 such models
num_models = len(models)

# key each model by its index; an identity partition function sends
# key i to partition i, so each partition gets exactly one model
model_list = spark.sparkContext.parallelize(list(enumerate(models)))
model_list = model_list.partitionBy(num_models, lambda idx: idx)

def run_model_on_partition(model_iter):
    # read the broadcast training data, fit each (index, model) pair
    # from this partition on it, and yield the results
    ...

results_rdd = model_list.mapPartitions(run_model_on_partition)
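
For concreteness, here is a minimal end-to-end sketch of what I am trying to do, assuming scikit-learn estimators and a toy pandas training frame; the Ridge models with different alphas, the column names x1/x2/y, and the returned training score are all placeholders for my real setup:

import pandas as pd
from pyspark.sql import SparkSession
from sklearn.linear_model import Ridge

spark = SparkSession.builder.getOrCreate()

# toy stand-in for the real training data, broadcast as a plain pandas frame
df_train = pd.DataFrame({"x1": [0, 1, 2, 3], "x2": [1, 0, 1, 0], "y": [0.1, 0.9, 2.1, 2.9]})
broadcast_train = spark.sparkContext.broadcast(df_train)

# stand-ins for the 10 models/hyperparameter settings
models = [Ridge(alpha=a) for a in (0.01, 0.1, 1.0)]
num_models = len(models)

def run_model_on_partition(model_iter):
    df = broadcast_train.value                # read the broadcast training data
    X, y = df[["x1", "x2"]], df["y"]
    for idx, model in model_iter:             # one (index, model) pair per partition
        model.fit(X, y)
        yield idx, model.score(X, y)          # placeholder result: training R^2

model_list = spark.sparkContext.parallelize(list(enumerate(models)))
model_list = model_list.partitionBy(num_models, lambda idx: idx)
print(model_list.glom().map(len).collect())   # sanity check: [1, 1, 1]
results = model_list.mapPartitions(run_model_on_partition).collect()

Since len(models) equals num_models and the partition function is the identity on the integer keys, every partition holds exactly one model, so the fits run as one task per model in parallel.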
