
I have the sample code below and want to run the net evaluation in parallel. I have a desktop with four cores that can run two threads each, so I want to have 8 parameter settings evaluating in parallel. The layers/hypermodels are in a different file, but I don't think you need to see them; let me know if you do.

import numpy as np
import tensorflow as tf
import keras_tuner as kt
from sklearn.model_selection import train_test_split
# scalerClass and scalarRSmodel are defined in the separate layers/hypermodels file

## train/test split
ii = np.arange(AI.shape[0])
ii_train,ii_test = train_test_split(ii,random_state=1)
AItrain,AItest = AI[ii_train],AI[ii_test]
ATtrain,ATtest = AT[ii_train],AT[ii_test]
YYtrain,YYtest = YY[ii_train],YY[ii_test]
# input/output scaling
scalerAI = scalerClass(AItrain)
AItrain,AItest = scalerAI.scale(AItrain), scalerAI.scale(AItest)
scalerAT = scalerClass(ATtrain)
ATtrain,ATtest = scalerAT.scale(ATtrain), scalerAT.scale(ATtest)
scalerYY = scalerClass(YYtrain)
YYtrain,YYtest = scalerYY.scale(YYtrain), scalerYY.scale(YYtest)
## randomSearch
tuner = kt.RandomSearch(
    scalarRSmodel(),
    objective="loss",
    max_trials=10,
    overwrite=False,
    directory="tunerResults",
    project_name="tune_hypermodel",
    distribution_strategy=tf.distribute.MirroredStrategy()
)
tuner.search([AItrain,ATtrain], YYtrain, epochs=500, validation_data=([AItest,ATtest],YYtest))

Then I tried to run it with the bash script given on the Keras Tuner help page, but I seem to be doing something wrong.

export KERASTUNER_TUNER_ID="chief"  #or "tuner0", "tuner1",...
export KERASTUNER_ORACLE_IP="127.0.0.1"
export KERASTUNER_ORACLE_PORT="8000"
python3.10 main.py
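From re-reading the docs, I now suspect each process should get a distinct KERASTUNER_TUNER_ID while all processes share the same oracle address and the same results directory. A sketch of what I think the launch script should look like (untested, and the setup may still be off):

```shell
# One chief plus 8 workers; all share the oracle address and the
# tunerResults directory, and ONLY the tuner id differs per process.
export KERASTUNER_ORACLE_IP="127.0.0.1"
export KERASTUNER_ORACLE_PORT="8000"

KERASTUNER_TUNER_ID="chief" python3.10 main.py &   # chief runs the oracle only

for i in 0 1 2 3 4 5 6 7; do
  # each worker needs a unique id, otherwise they all look like the
  # same tuner and keep being handed the same trials
  KERASTUNER_TUNER_ID="tuner$i" python3.10 main.py &
done
wait
```

Is that the intended usage, or do the workers need to be on separate machines?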

I also tried running several of those (one chief and several workers) from different terminals, but that didn't work: they kept evaluating the same parameters. I looked at the TF distribution strategy help page as well, but I didn't understand any of it. So how can I fix this? I would also be fine with running the gradient procedure itself in parallel if that is easier, i.e. solving one parameter set at a time, but with that single fit parallelized.
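To clarify the alternative I mean, here is a toy sketch with one parameter set per process (`train_one` is just a placeholder for the real model fit, and the configs are made up):

```python
from multiprocessing import Pool

def train_one(config):
    # placeholder for building and fitting the real net; returns (units, lr, loss)
    units, lr = config
    return (units, lr, 1.0 / (units * lr))  # fake "loss" standing in for the fit result

def search_parallel(configs, workers=8):
    # evaluate up to `workers` parameter settings at the same time
    with Pool(processes=workers) as pool:
        return pool.map(train_one, configs)

if __name__ == "__main__":
    configs = [(u, lr) for u in (8, 16, 32, 64) for lr in (1e-2, 1e-3)]
    results = search_parallel(configs)
    print(min(results, key=lambda r: r[2]))  # pick the config with the smallest loss
```

Would something along these lines play nicely with TF, or do the child processes fight over the same thread pool?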

martin