When I run PyTorch Lightning with Ray Tune for hyperparameter optimization (HPO), it creates a lot of directories like this:

'train_fn_ff307114_1350_activation_function=nn_ReLU_inplace_True,c_hidden=64,dp_rate_linear=0.6747,learning_rate=0.0002,num_layers=_2022-12-21_17-51-28'
'train_fn_ff4d9f16_599_activation_function=nn_LeakyReLU_inplace_True,c_hidden=2056,dp_rate_linear=0.5871,learning_rate=0.0035,num_l_2022-12-21_13-26-00'
'train_fn_ffa6c2a4_892_activation_function=nn_ReLU_inplace_True,c_hidden=512,dp_rate_linear=0.6173,learning_rate=0.0116,num_layers=_2022-12-21_15-06-17'
'train_fn_ffe7cd72_1684_activation_function=nn_Tanh,c_hidden=2056,dp_rate_linear=0.6520,learning_rate=0.0032,num_layers=4,optimizer_2022-12-21_19-45-25'

Each directory contains a checkpoint, logs, etc.

Is there a way to reduce the disk space this uses? For example, once one trial is found to have a better val_acc (my metric of interest for HPO), I no longer need to keep the trials with a worse val_acc. Can I delete those directories as the HPO run progresses, so that at the end I'm left with only the data for the best trial, and there is never an excessively large amount of data saved at any point?
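To make the idea concrete, here is a rough sketch of the kind of clean-up I have in mind. It is not something Ray provides; `prune_trial_dirs` and `read_last_metric` are names I made up for illustration. It assumes each trial directory contains a `result.json` file with one JSON object per line that includes `val_acc`, that the directory passed in directly contains the trial folders, and that the last reported value is the one to compare. Trials without any readable result are left alone.

```python
import json
import shutil
from pathlib import Path


def read_last_metric(trial_dir, metric="val_acc"):
    """Return the last value of `metric` found in trial_dir/result.json, or None."""
    result_file = Path(trial_dir) / "result.json"  # assumed file name and layout
    if not result_file.is_file():
        return None
    last_value = None
    with result_file.open() as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip a line that is only partially written
            if metric in record:
                last_value = record[metric]
    return last_value


def prune_trial_dirs(experiment_dir, metric="val_acc", dry_run=True):
    """Keep the trial directory with the highest last `metric`; delete the other scored ones."""
    scored = []
    for trial_dir in Path(experiment_dir).iterdir():
        if not trial_dir.is_dir():
            continue
        score = read_last_metric(trial_dir, metric)
        if score is not None:
            scored.append((score, trial_dir))
    if not scored:
        return None
    _, best_dir = max(scored, key=lambda item: item[0])
    for _, trial_dir in scored:
        if trial_dir == best_dir:
            continue
        if dry_run:
            print(f"would delete {trial_dir}")
        else:
            shutil.rmtree(trial_dir)
    return best_dir
```

Calling `prune_trial_dirs(some_dir)` only prints what would be removed; passing `dry_run=False` actually deletes. My worry is that running something like this while the tuner is still active could delete the directory of a trial that is still training or that Ray still expects to exist, so I don't know whether this approach is safe, or whether there is a built-in option that does the same thing.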

My code for running HPO is:


import os

# Imports assumed for Ray 2.x; train_fn and GraphLevelGNN are defined elsewhere in my project.
from ray import tune
from ray.air import RunConfig
from ray.tune.search.hyperopt import HyperOptSearch


def run_ray(metric='val_acc', mode='max', num_samples=4000, config_dict=None,
            checkpoint_file_name="full_run_2_ray_ckpt",
            config_file='full_run_2/best_config.txt',
            local_dir='full_run_2/runs/'):
    config_dict = config_dict or {}  # avoid a mutable default argument

    hyperopt_search = HyperOptSearch(metric=metric, mode=mode)

    # change from gpu {"gpu": 1}
    tuner = tune.Tuner(
        tune.with_resources(train_fn, {"gpu": 1}),
        tune_config=tune.TuneConfig(num_samples=num_samples, search_alg=hyperopt_search),
        param_space=config_dict,
        run_config=RunConfig(local_dir=local_dir),
    )
    results = tuner.fit()
    best_result = results.get_best_result(metric=metric, mode=mode)

    # Append the best config and its log directory; the with-block closes the file.
    with open(config_file, 'a') as f:
        f.write(str(best_result.config) + '\n')
        f.write(str(best_result.log_dir) + '\n')

    # Load the model from the best trial's checkpoint.
    best_checkpoint = best_result.checkpoint
    path = os.path.join(str(best_checkpoint.to_directory()), checkpoint_file_name)
    print(path)
    model = GraphLevelGNN.load_from_checkpoint(path)

    return best_result.log_dir, model
Slowat_Kela