I have a training script based on the AWS SageMaker RL example rl_network_compression_ray_custom, but I changed the environment to a basic Gym env, Asteroids-v0 (the dependencies are installed at the main entry point of the training script). When I call fit on the RLEstimator, it fails with ray.tune.error.TuneError: No trainable specified!, even though run is set to DQN in the training config.

Does anyone know about this issue and how to solve it?

Here is the longer log:

Running experiment with config {
  "training": {
    "env": "Asteroids-v0",
    "run": "DQN",
    "stop": {
      "training_iteration": 1
    },
    "local_dir": "/opt/ml/output/intermediate",
    "checkpoint_freq": 10,
    "config": {
      "double_q": false,
      "dueling": false,
      "num_atoms": 1,
      "noisy": false,
      "prioritized_replay": false,
      "n_step": 1,
      "target_network_update_freq": 8000,
      "lr": 6.25e-05,
      "adam_epsilon": 0.00015,
      "hiddens": [
        512
      ],
      "learning_starts": 20000,
      "buffer_size": 1000000,
      "sample_batch_size": 4,
      "train_batch_size": 32,
      "schedule_max_timesteps": 2000000,
      "exploration_final_eps": 0.01,
      "exploration_fraction": 0.1,
      "prioritized_replay_alpha": 0.5,
      "beta_annealing_fraction": 1.0,
      "final_prioritized_replay_beta": 1.0,
      "num_gpus": 0.2,
      "timesteps_per_iteration": 10000
    },
    "checkpoint_at_end": true
  },
  "trial_resources": {
    "cpu": 1,
    "extra_cpu": 3
  }
}
Important! Ray with version <=7.2 may report "Did not find checkpoint file" even if the experiment is actually restored successfully. If restoration is expected, please check "training_iteration" in the experiment info to confirm.
Traceback (most recent call last):
  File "train-ray.py", line 83, in <module>
    MyLauncher().train_main()
  File "/opt/ml/code/sagemaker_rl/ray_launcher.py", line 332, in train_main
    launcher.launch()
  File "/opt/ml/code/sagemaker_rl/ray_launcher.py", line 313, in launch
    run_experiments(experiment_config)
  File "/usr/local/lib/python3.6/dist-packages/ray/tune/tune.py", line 296, in run_experiments
    experiments = convert_to_experiment_list(experiments)
  File "/usr/local/lib/python3.6/dist-packages/ray/tune/experiment.py", line 199, in convert_to_experiment_list
    for name, spec in experiments.items()
  File "/usr/local/lib/python3.6/dist-packages/ray/tune/experiment.py", line 199, in <listcomp>
    for name, spec in experiments.items()
  File "/usr/local/lib/python3.6/dist-packages/ray/tune/experiment.py", line 122, in from_json
    raise TuneError("No trainable specified!")
ray.tune.error.TuneError: No trainable specified!
2020-04-22 13:21:15,784 sagemaker-containers ERROR    ExecuteUserScriptError:
Command "/usr/bin/python train-ray.py --rl.training.checkpoint_freq 1 --rl.training.stop.training_iteration 1 --s3_bucket XXXXX
MorRich

1 Answer

The log indicates that the experiment config was not passed in correctly. Could you try the roboschool example instead, since its env is more straightforward, and post the error log if one appears? Please also make sure all dependencies are included in the Dockerfile used to build the customized image.
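One way to narrow this down, going only by the traceback in the question: convert_to_experiment_list iterates over experiments.items(), and from_json raises for one of those entries. The sketch below assumes (this is not confirmed against the Ray source installed in the image) that the error is raised for any top-level entry whose spec has no "run" key. entries_missing_run is a hypothetical helper written for illustration; it is not part of Ray or the SageMaker launcher, and the config is a trimmed copy of the one printed in the log.

```python
def entries_missing_run(experiments):
    """Return the names of top-level entries whose spec has no "run" key.

    Assumption: Tune treats every top-level key of the dict as one
    experiment name and rejects a spec that does not name a trainable.
    """
    return [
        name
        for name, spec in experiments.items()
        if not isinstance(spec, dict) or "run" not in spec
    ]


# Trimmed copy of the experiment config printed in the question's log.
experiment_config = {
    "training": {
        "env": "Asteroids-v0",
        "run": "DQN",
        "stop": {"training_iteration": 1},
    },
    "trial_resources": {"cpu": 1, "extra_cpu": 3},
}

missing = entries_missing_run(experiment_config)
print(missing)
```

If the list is not empty, each listed key is probably being read as a separate experiment rather than as a setting of "training". Where such a block belongs depends on the Ray version in the image, so treat this as a diagnostic rather than a confirmed fix.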

Anna Luo