I followed the steps described in the CartPole notebook, but when I get to training the CartPole agent by running the following cell:

from azureml.core import Environment, Experiment, ScriptRunConfig
from azureml.core.runconfig import RunConfiguration

training_algorithm = "PPO"
rl_environment = "CartPole-v0"
video_capture = True
if video_capture:
    algorithm_config = '\'{"num_gpus": 0, "num_workers": 1, "monitor": true}\''
else:
    algorithm_config = '\'{"num_gpus": 0, "num_workers": 1, "monitor": false}\''

script_name = 'cartpole_training.py'
script_arguments = [
    '--run', training_algorithm,
    '--env', rl_environment,
    '--stop', '\'{"episode_reward_mean": 200, "time_total_s": 300}\'',
    '--config', algorithm_config,
    '--checkpoint-freq', '2',
    '--checkpoint-at-end',
    '--local-dir', './logs'
]

ray_environment = Environment.get(ws, name=ray_environment_name)
run_config = RunConfiguration(communicator='OpenMpi')
run_config.target = compute_target
run_config.node_count = 1
run_config.environment = ray_environment
command=["python", script_name, *script_arguments]

if video_capture:
    command = ["xvfb-run -s '-screen 0 640x480x16 -ac +extension GLX +render' "] + command
    run_config.environment_variables["SDL_VIDEODRIVER"] = "dummy"

training_config = ScriptRunConfig(
    source_directory='./files',
    command=command,
    run_config=run_config,
)

training_run = experiment.submit(training_config)

I get the following error message:

FileNotFoundError: [Errno 2] No such file or directory: '/mnt/batch/tasks/shared/LS_root/mounts/clusters/xxx/code/Users/yyy/files'

Does anyone see what is missing?

nize

1 Answer

This error can occur when the build of the environment fails. Instead of reusing an existing environment, create a new one from a Dockerfile.

The following code will be generated:

FROM mcr.microsoft.com/azureml/openmpi3.1.2-cuda10.2-cudnn8-ubuntu18.04
RUN pip install azureml-mlflow

ray_env_build_details.wait_for_completion(show_output=True)
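
The `FROM`/`RUN` lines above are a Dockerfile, while `wait_for_completion` is a Python call on a build object. One way the two might fit together in the SDK is sketched below. This is only a sketch under assumptions: the helper names `write_dockerfile` and `build_environment` and the environment name `cartpole-ray-custom` are hypothetical, and `ws` is assumed to be the workspace object already loaded in the notebook.

```python
from pathlib import Path

# The two Dockerfile lines quoted in this answer.
DOCKERFILE_TEXT = (
    "FROM mcr.microsoft.com/azureml/openmpi3.1.2-cuda10.2-cudnn8-ubuntu18.04\n"
    "RUN pip install azureml-mlflow\n"
)


def write_dockerfile(directory):
    """Write the Dockerfile into `directory` and return its path."""
    path = Path(directory) / "Dockerfile"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(DOCKERFILE_TEXT)
    return path


def build_environment(ws, dockerfile_path, name="cartpole-ray-custom"):
    """Build an environment from the Dockerfile and block until the build ends.

    `name` is a hypothetical placeholder; replace it with your own.
    """
    # Imported lazily so the file helper above works without azureml-core.
    from azureml.core import Environment

    env = Environment.from_dockerfile(name=name, dockerfile=str(dockerfile_path))
    build_details = env.build(workspace=ws)
    build_details.wait_for_completion(show_output=True)
    return env
```

In the notebook this would be used as `ray_environment = build_environment(ws, write_dockerfile('.'))`, and the result assigned to `run_config.environment` in place of the `Environment.get(...)` call.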

If the Docker build succeeds, look up the actual location of the data and script files and replace the existing path with it.

If the build screen shows an error, it reports the specific file path and directory that could not be found.

If the build is reported as succeeded, the data can be accessed from the cluster, which should resolve the error.
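
Since the error message names the `files` directory, it may also be worth confirming that the `source_directory` passed to `ScriptRunConfig` exists relative to the notebook's working directory before submitting. A minimal sketch, assuming the layout from the question (the helper name `check_source_directory` is hypothetical and not part of the Azure ML SDK):

```python
from pathlib import Path


def check_source_directory(source_directory, script_name):
    """Return the resolved source directory, or raise if something is missing."""
    src = Path(source_directory).resolve()
    if not src.is_dir():
        raise FileNotFoundError(
            f"source directory not found: {src} (working directory: {Path.cwd()})"
        )
    script = src / script_name
    if not script.is_file():
        raise FileNotFoundError(f"training script not found: {script}")
    return src
```

Calling `check_source_directory('./files', 'cartpole_training.py')` just before `experiment.submit(training_config)` would show whether the folder is missing locally, in which case the path or the working directory needs to be corrected.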

Sairam Tadepalli
  • Thanks, but sorry, I have some difficulties following the answer, e.g. I see the code `FROM ... RUN ...` generated, but not the `ray_env_build ...`. Should I replace `'cartpole-ray-sc'` with `nov999` in the cell containing `ray_environment_name = 'cartpole-ray-sc'`? It would be great the entire solution could be captured in a notebook. Shouldn't the notebook be reproducible exactly like it is? – nize Nov 09 '22 at 20:30
  • yes, replace with your own name. I kept as nov999 as trying to solve this post on nov9th. Replace the name with your own requirements. – Sairam Tadepalli Nov 10 '22 at 02:34
  • Ok, I did like that, but I got the same error anyway. Probably I don't understand the resolution fully. – nize Nov 10 '22 at 19:10
  • Once, try to execute the complete procedure again. – Sairam Tadepalli Nov 10 '22 at 19:33