I am running notebooks in SageMaker Studio.

When I create a notebook and run it from within Studio, it executes from a directory that corresponds to what I see in the left sidebar:

    import os
    print("getcwd", os.getcwd())

getcwd /root/test

However, when I schedule the same notebook using the UI, the job executes from /opt/ml/input/data/sagemaker_headless_execution

That directory contains the notebook I am running, but nothing else.

In my terminal, I can navigate to /home/sagemaker-user/mydirectory, but when I do this in the scheduled notebook, /home is empty.

My notebook needs access to certain files stored in the local directory. How do I mount or attach them?

I could do all input and output through boto or sqlalchemy, but if so, what is the point of SageMaker having a file system? It also means that a flow which works when the notebook is run from within the UI or locally breaks down when run on a schedule, which seems wrong.

rightsized

1 Answer


Notebook jobs run as SageMaker training jobs in the backend, so any additional files your notebook needs (other than the notebook itself) must be in S3 (or another accessible location) for the headless training job to reach them. The Studio file system is not mounted to the training job.
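
One way to keep the same notebook working both interactively and on a schedule is to resolve the input directory at the top of the notebook. The following is only a minimal sketch, assuming the files have already been uploaded to S3 and that the job's role can read them. The helper names `fetch_inputs` and `resolve_input_dir`, the scratch path `/tmp/notebook_inputs`, and the bucket and prefix in the usage comment are hypothetical, not part of SageMaker.

```python
import os
from pathlib import Path


def fetch_inputs(bucket, prefix, dest, s3=None):
    """Download every object under s3://bucket/prefix into dest; return the local paths."""
    if s3 is None:
        import boto3  # imported lazily so a stub client can be passed in for testing
        s3 = boto3.client("s3")
    dest = Path(dest)
    downloaded = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):  # skip zero-byte "folder" placeholder objects
                continue
            # Path of the object relative to the prefix (fall back to the file name)
            rel = key[len(prefix):].lstrip("/") or Path(key).name
            target = dest / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            s3.download_file(bucket, key, str(target))
            downloaded.append(target)
    return downloaded


def resolve_input_dir(local_dir, bucket, prefix,
                      scratch_dir="/tmp/notebook_inputs", s3=None):
    """Use the Studio directory when it exists; otherwise pull a copy from S3."""
    local = Path(local_dir)
    if local.is_dir():
        return local  # interactive run: the Studio file system is mounted
    fetch_inputs(bucket, prefix, scratch_dir, s3=s3)  # scheduled run: copy from S3
    return Path(scratch_dir)


# Hypothetical usage at the top of the notebook (bucket and prefix are placeholders):
# data_dir = resolve_input_dir("/home/sagemaker-user/mydirectory",
#                              "my-bucket", "mydirectory/")
# print(os.listdir(data_dir))
```

With this pattern the rest of the notebook only ever reads from `data_dir`, so it does not need to know whether it is running in Studio or inside the headless job.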

durga_sury
  • Thanks, @durga_sury. Can you elaborate on what training jobs are, why they are needed, and how they are used? And what is the purpose of the SageMaker file system if jobs cannot access it? – rightsized Jul 27 '23 at 00:34
  • Here's the documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-training.html Essentially, training jobs are ephemeral: a job spins up EC2 instance(s), copies your script and data, runs the training, stores the model back to S3, and shuts down the instance(s), so you only pay for what you use. Think of the file system on Studio as your local environment for development, like a folder on your laptop. It persists across Studio restarts, and the same file system is mounted on any instance (notebook) that you spin up in Studio. – durga_sury Jul 28 '23 at 21:10