11

I'm not sure I'm asking this question the right way, but I have a Jupyter notebook that launches a TensorFlow training job using a Python training script I wrote.

That training script requires certain modules. It seems my SageMaker training job is failing because some of those modules are not installed.

How can I ensure that my training job script has all the modules it needs?

Edit

An example of one of these modules is keras.

The odd thing is that I can import keras in the Jupyter notebook, but when the same import statement is in my training script, I get a "No module named keras" error.

kane

6 Answers

5

If you want to install multiple packages, one way is to upgrade to SageMaker Python SDK v2. With it, you can create a requirements.txt in the same directory as your notebook and run the training; SageMaker will take care of the installation automatically.
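
As a rough sketch of how such a requirements.txt could be produced, the snippet below pins each package to the version installed in the notebook environment, so the training container installs the same versions. The helper name `write_requirements` and the package list are illustrative assumptions, not part of the SageMaker SDK.

```python
from importlib import metadata
from pathlib import Path


def write_requirements(packages, path="requirements.txt"):
    """Write one line per package, pinned to the locally installed version.

    Hypothetical helper, not part of the SageMaker SDK. Packages that are
    not installed locally are written without a version pin.
    """
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            lines.append(name)
    Path(path).write_text("\n".join(lines) + "\n")
    return lines
```

Calling `write_requirements(["keras"])` in the notebook would write a single line for keras into requirements.txt in the current directory.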

If you want to stay on v1 SDK, you can add the following snippet to your entry_point script.

import subprocess
import sys

def install(package):
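    # "-q" must follow "install" to quiet pip; placed before "-m" it is read by the Python interpreter instead.
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", package])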
    
install('keras')
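
A slightly more defensive variant of the helper above, sketched under the assumption that the module's import name is a top-level name (as with keras): it skips the pip call when the module is already importable in the container. `ensure_installed`, `pip_install`, and the `installer` parameter are hypothetical names used for illustration.

```python
import importlib.util
import subprocess
import sys


def pip_install(package):
    # Run pip with the same interpreter that runs the training script.
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", package])


def ensure_installed(module_name, package=None, installer=pip_install):
    """Install `package` only if the top-level `module_name` cannot be found.

    Returns True if an install was attempted, False otherwise.
    """
    if importlib.util.find_spec(module_name) is not None:
        return False
    installer(package or module_name)
    importlib.invalidate_caches()  # let the import system see the new package
    return True
```

In the training script this would be called as `ensure_installed('keras')` before `import keras`.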
Inderpartap Cheema
3

The training script runs inside a Docker container, which does not have the dependency installed. The Jupyter notebook environment, on the other hand, has keras pre-installed. An easy way to handle this is to list all the requirements in a requirements.txt file and pass it along when creating your model.

env = {
    'SAGEMAKER_REQUIREMENTS': 'requirements.txt', # path relative to `source_dir` below.
}
sagemaker_model = TensorFlowModel(model_data = 's3://mybucket/modelTarFile',
                                  role = role,
                                  entry_point = 'entry.py',
                                  code_location = 's3://mybucket/runtime-code/',
                                  source_dir = 'src',
                                  env = env,
                                  name = 'model_name',
                                  sagemaker_session = sagemaker_session,
                                 )
Raman
  • What is TensorFlowModel? Is that the same as sagemaker.tensorflow? I am not able to pass an env arg to it – kane Nov 30 '18 at 01:52
  • TensorFlowModel is the high-level Python class specified in the SageMaker Python SDK. It is responsible for encapsulating the metadata needed for calling SageMaker and deploying a model there. https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/tensorflow/model.py#L41 – ByungWook Dec 18 '18 at 00:29
  • Thanks, this works like a charm! Do you know if there's any documentation for `env` and `SAGEMAKER_REQUIREMENTS`? – alvas Apr 15 '23 at 00:17
2
  1. You can upload your requirements.txt file to an S3 bucket that SageMaker can access, download it into the container's working directory using boto3, and install the libraries from it in the entry file.

        import subprocess
        import sys

        import boto3

        # Fetch requirements.txt from S3 into the container's code directory.
        s3 = boto3.client('s3')
        s3.download_file('BUCKET_NAME', 'OBJECT_NAME', '/opt/ml/code/requirements.txt')
        # Install with the interpreter that runs this script; raises if pip fails.
        subprocess.check_call([sys.executable, '-m', 'pip', 'install', '-r', '/opt/ml/code/requirements.txt'])
    
  2. The other way is to build your own container using the bring-your-own-algorithm option provided by AWS.

Ref-links:
https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/scikit_bring_your_own/scikit_bring_your_own.ipynb

user3415910
2

The EstimatorBase class (and the TensorFlow class) accept a dependencies parameter, which you can use as follows to pass your requirements.txt:

estimator = TensorFlow(
    dependencies=['requirements.txt'],  # copies this file
)

e.g.

estimator = TensorFlow(
    entry_point='src/train.py',
    dependencies=['requirements.txt'],  # copies this file
)

or

estimator = TensorFlow(
    source_dir='src',  # this copies the entire src folder
    entry_point='train.py',  # when using source_dir has to be directly under that dir
    dependencies=['requirements.txt'],  # copies this file
)

This copies the requirements.txt file into your sourcedir.tar.gz along with the training code.

  • This may only work on newer image versions. I read that in older versions you may need to put the requirements.txt file in the same folder as your training code.

If this doesn't work, you can use pip download to download your dependencies defined in requirements.txt locally, then use the dependencies parameter to specify the folder to which you downloaded your dependencies.
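
The pip download fallback could look roughly like this. The folder name `deps` and both function names are assumptions made for illustration; `pip download -r ... -d ...` fetches the packages listed in requirements.txt into that folder without installing them.

```python
import subprocess
import sys


def pip_download_command(requirements="requirements.txt", dest="deps"):
    """Build the pip command that downloads dependencies into `dest`."""
    return [sys.executable, "-m", "pip", "download", "-r", requirements, "-d", dest]


def download_dependencies(requirements="requirements.txt", dest="deps"):
    # Needs network access; run it locally, before creating the estimator.
    subprocess.check_call(pip_download_command(requirements, dest))
    return dest
```

The returned folder could then be listed in the dependencies parameter in place of requirements.txt.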

Danny Varod
1

Another option is to add the following to your entry_point .py file:

import os

if __name__ == "__main__":
    os.system('pip install mymodule')
    import mymodule
    # rest of code goes here

This worked for me for simple modules such as pyparsing, but with keras I think you are better off using a TensorFlow container that has keras preinstalled, as mentioned above.

-1

The environment on your notebook instance is separate from the environment of your training job on SageMaker, unless you are using local mode.
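
One way to see the difference is to run the same small check in the notebook and at the top of the training script, then compare the printed output. This is only a sketch: `environment_report` is a hypothetical helper and the module list is an example.

```python
import importlib.util
import sys


def environment_report(module_names):
    """Report which interpreter is running and which modules it can find."""
    return {
        "executable": sys.executable,
        "python": sys.version.split()[0],
        "modules": {
            name: importlib.util.find_spec(name) is not None
            for name in module_names
        },
    }


print(environment_report(["keras", "tensorflow"]))
```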

If you're using a custom docker image, then most likely your docker image doesn't have Keras installed.

If you are using the SageMaker predefined TensorFlow container, which is most likely invoked through the following code:

https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/tensorflow/estimator.py#L170

TensorFlow(entry_point='training_code.py',
           # ... remaining arguments omitted ...
          )

Then you will need to install your dependencies within that container. There are currently two modes for training for TensorFlow on SageMaker, "framework" and "script" mode.


If you train in "framework" mode, which is only available with TensorFlow 1.12 and below, you will be limited to using a keras_model_fn as defined here: https://github.com/aws/sagemaker-python-sdk/tree/v1.12.0/src/sagemaker/tensorflow#preparing-the-tensorflow-training-script

Installing your dependencies would be done by passing in a requirements.txt.


"Script" mode was introduced with TensorFlow 1.11 and above: https://github.com/aws/sagemaker-python-sdk/tree/master/src/sagemaker/tensorflow#training-with-tensorflow

requirements.txt is not supported in "script" mode; instead, it is recommended to install your dependencies within your user script, which is the Python file that contains all of your Keras code.

Please let me know if there is anything I can clarify.

For examples:

ByungWook