I have a tuning step in my SageMaker pipeline. The step runs my train.py script inside the tuning step's container, and train.py imports a module called 'dill'. It seems that the SageMaker SKLearn container did not install the requirements as it is supposed to: running the pipeline fails with an import error: ModuleNotFoundError: No module named 'dill'

The estimator used in my tuning step:

    from sagemaker.sklearn.estimator import SKLearn

    # role, bucket, model_prefix and pipeline_session are defined elsewhere.
    sk_estimator = SKLearn(
        entry_point="train.py",
        role=role,
        instance_count=1,
        instance_type="ml.c5.xlarge",
        # Directory that holds train.py and requirements.txt
        source_dir="custom-model-sklearn/src/",
        hyperparameters={
            "target_col": "target_col",
            "penalty": "none",
            "fit_intercept": True,
            "solver": "lbfgs",
            "verbose": 0,
            "C": 1,
        },
        py_version="py3",
        framework_version="1.0-1",
        script_mode=True,
        sagemaker_session=pipeline_session,
        disable_profiler=True,
        output_path="s3://{}/{}/TrainingStep".format(bucket, model_prefix),
    )

    base_job_name = "sklearn-model"

The train.py script and the requirements.txt file, which lists dill, are both inside the directory /custom-model-sklearn/src.

train.py:

    import ...
    import ...
    .
    .
    import dill
    .
    .

requirements.txt:

   dill

The source_dir seems to be configured correctly, since the error is raised from inside the train.py script.

I'm currently moving my code from one account to another. In the previous account I did the same thing with the same directory hierarchy, and the module was installed inside the tuning container.

Any help would be appreciated.

Ron Fisher

1 Answer

It seems you want to use third-party libraries with a SageMaker estimator.

You can refer to this link to learn more about the process.

Basically, with your code, add your requirements.txt file under custom-model-sklearn/src/ and the estimator will install the listed requirements when the training container starts.
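Before submitting the pipeline, it can help to confirm locally that both files really sit at the top level of the directory passed as source_dir. The snippet below is only a minimal sketch: check_source_dir is a hypothetical helper written for this answer, not part of the SageMaker SDK, and it only inspects the local directory, not what ends up in the container.

```python
from pathlib import Path


def check_source_dir(source_dir, entry_point="train.py"):
    """Return the package lines found in <source_dir>/requirements.txt.

    Hypothetical helper (not a SageMaker API). Raises FileNotFoundError if
    the entry point or requirements.txt is not directly inside source_dir.
    """
    root = Path(source_dir)
    for name in (entry_point, "requirements.txt"):
        if not (root / name).is_file():
            raise FileNotFoundError(f"{name} not found directly under {root}")

    lines = (root / "requirements.txt").read_text().splitlines()
    # Drop blank lines and comments so only package specifiers remain.
    return [ln.strip() for ln in lines if ln.strip() and not ln.strip().startswith("#")]
```

For the layout in the question, calling check_source_dir("custom-model-sklearn/src/") from the directory where the pipeline is defined should return a list that includes dill. If it raises instead, the relative path is probably being resolved from a different working directory than expected.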

Espoir Murhabazi