I'm trying to develop locally with sagemaker.huggingface.HuggingFace before moving to SageMaker for the actual training. I set up an estimator:

HF_estimator = HuggingFace(entry_point='train.py', instance_type='local' ...)

and called HF_estimator.fit().

In train.py I'm simply printing and exiting to see whether it works. However, I ran into this:

ValueError: Unsupported processor: cpu. You may need to upgrade your SDK version (pip install -U sagemaker) for newer processors. Supported processor(s): gpu.

Is it possible to bypass this for local development?

plamb

1 Answer

This error is raised when the SDK looks up an eligible container image and finds that, unlike other frameworks such as base PyTorch, HF only offers CUDA-enabled DLC images.

I haven't checked (and would be interested to know), but you may be able to run the GPU image locally in Docker without issue. You could try explicitly passing the GPU image as the image_uri parameter of your Estimator and see whether it runs:

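import sagemaker
from sagemaker.huggingface import HuggingFace

# Look up the GPU training image by asking for a GPU instance type
train_image_uri = sagemaker.image_uris.retrieve(
    framework="huggingface",
    region=your_region,  # e.g. "us-east-1"
    instance_type="ml.p3.2xlarge",  # -> GPU image
    py_version="py38",
    version="4.17",
    base_framework_version="pytorch1.10",
    image_scope="training",
)
# Pass the image explicitly so the SDK skips its own CPU image lookup
estimator = HuggingFace(
    image_uri=train_image_uri,
    instance_type="local",
    ...
)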

(For supported combinations, you can refer to the SageMaker SDK config file.)
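If the GPU image does start locally, the SDK's local mode accepts two instance type strings, "local" and "local_gpu". Here is a minimal sketch for choosing between them. The helper name pick_local_instance_type is hypothetical, and treating nvidia-smi on the PATH as a sign of a usable GPU is only a heuristic I'm assuming, not something the SDK does itself:

```python
import shutil


def pick_local_instance_type(which=shutil.which):
    """Return "local_gpu" if nvidia-smi is on PATH, otherwise "local".

    Hypothetical helper: finding nvidia-smi is only a rough hint that an
    NVIDIA driver is installed. It does not prove Docker can use the GPU.
    """
    return "local_gpu" if which("nvidia-smi") else "local"


# Pass the result as instance_type when constructing the estimator
print(pick_local_instance_type())
```

Either way, local mode needs Docker running and the local extras installed (pip install "sagemaker[local]").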

Alternatively, you could probably just use the PyTorch framework for your local development (or TensorFlow, if you're using HuggingFace TF) - and include a requirements.txt file in your script bundle to install HF libraries at the version(s) you need. For example:

# requirements.txt in the same source_dir folder as your train.py script

transformers[sklearn,sentencepiece]==4.17.0
datasets==1.18.4

This would make your local test environment slightly different from the true training job environment, but hopefully close enough to be useful for debugging initial functional issues in your code before using SageMaker for the actual training attempts.
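As a rough sketch of the PyTorch alternative, the estimator setup could look something like the following. The source_dir name "src", the placeholder role ARN, and the exact framework_version are all assumptions for illustration (I picked a PyTorch 1.10 release to stay close to the HF image above), so check them against the versions your SDK supports:

```python
def local_pytorch_estimator_kwargs(
    entry_point="train.py",
    source_dir="src",  # assumed folder holding train.py and requirements.txt
    role="arn:aws:iam::111111111111:role/placeholder",  # placeholder, not a real role
):
    """Build keyword arguments for sagemaker.pytorch.PyTorch in local mode."""
    return {
        "entry_point": entry_point,
        "source_dir": source_dir,
        "role": role,
        "framework_version": "1.10.2",  # assumption: close to the HF pytorch1.10 image
        "py_version": "py38",
        "instance_count": 1,
        "instance_type": "local",  # CPU container run through local Docker
    }


def run_local_training():
    """Start a local training run; needs Docker and sagemaker[local] installed."""
    # Imported here so the kwargs helper above works without the SDK installed
    from sagemaker.pytorch import PyTorch

    estimator = PyTorch(**local_pytorch_estimator_kwargs())
    estimator.fit()


# run_local_training()  # uncomment on a machine with Docker and AWS credentials
```

Once the script works locally, the same train.py and source_dir can be handed to the HuggingFace estimator with a real GPU instance type for the actual job.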

dingus