When you set up SageMaker, you specify the VPC that it runs in and any corresponding subnets. If no subnets are specified, it uses 2 by default.

But as an architecture grows, it's easy to end up with different resources sharing the same subnets, which can exhaust their free IP addresses and cause errors such as this:

Failed to change to instance xx.8xlarge Failed to launch app [xxxx]: LimitExceededError: Unable to create network interface because subnet 'subnet-xxxx' does not have enough free addresses to satisfy the request. Free up addresses or add more addresses for the subnet to use, or create a new domain with a new subnet.

It would be nice to change the subnets that SageMaker uses without having to tear down the entire setup and start over. But the only documentation I can find on configuring the VPC/subnets for SageMaker covers the setup stage.

So, what is SageMaker’s relationship to subnets, where is this configured, and can this be modified after deployment?

Cybernetic

1 Answer

For both SageMaker notebook instances and Studio, you can specify the VPC in which you want the instances to run. When you run a notebook inside a VPC, you need interface endpoints for the SageMaker API, the SageMaker Runtime API, and any additional services you use (S3, ECR, CloudWatch, etc.).
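As a rough sketch of what setting up those endpoints could look like with boto3: the helper names below are hypothetical, the list of services is an assumption about a typical setup (S3 is left out because it is often attached as a gateway endpoint instead), and the VPC, subnet, and security group IDs are placeholders you would supply. Note that each interface endpoint places a network interface in every subnet you list, so the endpoints themselves use addresses from those subnets.

```python
def sagemaker_endpoint_services(region):
    """Build interface endpoint service names for a region.

    The selection of services is an assumption; adjust it to what
    your notebooks actually call.
    """
    prefix = f"com.amazonaws.{region}"
    return [
        f"{prefix}.sagemaker.api",      # SageMaker API
        f"{prefix}.sagemaker.runtime",  # SageMaker Runtime API
        f"{prefix}.ecr.api",            # ECR API
        f"{prefix}.ecr.dkr",            # ECR Docker registry
        f"{prefix}.logs",               # CloudWatch Logs
    ]


def create_interface_endpoints(vpc_id, subnet_ids, security_group_ids, region):
    """Create one interface endpoint per service (hypothetical helper)."""
    import boto3  # imported lazily so the name helper works without AWS access

    ec2 = boto3.client("ec2", region_name=region)
    for service in sagemaker_endpoint_services(region):
        ec2.create_vpc_endpoint(
            VpcEndpointType="Interface",
            VpcId=vpc_id,
            ServiceName=service,
            SubnetIds=subnet_ids,
            SecurityGroupIds=security_group_ids,
            PrivateDnsEnabled=True,
        )
```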

These are associated with your VPC and the subnets you specify. For that reason, it is currently not possible to update the VPC/subnet configuration dynamically; it has to be set at setup. The only way to change it is to tear down your current notebooks (or domain) and create a new notebook (or domain) with a different configuration.

There's more documentation here and here. Note that your training jobs, processing jobs etc. will also need IP addresses to run in the VPC.
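To see which subnets an existing Studio domain is attached to and how many free addresses each one has left, something like the following may help. It is a minimal sketch using boto3's `describe_domain` and `describe_subnets`; the function names, the domain ID, and the threshold are hypothetical placeholders.

```python
def subnets_short_on_addresses(subnets, needed):
    """Return IDs of subnets with fewer than `needed` free addresses.

    `subnets` is a list of dicts shaped like the entries returned by
    EC2 DescribeSubnets (only SubnetId and AvailableIpAddressCount are read).
    """
    return [
        s["SubnetId"]
        for s in subnets
        if s["AvailableIpAddressCount"] < needed
    ]


def domain_subnets(domain_id, region):
    """Look up the subnets a Studio domain was created with (hypothetical helper)."""
    import boto3  # imported lazily so the check above works without AWS access

    sagemaker = boto3.client("sagemaker", region_name=region)
    ec2 = boto3.client("ec2", region_name=region)

    domain = sagemaker.describe_domain(DomainId=domain_id)
    response = ec2.describe_subnets(SubnetIds=domain["SubnetIds"])
    return response["Subnets"]
```

For example, `subnets_short_on_addresses(domain_subnets("d-xxxx", "us-east-1"), needed=10)` would list the subnets that could not take ten more network interfaces.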

durga_sury
  • Thanks for the answer. That makes sense. The challenge is that this assumes the user will know the resource demands before experimenting with the models. Is there a way to estimate the number of subnets that will be required for a given training job in SageMaker, rather than constantly tearing down environments because we guessed wrong? – Cybernetic Mar 07 '22 at 21:28
  • Agreed! It depends on the training jobs/processing jobs you would start, the number of users, etc. Each job running in a VPC needs 2 IPs, and so do hosted endpoints. Your subnet will need that many available IPs. Also, these IP addresses are released after your job completes, so you'll only need as many as the number of parallel jobs you would execute. – durga_sury Apr 20 '22 at 16:26
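The estimate in the last comment can be turned into a quick sizing check. This is only a back-of-the-envelope sketch: the figure of 2 IPs per job or endpoint comes from the comment above and is treated as an assumption, the function names are hypothetical, and the capacity figure is a ceiling for an otherwise empty subnet (AWS reserves 5 addresses in every subnet, and other resources sharing the subnet reduce it further).

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # addresses AWS keeps for itself in each subnet


def usable_addresses(cidr):
    """Upper bound on assignable addresses in a subnet with this CIDR block."""
    total = ipaddress.ip_network(cidr).num_addresses
    return max(total - AWS_RESERVED_PER_SUBNET, 0)


def required_ips(parallel_jobs, hosted_endpoints, ips_per_resource=2):
    """Addresses needed at peak, assuming 2 IPs per job or endpoint."""
    return (parallel_jobs + hosted_endpoints) * ips_per_resource


def subnet_fits(cidr, parallel_jobs, hosted_endpoints):
    """True if an otherwise empty subnet could hold the peak workload."""
    return required_ips(parallel_jobs, hosted_endpoints) <= usable_addresses(cidr)
```

Under these assumptions, 10 parallel jobs plus 2 hosted endpoints need 24 addresses, which does not fit in a /28 (11 usable) but fits easily in a /24 (251 usable).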