
I'm trying to train a model using AWS SageMaker notebooks and am disappointed with how slowly it is training. I think the bottleneck is the IOPS of the persistent storage (EFS and EBS) that my SageMaker notebooks access for the dataset.

First, I tried training on a SageMaker Studio ml.g4dn.xlarge instance, then moved everything over to a SageMaker notebook ml.g4dn.xlarge instance through Jupyter. Even though g4dn.xlarge instances come with a physically attached 125 GB NVMe SSD (instance store), I'm unable to access it because SageMaker Studio automatically creates an EFS store, and SageMaker notebook instances automatically create an EBS store. How could I store my dataset on the 125 GB SSD instead of EFS or EBS to speed up the IOPS?

1 Answer


There are instance families whose storage is optimised for large amounts of data. In your case, if the dataset is fed to the model at its full size (i.e. there is no upstream preprocessing that reduces it), keep in mind that g4dn instances are EBS-optimised.

The most obvious answer I can think of is to use an S3 bucket.

From "Maximum transfer speed between Amazon EC2 and Amazon S3":

Traffic between Amazon EC2 and Amazon S3 can leverage up to 100 Gbps of bandwidth to VPC endpoints and public IPs in the same region.

Besides being fast, S3 is also the cleanest design choice for the other components of your project on AWS. It does entail different costs and a different architecture, but it gives you the maximum throughput that the AWS services can offer (possibly with some extra configuration for even better performance).
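
As a rough sketch (the bucket name, prefix, and local path below are placeholders for your own dataset location), you could pull the data from S3 onto the instance's local disk with boto3, using parallel multipart transfers to take advantage of the bandwidth mentioned above:

```python
import os

import boto3
from boto3.s3.transfer import TransferConfig

# Placeholder bucket/prefix -- replace with your own dataset location.
BUCKET = "my-dataset-bucket"
PREFIX = "training-data/"
LOCAL_DIR = "/tmp/dataset"  # local disk on the notebook instance

s3 = boto3.client("s3")
# Parallel multipart transfers help make use of the available S3 bandwidth.
config = TransferConfig(max_concurrency=16, multipart_chunksize=64 * 1024 * 1024)

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        if key.endswith("/"):  # skip "directory" placeholder keys
            continue
        dest = os.path.join(LOCAL_DIR, os.path.relpath(key, PREFIX))
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        s3.download_file(BUCKET, key, dest, Config=config)
```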

My advice is to follow the AWS guidelines for building a project like this from scratch: Build, train, and deploy machine learning models.
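
If you go the SageMaker training-job route from those guidelines, a minimal sketch might look like the following. The image URI and S3 path are placeholders, and "FastFile" is just one of the available input modes ("File", "Pipe", "FastFile"); it streams objects from S3 on demand rather than copying the whole dataset to the training volume first:

```python
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
role = sagemaker.get_execution_role()

# Placeholder image URI -- point this at your own training container.
estimator = Estimator(
    image_uri="<your-training-image-uri>",
    role=role,
    instance_count=1,
    instance_type="ml.g4dn.xlarge",
    sagemaker_session=session,
)

# FastFile mode streams objects from S3 on demand instead of copying the whole
# dataset to the training volume first.
train_input = TrainingInput(
    s3_data="s3://my-dataset-bucket/training-data/",
    input_mode="FastFile",
)

estimator.fit({"train": train_input})
```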

Giuseppe La Gualano