
I'm working on a recommendation model with 2M users and 2,100 items, using this library: https://github.com/benfred/implicit .

I just noticed that training the model in a SageMaker notebook on an 'ml.t3.medium' (2 vCPUs, 4 GiB memory) took only 2 minutes, while running it in a SageMaker pipeline on a larger 'ml.t3.2xlarge' (8 vCPUs, 32 GiB memory) takes 20 minutes. I noticed similar runtimes when running in containers.

Our goal is to refresh recommendations every 2 minutes for active users.
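To narrow down where the extra 18 minutes go, one option is to time each pipeline stage separately before blaming the model fit itself. A minimal sketch of such a timing harness is below; the stage bodies here are placeholders (the real ones would be the S3 data copy, the `model.fit(...)` call on the implicit ALS model, and the model/recommendation upload), so treat the names and stages as assumptions about this pipeline, not its actual code.

```python
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def stage(name):
    # Record wall-clock time for a named pipeline stage.
    start = time.perf_counter()
    yield
    timings[name] = time.perf_counter() - start

# Hypothetical stages -- replace each body with the real call
# (S3 download, model.fit(user_item_matrix), upload to S3, ...).
with stage("load_data"):
    data = list(range(1_000_000))   # placeholder for the data copy/read
with stage("train"):
    total = sum(data)               # placeholder for the ALS fit
with stage("save"):
    _ = str(total)                  # placeholder for saving model + recs

# Print stages slowest-first so the bottleneck is obvious in the job log.
for name, seconds in sorted(timings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {seconds:.3f}s")
```

Running this inside the pipeline step and inside the notebook, and comparing the per-stage numbers, would show whether the gap is in the fit itself or in the surrounding I/O.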

Selva
  • Can you check where the time is spent when running the pipeline? There are additional steps involved when you run outside of notebooks, like data copy, container download, and saving the model files and recommendations to S3. We need to find the bottleneck in order to address it. – Arun Lokanatha Mar 14 '23 at 01:03

0 Answers