I'm working on a recommendation model with 2M users and 2,100 items, using the implicit library: https://github.com/benfred/implicit
I just noticed that training the model on 'ml.t3.medium' (2 vCPUs, 4 GiB memory) takes only 2 minutes in a SageMaker notebook, while the same training in a SageMaker pipeline on the larger 'ml.t3.2xlarge' (8 vCPUs, 32 GiB memory) takes 20 minutes. I also see a similar runtime when running in containers.
Our goal is to refresh recommendations for active users every 2 minutes.
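
To compare the two environments, something like the following could be run in both the notebook and the pipeline step. This is only a rough sketch using the standard library: it reports the visible CPU count and a few thread-related environment variables that numeric libraries commonly read, then times a single call. `describe_environment`, `time_call`, and `dummy_training_step` are hypothetical names, and the dummy step is a stand-in for the real training call, not anything from implicit.

```python
import os
import time

# Thread-related variables commonly read by numeric libraries (assumed relevant here).
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def describe_environment():
    """Collect the visible CPU count and thread-related environment variables."""
    info = {"cpu_count": os.cpu_count()}
    for name in THREAD_ENV_VARS:
        info[name] = os.environ.get(name, "<unset>")
    return info


def time_call(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def dummy_training_step(n):
    # Hypothetical stand-in for the real model training call.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    print(describe_environment())
    _, elapsed = time_call(dummy_training_step, 100_000)
    print(f"elapsed: {elapsed:.3f}s")
```

Replacing `dummy_training_step` with the actual training function and comparing the printed output from both machines should show whether the two runs see different CPU counts or thread settings.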