Overview
I have a user registration / onboarding flow that I am currently trying to optimise and better understand before scaling out to much larger load tests.
Test Collection: 500 RU/s
PartitionKey: tenant_email
Multi-Master: 5 Regions
Below are the per-step statistics from a single-region run, on a database with only one region.
- Step 1 - Register new user (10.17 RU)
- Step 2 - Update some data (3.4 RU)
- Step 3 - Create a subscription (13.23 RU)
- Step 4 - Update some data (3.43 RU)
- Step 4 - Update some data (3.43 RU)
- Step 5 - Update some data (3.83 RU)
- Step 6 - Refresh access token (3.13 RU)
- Total: ~40.5 RU per onboard
Problem
Expected throughput: ~12 registrations per second (84 req/sec)
Actual throughput: heavy rate limiting at ~3 registrations per second (21 req/sec). At ~40 RU per registration, it seems I'm only getting ~120 RU/s of utilisation out of the 500 RU/s provisioned?
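To sanity-check those figures, here is a rough back-of-the-envelope sketch in Python. The per-step charges are copied from the list above; the variable names are just illustrative, and the calculation assumes every request is charged exactly the RU shown and that load is spread evenly across each second, which may not hold in practice.

```python
# RU charge per request, copied from the step list in the Overview.
step_charges_ru = [10.17, 3.40, 13.23, 3.43, 3.43, 3.83, 3.13]

provisioned_ru_per_sec = 500         # throughput provisioned on the test collection
observed_registrations_per_sec = 3   # rate at which heavy rate limiting was seen

# Cost and request count of one complete onboarding.
ru_per_onboard = sum(step_charges_ru)
requests_per_onboard = len(step_charges_ru)

# What the provisioned throughput should sustain (assumes perfectly even load).
expected_registrations_per_sec = provisioned_ru_per_sec / ru_per_onboard
expected_requests_per_sec = expected_registrations_per_sec * requests_per_onboard

# What is actually being consumed at the point where throttling starts.
observed_ru_per_sec = observed_registrations_per_sec * ru_per_onboard
utilisation = observed_ru_per_sec / provisioned_ru_per_sec

print(f"RU per onboard:         {ru_per_onboard:.2f}")
print(f"Expected registrations: {expected_registrations_per_sec:.1f}/sec")
print(f"Expected requests:      {expected_requests_per_sec:.0f}/sec")
print(f"Observed consumption:   {observed_ru_per_sec:.0f} RU/s")
print(f"Utilisation:            {utilisation:.0%}")
```

On these assumptions the collection should sustain roughly 12 registrations per second, while the observed 3 per second works out to only about a quarter of the provisioned 500 RU/s.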
The storage distribution is shown below, and the partitionKey should be unique enough to distribute load evenly over the collection and maximise throughput, so I'm not sure why the Max Consumed RU/s is so high.
Storage for the collection and chosen partitionKey looks to be evenly distributed.
Update - Under utilisation
Here is a screenshot showing a collection with a single 500 RU/s partition. You can clearly see that the max consumed RU/s per partition sat at ~350 the whole time, yet there is heavy rate limiting even though we never hit 500 RU/s.