Overview

I have a user registration / onboarding flow that I am currently trying to optimise and better understand before scaling out to much larger load tests.

Test Collection: (500 RU)
PartitionKey: tenant_email
Multi-Master: 5 Regions

Below are the single-region statistics, measured on a database with only one region.

  • Step 1 - Register new user (10.17 RU)
  • Step 2 - Update some data (3.4 RU)
  • Step 3 - Create a subscription (13.23 RU)
  • Step 4 - Update some data (3.43 RU)
  • Step 5 - Update some data (3.43 RU)
  • Step 6 - Update some data (3.83 RU)
  • Step 7 - Refresh access token (3.13 RU)
  • Total: ~40.5 RU per onboard
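
As a quick check of the total above, here is a minimal sketch that sums the per-step charges exactly as listed. The variable names are illustrative only, and the list assumes every line above is one request.

```python
# Per-step request charges (RU) as listed above, in order.
# Assumption: each listed line corresponds to exactly one request.
step_costs_ru = [10.17, 3.4, 13.23, 3.43, 3.43, 3.83, 3.13]

requests_per_onboard = len(step_costs_ru)
total_ru_per_onboard = sum(step_costs_ru)

print(f"{requests_per_onboard} requests, {total_ru_per_onboard:.2f} RU per onboard")
```

This gives 7 requests per onboard, which matches the 84 req/sec and 21 req/sec figures quoted for 12 and 3 registrations per second.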

Problem

Expected throughput: ~12 registrations per second (84 req/sec)
Actual throughput: Heavy rate limiting at ~3 registrations per second (21 req/sec). At ~40 RU per registration, that means I'm only getting about 120 RU/s of utilisation out of the 500 RU/s provisioned?
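
The arithmetic behind those figures can be sketched as follows. This is only a back-of-the-envelope model: it assumes the full 500 RU/s is usable by any request and that each registration costs ~40.5 RU over 7 requests. The constant names are made up for illustration.

```python
PROVISIONED_RU_PER_SEC = 500   # collection throughput from the question
RU_PER_ONBOARD = 40.5          # approximate total per registration
REQUESTS_PER_ONBOARD = 7       # assumed: one request per listed step
OBSERVED_ONBOARDS_PER_SEC = 3  # rate at which throttling was seen

# Upper bound if all provisioned RU were available to every request.
expected_onboards_per_sec = PROVISIONED_RU_PER_SEC / RU_PER_ONBOARD
expected_requests_per_sec = expected_onboards_per_sec * REQUESTS_PER_ONBOARD

# RU actually consumed at the observed rate, and the share of the total.
observed_ru_per_sec = OBSERVED_ONBOARDS_PER_SEC * RU_PER_ONBOARD
utilisation = observed_ru_per_sec / PROVISIONED_RU_PER_SEC

print(f"expected: {expected_onboards_per_sec:.1f} onboards/s, "
      f"{expected_requests_per_sec:.0f} req/s")
print(f"observed: {observed_ru_per_sec:.1f} RU/s, {utilisation:.0%} of provisioned")
```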

enter image description here

enter image description here

The storage distribution is shown below. The partitionKey should be unique enough to distribute load evenly over the collection and maximise throughput, so I'm not sure why the Max Consumed RU/s is so high.

enter image description here

Storage distribution for the collection and chosen partitionKey looks to be evenly distributed.

enter image description here

Update - Under utilisation

Here is a screenshot showing a collection with a single 500 RU partition. The max consumed RU per partition sat at around 350 RU/s the whole time, yet there was heavy rate limiting even though we never hit 500 RU/s.

enter image description here

Joshua Hayes

1 Answer

Your rate-limiting is likely because you don't have access to all 500 RU in a single physical partition.

Take a close look at your 2nd graph, which has a hint as to what's likely going on:

Collection UsersTest has 5 partition key ranges. Provisioned throughput is evenly distributed across these partitions (100 RU/s per partition).

Under the covers, Cosmos DB creates a set of physical partitions, and your RU are divided across those physical partitions. In your case, Cosmos DB created 5 physical partitions.

Logical partitions may be mapped to any of your 5 physical partitions. So it's possible that, during your test, more than one logical partition mapped to the same physical partition. And given that each physical partition would top out at roughly 2-3 registrations per second, this likely explains why you're seeing throttling.
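
To make the "2-3 registrations per second" figure concrete, here is a rough sketch of the per-partition budget. It assumes the 500 RU/s is split evenly over 5 physical partitions (as the portal message says) and uses the ~40.5 RU per onboard from the question. The helper `exceeds_partition_budget` is hypothetical and models only a simple one-second budget, not how Cosmos DB actually enforces limits.

```python
TOTAL_RU_PER_SEC = 500
PHYSICAL_PARTITIONS = 5
RU_PER_ONBOARD = 40.5  # approximate cost of one full registration

# Even split of provisioned throughput across physical partitions.
ru_per_partition = TOTAL_RU_PER_SEC / PHYSICAL_PARTITIONS

# Registrations one physical partition can absorb per second.
onboards_per_partition = ru_per_partition / RU_PER_ONBOARD


def exceeds_partition_budget(onboards_in_one_second: int) -> bool:
    """Simplified model: True if that many onboards landing on one
    physical partition within a second would need more RU than it has."""
    return onboards_in_one_second * RU_PER_ONBOARD > ru_per_partition


print(f"{ru_per_partition:.0f} RU/s per partition, "
      f"~{onboards_per_partition:.2f} onboards/s per partition")
for n in (1, 2, 3):
    print(n, "onboards on one partition ->", exceeds_partition_budget(n))
```

Under this model, two registrations hitting the same physical partition in one second fit within 100 RU/s, while a third would push it over.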

David Makogon
  • This is what I thought too but the partitionKey is unique on every user so cosmos should be doing a good job at evenly distributing them across logical partitions. Even with 100 RU per partition, I would have expected two users per partition resulting in a min. throughput of 8 registrations (56 req / s). – Joshua Hayes Dec 26 '18 at 00:07
  • See update to original question. I ran a test with a steady / consistent load for 30 mins on a collection with a single 500 RU partition to isolate this problem and still encountered rate limiting despite the fact that I was only consuming 350 RU/s. Under what situations can you get rate limiting when you haven't even approached the limit of your physical partition? – Joshua Hayes Dec 28 '18 at 15:57