Azure Data Explorer partitioning policy

Question

The documentation on the ADX partitioning policy (https://learn.microsoft.com/en-us/azure/data-explorer/kusto/management/partitioningpolicy#the-data-partitioning-process) mentions that you need to set MaxPartitionCount when using a hash partition key. It also states that this value should be in the range (1,2048] and recommends starting with 128.

Question: If I have a column with a cardinality of 100,000, shouldn't the max partition count be 100,000? Shouldn't ADX create a partition for each distinct value in the column? Why is it required to set the MaxPartitionCount property at all?

Linking to a quite similar question: https://stackoverflow.com/questions/63489669/azure-data-explorer-partitioning-strategy/63490151#63490151 — Xavier_prash, Dec 18 '21 at 19:41

Yoni L. · Accepted Answer · 2021-12-18T20:35:27.120

In the recommended scenarios (detailed in the doc you've linked to), the end goal isn't to have a separate partition for each distinct value of the partition key.

Having an extreme number of partitions (100k in your question, or billions in the case of a unique device ID) may result in an extreme number of small data shards, which would be sub-optimal.
Even with "only" 128 as the max partition count, alongside the default built-in indexing (which applies regardless of explicit data partitioning), the ability to narrow the full data set down at query planning time to a small number of partitions/shards can significantly reduce resource utilization and execution time.
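
To make the idea concrete, here is a minimal Python sketch of hash partitioning in general, not of ADX internals. It assumes a partition is chosen as hash(value) modulo the max partition count, uses blake2b as a stand-in hash (not the function ADX uses), and the `device-N` key values are hypothetical. It shows 100,000 distinct values landing in at most 128 partitions, and how much of the data a filter on a single key value would need to touch.

```python
import hashlib
from collections import Counter

MAX_PARTITION_COUNT = 128  # the starting value the documentation recommends


def partition_of(value: str, max_partitions: int = MAX_PARTITION_COUNT) -> int:
    """Assign a key to a partition as hash(value) mod max_partitions.

    blake2b is only a deterministic stand-in for illustration;
    it is not the hash function ADX uses.
    """
    digest = hashlib.blake2b(value.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % max_partitions


# 100,000 distinct (hypothetical) partition key values
keys = [f"device-{i}" for i in range(100_000)]

# How many distinct keys ended up in each partition
sizes = Counter(partition_of(k) for k in keys)
partitions_used = len(sizes)

# A filter on one key value only needs the single partition that key hashes to
target = partition_of("device-42")
fraction_scanned = sizes[target] / len(keys)

print(f"partitions used: {partitions_used}")
print(f"'device-42' lives in partition {target}")
print(f"share of keys in that partition: {fraction_scanned:.2%}")
```

Each distinct value always maps to the same partition, so many values share a partition, yet a filter on one value can still skip the large majority of the data.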

For further reading: kusto.blog.

Generally, not following the guidelines and recommendations in the documentation isn't likely to lead you to optimal results.

Thanks @Yoni. That's clearer. So each distinct value will sit in one of those partitions, which helps reduce the number of scans. — Xavier_prash, Dec 19 '21 at 10:14
