I have a table in Azure Data Explorer that collects data from IoT sensors. In the near future it will be collecting millions of records per day, so to get the best query performance I am looking into setting a partitioning policy: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/management/partitioningpolicy
My table has five important columns: TenantId, DeviceId, VariableId, Value, and Timestamp.
The combination of (TenantId, DeviceId, VariableId) makes a sensor globally unique, and almost all queries contain a clause like TenantId == 'xx' and DeviceId == 'xx' and VariableId == 'xx'. All three key columns are of type string and have high cardinality (10,000+ TenantIds, 1,000+ DeviceIds, 10,000+ VariableIds).
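For example, a typical query looks roughly like this (the table name SensorReadings is just illustrative, and I'm assuming Value is numeric):

```kusto
SensorReadings
| where TenantId == 'tenant-a'
    and DeviceId == 'device-1'
    and VariableId == 'temperature'
| where Timestamp > ago(1d)
| summarize avg(Value) by bin(Timestamp, 1h)
```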
Two questions:
Would it be wise to apply partitioning to this table based on one or more of the string columns? My scenario seems to comply with the advice in the documentation, which says:
- The majority of queries use equality filters (==, in()).
- The majority of queries aggregate/join on a specific string-typed column of large-dimension (cardinality of 10M or higher) such as an application_ID, a tenant_ID, or a user_ID.
But later on the page, for MaxPartitionCount, they say it should be no higher than 1024 and lower than the cardinality of the column. I am not sure how this constraint fits with my high-cardinality columns, so I am a bit confused.
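For context, this is the kind of hash partition key I was planning to try; the table name and the MaxPartitionCount of 128 are just placeholders:

````kusto
.alter table SensorReadings policy partitioning
```
{
  "PartitionKeys": [
    {
      "ColumnName": "TenantId",
      "Kind": "Hash",
      "Properties": {
        "Function": "XxHash64",
        "MaxPartitionCount": 128,
        "PartitionAssignmentMode": "Uniform"
      }
    }
  ]
}
```
````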
- Would it be best to concatenate the string columns before ingestion and partition on the new column, or to partition only on TenantId, for example? (See the sketch below for how I imagine the concatenation would work.)
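If concatenating is the better option, I assume I would compute the combined key at ingestion time with an update policy, something like the following (SensorReadingsRaw, SensorReadings, and ExpandSensorKey are hypothetical names, and SensorReadings would need an extra SensorKey column):

```kusto
// Hypothetical setup: raw data lands in SensorReadingsRaw, and the
// function below adds the combined key before rows reach SensorReadings.
.create-or-alter function ExpandSensorKey() {
    SensorReadingsRaw
    | extend SensorKey = strcat(TenantId, "_", DeviceId, "_", VariableId)
}

.alter table SensorReadings policy update
@'[{"IsEnabled": true, "Source": "SensorReadingsRaw", "Query": "ExpandSensorKey()", "IsTransactional": false}]'
```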