2

Thanks for reading! I would like to define an external table on a storage account where the path format is as follows:

flowevents/resourceId=/SUBSCRIPTIONS/<unique>/RESOURCEGROUPS/<unique>/PROVIDERS/MICROSOFT.NETWORK/NETWORKSECURITYGROUPS/<unique>/y=2022/m=05/d=11/h=09/m=00/<unique>/datafiles

I would like to partition the external table by date. The relevant documentation for this is located here. My understanding and experimentation indicates that this might not be possible to do, given the URI path above where there are unique values before the values that I would like to partition on and the answer given by Slavik here.

  • Is it possible to create an external table using wildcards to traverse the folders to achieve the partition scheme described above?
  • Is the only way to solve this to define multiple storage connection strings for all possible values of unique? Is there an upper limit to how many values may be provided?
dueland
  • 87
  • 1
  • 6

1 Answers1

0

The path traversal functionality I'm looking for can be found in LightIngest:

-prefix:resourceId=/SUBSCRIPTIONS/00-00C-00-00-00/RESOURCEGROUPS/ -pattern:*/PROVIDERS/MICROSOFT.NETWORK/NETWORKSECURITYGROUPS/*/y=2021/m=11/d=10/*.json

It does not seem to be supported when defining external tables. A possible reason for this is that the engine will get overloaded if you load too many files from external storage. I got the following error message when I defined 50 connections strings:

Partial query failure: Input stream/record/field too large (E_INPUT_STREAM_TOO_LARGE). (message: '', details: '')

It works as intended when I provided 30 connection strings and used four virtual columns to do partitioning. This error message is not described in the documentation, by the way.

Update, for kusto developers: I attempted to use virtual columns for the whole URI path and then query to generate the connection string. I verified that the table definition is correct using:

.show external table X articats limit 1

It would show the partitions with populated values. However, when attempting to query the external table using the recommended operators ("in" or "has") to navigate it does not work, the query goes on forever despite fetching a small file and running on a cluster on D14_v2 VMs. If I were to define an external table just for that file, it would load just fine.

dueland
  • 87
  • 1
  • 6