
For several days I've been trying to ingest Apache Avro formatted data from a blob storage into Azure Data Explorer.

I'm able to reference the top-level JSON keys like $.Body (see the red underlined example in the screenshot below), but when it comes to nested JSON keys, Azure fails to parse them properly and displays nothing (as seen in the green column: I would expect $.Body.entityId to reference the key "entityId" inside the Body JSON).

Many thanks in advance for any help!

Here is a screenshot of the Azure Data Explorer web interface.

Edit 1

I already tried to increase the "Nested Levels" option to 2, but all I got was this error message with no further details. The error message won't even disappear when I decrease the level back to 1; I have to cancel and start the process all over again.

I just noticed that the auto-generated columns have some strange types. It seems like they all end up as type string... This seems a little odd to me as well.

Edit 2

Here is some KQL code.

This is the schema of my input .avro file, which I get from my Event Hub capture:

{
  SequenceNumber: ...,
  Offset: ...,
  EnqueuedTimeUTC: ...,
  SystemProperties: ...,
  Properties: ...,
  Body: {
    entityId: ...,
    eventTime: ...,
    messageId: ...,
    data: ...
  }
}, ...

With these ingestion commands I can't reference the inner JSON keys. The top-level keys work perfectly fine.

// Create table command
////////////////////////////////////////////////////////////
.create table ['test_table']  (['Body']:dynamic, ['entityId']:string)

// Create mapping command
////////////////////////////////////////////////////////////
.create table ['test_table'] ingestion apacheavro mapping 'test_table_mapping' '[{"column":"Body", "Properties":{"Path":"$.Body"}},{"column":"entityId", "Properties":{"Path":"$.Body.entityId"}}]'

// Ingest data into table command
///////////////////////////////////////////////////////////
.ingest async into table ['test_table'] (h'[SAS URL]') with (format='apacheavro',ingestionMappingReference='test_table_mapping',ingestionMappingType='apacheavro',tags="['503a2cfb-5b81-4c07-8658-639009870862']")

I would love to ingest the inner data fields into separate columns, instead of building a workaround with update policies.

– allrik

3 Answers


For those having the same issue, here is the workaround we currently use:

First, assume that we want to ingest the contents of the Body field from the Avro file into the table avro_destination.

Step 1: Create an ingestion table

.create table avro_ingest(
    Body: dynamic
    // optional other columns, if you want...
)

Step 2: Create an update policy

.create-or-alter function
    with (docstring = 'Convert avro_ingest to avro_destination', folder='ingest')
    convert_avro_ingest() {
        avro_ingest
        | extend entityId = tostring(Body.entityId)
        | extend messageId = tostring(Body.messageId)
        | extend eventTime = todatetime(Body.eventTime)
        | extend data = Body.data
        | project entityId, messageId, eventTime, data
    }
.alter table avro_destination policy update
@'[{ "IsEnabled": true, "Source": "avro_ingest", "Query": "convert_avro_ingest()", "IsTransactional": false, "PropagateIngestionProperties": true}]'

Step 3: Ingest the .avro files into the avro_ingest table

...as seen in the question, with one column containing the whole Body JSON per entry.
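
For completeness, a minimal sketch of this step, adapted from the commands in the question (the mapping name avro_ingest_mapping is an assumption, and '[SAS URL]' is the placeholder from the question):

// Map the whole Body record into the single dynamic column
.create table avro_ingest ingestion apacheavro mapping 'avro_ingest_mapping' '[{"column":"Body", "Properties":{"Path":"$.Body"}}]'

// Ingest the capture file(s) into the staging table
.ingest async into table avro_ingest (h'[SAS URL]') with (format='apacheavro', ingestionMappingReference='avro_ingest_mapping')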

– allrik
  • Hi, this answer actually worked out for me. If I select the "APACHE AVRO" format in the ingestion tab of the portal (note that this is not the same as "AVRO"), it displays the contents of the Body field as JSON (as seen in my very first screenshot, the red column). If this is misleading though, I can delete this answer. Thanks for your comment! – allrik Jul 04 '22 at 07:18
  • Yep, you are right :-) Please keep it – David דודו Markovitz Jul 04 '22 at 08:06

Simply increase "Nested levels" to 2.

(Screenshot: Data ingestion)

– David דודו Markovitz

Following the OP's updates:

Here is the Avro schema of an Event Hubs capture.
As you can see, Body is of type bytes, so there is practically nothing you can do with it in this form, other than ingesting it as is (as dynamic).

{
    "type":"record",
    "name":"EventData",
    "namespace":"Microsoft.ServiceBus.Messaging",
    "fields":[
                 {"name":"SequenceNumber","type":"long"},
                 {"name":"Offset","type":"string"},
                 {"name":"EnqueuedTimeUtc","type":"string"},
                 {"name":"SystemProperties","type":{"type":"map","values":["long","double","string","bytes"]}},
                 {"name":"Properties","type":{"type":"map","values":["long","double","string","bytes"]}},
                 {"name":"Body","type":["null","bytes"]}
             ]
}

If you take a look at the ingested data, you'll see that the content of Body is an array of integers.
Those integers are the decimal values of the characters that make up Body.

capture
| project Body
| take 3
Body
[123,34,105,100,34,58,32,34,56,49,55,98,50,99,100,57,45,97,98,48,49,45,52,100,51,53,45,57,48,51,54,45,100,57,55,50,51,55,55,98,54,56,50,57,34,44,32,34,100,116,34,58,32,34,50,48,50,49,45,48,56,45,49,50,84,49,54,58,52,56,58,51,50,46,53,57,54,50,53,52,34,44,32,34,105,34,58,32,48,44,32,34,109,121,105,110,116,34,58,32,50,48,44,32,34,109,121,102,108,111,97,116,34,58,32,48,46,51,57,56,53,52,52,56,55,52,53,57,56,57,48,55,57,55,125]
[123,34,105,100,34,58,32,34,57,53,100,52,100,55,56,48,45,97,99,100,55,45,52,52,57,50,45,98,97,54,100,45,52,56,49,54,97,51,56,100,52,56,56,51,34,44,32,34,100,116,34,58,32,34,50,48,50,49,45,48,56,45,49,50,84,49,54,58,52,56,58,51,50,46,53,57,54,50,53,52,34,44,32,34,105,34,58,32,49,44,32,34,109,121,105,110,116,34,58,32,56,56,44,32,34,109,121,102,108,111,97,116,34,58,32,48,46,54,53,53,51,55,51,51,56,49,57,54,53,50,52,52,49,125]
[123,34,105,100,34,58,32,34,53,50,100,49,102,54,54,53,45,102,57,102,54,45,52,49,50,49,45,97,50,57,99,45,55,55,56,48,102,101,57,53,53,55,48,56,34,44,32,34,100,116,34,58,32,34,50,48,50,49,45,48,56,45,49,50,84,49,54,58,52,56,58,51,50,46,53,57,54,50,53,52,34,44,32,34,105,34,58,32,50,44,32,34,109,121,105,110,116,34,58,32,49,57,44,32,34,109,121,102,108,111,97,116,34,58,32,48,46,52,53,57,54,49,56,54,51,49,51,49,50,50,52,50,50,51,125]

Body can be converted to text using make_string() and then parsed to JSON using todynamic():

capture
| project BodyJSON = todynamic(make_string(Body))
| take 3
BodyJSON
{"id":"817b2cd9-ab01-4d35-9036-d972377b6829","dt":"2021-08-12T16:48:32.5962540Z","i":0,"myint":20,"myfloat":"0.398544874598908"}
{"id":"95d4d780-acd7-4492-ba6d-4816a38d4883","dt":"2021-08-12T16:48:32.5962540Z","i":1,"myint":88,"myfloat":"0.65537338196524408"}
{"id":"52d1f665-f9f6-4121-a29c-7780fe955708","dt":"2021-08-12T16:48:32.5962540Z","i":2,"myint":19,"myfloat":"0.45961863131224223"}
– David דודו Markovitz
  • After some research I've done in the past week, I came to the same conclusion: the byte array is the root cause of the problem. Thank you for your answers, they helped me a lot in analysing this issue... – allrik Jul 04 '22 at 06:58