4

I'm trying to extract data from AVRO files produced by Event Hub Capture. In most cases this works flawlessly, but certain files cause problems. For example, the following U-SQL job fails on one such file:

USE DATABASE Metrics;
USE SCHEMA dbo;

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
REFERENCE ASSEMBLY [Avro];
REFERENCE ASSEMBLY [log4net];

USING Microsoft.Analytics.Samples.Formats.ApacheAvro;
USING Microsoft.Analytics.Samples.Formats.Json;
USING System.Text;

//DECLARE @input string = "adl://mydatalakestore.azuredatalakestore.net/event-hub-capture/v3/{date:yyyy}/{date:MM}/{date:dd}/{date:HH}/{filename}";
DECLARE @input string = "adl://mydatalakestore.azuredatalakestore.net/event-hub-capture/v3/2018/01/16/19/rcpt-metrics-us-es-eh-metrics-v3-us-0-35-36.avro";


@eventHubArchiveRecords =
    EXTRACT Body byte[], 
            date DateTime, 
            filename System.String
    FROM @input
    USING new AvroExtractor(@"
        {
            ""type"":""record"",
            ""name"":""EventData"",
            ""namespace"":""Microsoft.ServiceBus.Messaging"",
            ""fields"":[
                {""name"":""SequenceNumber"",""type"":""long""},
                {""name"":""Offset"",""type"":""string""},
                {""name"":""EnqueuedTimeUtc"",""type"":""string""},
                {""name"":""SystemProperties"",""type"":{""type"":""map"",""values"":[""long"",""double"",""string"",""bytes""]}},
                {""name"":""Properties"",""type"":{""type"":""map"",""values"":[""long"",""double"",""string"",""bytes""]}},
                {""name"":""Body"",""type"":[""null"",""bytes""]}
            ]
        }
    ");

@json =
    SELECT Encoding.UTF8.GetString(Body) AS json
    FROM @eventHubArchiveRecords;

OUTPUT @json
TO "/outputs/Avro/testjson.csv"
USING Outputters.Csv(outputHeader : true, quoting : true);

I get the following error:

Unhandled exception from user code: "The given key was not present in the dictionary."

An unhandled exception from user code has been reported when invoking the method 'Extract' on the user type 'Microsoft.Analytics.Samples.Formats.ApacheAvro.AvroExtractor'

Am I correct in assuming the problem is within the AVRO file produced by Event Hub Capture, or is there something wrong with my code?

Marc Jellinek
    Hi Marc, I reached out to the author of the Avro extractor. There were some problems with the Microsoft.Hadoop.Avro library that were addressed by moving to the Apache C# Avro library. I am not familiar enough with the code though to know if that was this issue or a different one nor how to address your question. – Michael Rys Feb 12 '18 at 08:00

3 Answers

1

The "key not present" error refers to the fields in your EXTRACT statement: the extractor is not finding the date and filename fields. I removed those fields and your script runs correctly in my ADLA instance.
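A minimal sketch of that change, assuming the REFERENCE ASSEMBLY and USING statements stay exactly as in the question. Only the column list differs. A likely reason, as far as I can tell, is that date and filename are file-set virtual columns, which only exist when the input path contains the {date:...} and {filename} patterns (as in the commented-out declaration), not when it points at a single literal file.

```usql
// Sketch only: same input and schema as in the question, with the
// date and filename columns dropped from the EXTRACT column list.
DECLARE @input string = "adl://mydatalakestore.azuredatalakestore.net/event-hub-capture/v3/2018/01/16/19/rcpt-metrics-us-es-eh-metrics-v3-us-0-35-36.avro";

@eventHubArchiveRecords =
    EXTRACT Body byte[]          // only a field that exists in the Avro record
    FROM @input
    USING new AvroExtractor(@"
        {
            ""type"":""record"",
            ""name"":""EventData"",
            ""namespace"":""Microsoft.ServiceBus.Messaging"",
            ""fields"":[
                {""name"":""SequenceNumber"",""type"":""long""},
                {""name"":""Offset"",""type"":""string""},
                {""name"":""EnqueuedTimeUtc"",""type"":""string""},
                {""name"":""SystemProperties"",""type"":{""type"":""map"",""values"":[""long"",""double"",""string"",""bytes""]}},
                {""name"":""Properties"",""type"":{""type"":""map"",""values"":[""long"",""double"",""string"",""bytes""]}},
                {""name"":""Body"",""type"":[""null"",""bytes""]}
            ]
        }
    ");
```

If you need date and filename, switching back to the pattern-based @input from the commented-out line should make those columns available again, though I have not verified that against the failing file.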

Jim Lane
0

The current implementation supports only the primitive types of the Avro specification, not the complex types.

0

You have to build and use an extractor based on Apache Avro rather than the sample extractor provided by Microsoft. We went down the same path.