0

Below are two methods that are part of my state machine in AWS.

First, the method that uses S3 SELECT to obtain data from a csv file.

/// <summary>
/// Use S3 Select in order to obtain the data from the source and return it
/// </summary>
/// <param name="s3Object"></param>
/// <param name="s3Client"></param>
/// <param name="definition"></param>
/// <returns></returns>
private static async Task<ISelectObjectContentEventStream> GetSelectObjectContentEventStream(S3Object s3Object,
    AmazonS3Client s3Client, ObjectDefinition definition)
{
    var response = await s3Client.SelectObjectContentAsync(new SelectObjectContentRequest()
    {
        Bucket = s3Object.BucketName,
        Key = s3Object.Key,
        ExpressionType = ExpressionType.SQL,
        Expression = "select * from S3Object",
        InputSerialization = new InputSerialization()
        {
            CSV = new CSVInput()
            {
                FileHeaderInfo = FileHeaderInfo.Ignore,
                FieldDelimiter = ",",
            }
        },
        OutputSerialization = new OutputSerialization()
        {
            JSON = new JSONOutput()
        }
    });

    return response.Payload;
}

And now, the method that calls it:

public async Task<StaticDataConsumerDefinition> ConvertFromSourceS3Async(StaticDataConsumerDefinition staticDataConsumer, ILambdaContext context)
{
    using (var s3Client = new AmazonS3Client())
    {
        foreach (ObjectDefinition definition in staticDataConsumer.TargetList.Objects)
        {
            var listRequest = new ListObjectsV2Request
            {
                BucketName = definition.FilePath,
                MaxKeys = 1000
            };

            ListObjectsV2Response listResponse;
            listResponse = s3Client.ListObjectsV2Async(listRequest).Result; // Force synchronous

            if (definition.LogActivity)
            {
                context.Logger.LogLine($"Response from S3 Request: {listResponse.HttpStatusCode} ({listResponse.HttpStatusCode.ToString()})");
            }

            foreach (var entity in listResponse.S3Objects.Where(n => n.Key.Contains(definition.FilePrefix)))
            {
                if (entity.Key.Contains(definition.FileExtension))
                {
                    context.Logger.LogLine($"entity {entity.Key}");

                    using (var s3Events = await GetSelectObjectContentEventStream(entity, s3Client, definition))
                    {
                        foreach (var ev in s3Events)
                        {
                            context.Logger.LogLine($"Received {ev.GetType().Name}!");
                            if (ev is RecordsEvent records)
                            {
                                context.Logger.LogLine("The contents of the Records Event is...");
                                using (var reader = new StreamReader(records.Payload))
                                {
                                    context.Logger.Log(reader.ReadToEnd());
                                }
                            }
                        }
                    }
                }
            }
        }
    }
        context.Logger.Log($"Passing ConvertSourceData {ConvertToIndentedJson(staticDataConsumer)}");
    return staticDataConsumer;
}

However, the data I get out into the CloudWatch log is garbage - it looks a bit like ASCII characters / or encoded characters? Not what I was expecting! Any ideas folks?

{ "_1": "\u00001\u00000\u00005\u0000", "_2": "\u0000K\u0000a\u0000t\u0000e\u0000 \u0000F\u0000a\u0000r\u0000d\u0000e\u0000l\u0000l\u0000\r\u0000" }

{ "_1": "\u00001\u00000\u00006\u0000", "_2": "\u0000S\u0000h\u0000o\u0000n\u0000a\u0000 \u0000M\u0000a\u0000r\u0000i\u0000n\u0000o\u0000\r\u0000" }

{ "_1": "\u00001\u00000\u00008\u0000", "_2": "\u0000S\u0000h\u0000o\u0000n\u0000a\u0000 \u0000M\u0000a\u0000r\u0000i\u0000n\u0000o\u0000\r\u0000" }

{ "_1": "\u00001\u00001\u00001\u0000", "_2": "\u0000S\u0000h\u0000o\u0000n\u0000a\u0000 \u0000M\u0000a\u0000r\u0000i\u0000n\u0000o\u0000\r\u0000" }

{ "_1": "\u00001\u00001\u00002\u0000", "_2": "\u0000L\u0000i\u0000n\u0000a\u0000 \u0000H\u0000a\u0000n\u0000n\u0000a\u0000w\u0000e\u0000\r\u0000" }

{ "_1": "\u00001\u00001\u00003\u0000", "_2": "\u0000J\u0000e\u0000n\u0000n\u0000i\u0000f\u0000e\u0000r\u0000 \u0000H\u0000a\u0000l\u0000e\u0000\r\u0000" }

{ "_1": "\u00001\u00001\u00004\u0000", "_2": "\u0000S\u0000t\u0000a\u0000n\u0000 \u0000K\u0000a\u0000k\u0000k\u0000a\u0000s\u0000i\u0000s\u0000\r\u0000" }

{ "_1": "\u00001\u00001\u00006\u0000", "_2": "\u0000S\u0000t\u0000a\u0000n\u0000 \u0000K\u0000a\u0000k\u0000k\u0000a\u0000s\u0000i\u0000s\u0000\r\u0000" }

{ "_1": "\u00001\u00001\u00008\u0000", "_2": "\u0000S\u0000t\u0000a\u0000n\u0000 \u0000K\u0000a\u0000k\u0000k\u0000a\u0000s\u0000i\u0000s\u0000\r\u0000" }

{ "_1": "\u00001\u00001\u00009\u0000", "_2": "\u0000S\u0000h\u0000o\u0000n\u0000a\u0000 \u0000M\u0000a\u0000r\u0000i\u0000n\u0000o\u0000\r\u0000" }

{ "_1": "\u00001\u00002\u00007\u0000", "_2": "\u0000A\u0000y\u0000d\u0000i\u0000n\u0000 \u0000T\u0000e\u0000b\u0000y\u0000a\u0000n\u0000i\u0000a\u0000n\u0000\r\u0000" }

{ "_1": "\u00001\u00002\u00008\u0000", "_2": "\u0000C\u0000a\u0000m\u0000e\u0000r\u0000o\u0000n\u0000 \u0000P\u0000a\u0000l\u0000m\u0000e\u0000r\u0000\r\u0000" }

{ "_1": "\u00001\u00009\u00007\u0000", "_2": "\u0000S\u0000h\u0000a\u0000r\u0000o\u0000n\u0000 \u0000B\u0000e\u0000r\u0000g\u0000e\u0000r\u0000\r\u0000" }

{ "_1": "\u00002\u00000\u00001\u0000", "_2": "\u0000K\u0000a\u0000r\u0000e\u0000e\u0000n\u0000a\u0000 \u0000D\u0000a\u0000v\u0000i\u0000e\u0000s\u0000\r\u0000" }

{ "_1": "\u00002\u00000\u00002\u0000", "_2": "\u0000L\u0000i\u0000n\u0000a\u0000 \u0000H\u0000a\u0000n\u0000n\u0000a\u0000w\u0000e\u0000\r\u0000" }

{ "_1": "\u00002\u00000\u00003\u0000", "_2": "\u0000S\u0000a\u0000m\u0000 \u0000V\u0000i\u0000t\u0000a\u0000n\u0000z\u0000a\u0000\r\u0000" }

{ "_1": "\u00002\u00000\u00006\u0000", "_2": "\u0000K\u0000a\u0000r\u0000e\u0000e\u0000n\u0000a\u0000 \u0000D\u0000a\u0000v\u0000i\u0000e\u0000s\u0000\r\u0000" }

{ "_1": "\u00002\u00000\u00008\u0000", "_2": "\u0000K\u0000a\u0000r\u0000e\u0000e\u0000n\u0000a\u0000 \u0000D\u0000a\u0000v\u0000i\u0000e\u0000s\u0000\r\u0000" }

{ "_1": "\u00002\u00001\u00004\u0000", "_2": "\u0000K\u0000a\u0000r\u0000e\u0000e\u0000n\u0000a\u0000 \u0000D\u0000a\u0000v\u0000i\u0000e\u0000s\u0000\r\u0000" }

{ "_1": "\u00002\u00001\u00007\u0000", "_2": "\u0000K\u0000y\u0000l\u0000i\u0000e\u0000 \u0000B\u0000r\u0000a\u0000d\u0000l\u0000e\u0000y\u0000\r\u0000" }

{ "_1": "\u00002\u00001\u00008\u0000", "_2": "\u0000K\u0000a\u0000t\u0000e\u0000 \u0000F\u0000a\u0000r\u0000d\u0000e\u0000l\u0000l\u0000\r\u0000" }

{ "_1": "\u00002\u00001\u00009\u0000", "_2": "\u0000C\u0000a\u0000m\u0000e\u0000r\u0000o\u0000n\u0000 \u0000P\u0000a\u0000l\u0000m\u0000e\u0000r\u0000\r\u0000" }

{ "_1": "\u00002\u00002\u00003\u0000", "_2": "\u0000S\u0000a\u0000m\u0000 \u0000V\u0000i\u0000t\u0000a\u0000n\u0000z\u0000a\u0000\r\u0000" }

{ "_1": "\u00002\u00002\u00005\u0000", "_2": "\u0000K\u0000a\u0000r\u0000e\u0000e\u0000n\u0000a\u0000 \u0000D\u0000a\u0000v\u0000i\u0000e\u0000s\u0000\r\u0000" }

{ "_1": "\u00002\u00002\u00006\u0000", "_2": "\u0000K\u0000a\u0000t\u0000e\u0000 \u0000F\u0000a\u0000r\u0000d\u0000e\u0000l\u0000l\u0000\r\u0000" }

{ "_1": "\u00002\u00002\u00008\u0000", "_2": "\u0000K\u0000a\u0000r\u0000e\u0000e\u0000n\u0000a\u0000 \u0000D\u0000a\u0000v\u0000i\u0000e\u0000s\u0000\r\u0000" }

{ "_1": "\u00002\u00002\u00009\u0000", "_2": "\u0000K\u0000a\u0000r\u0000e\u0000e\u0000n\u0000a\u0000 \u0000D\u0000a\u0000v\u0000i\u0000e\u0000s\u0000\r\u0000" }

{ "_1": "\u00002\u00003\u00000\u0000", "_2": "\u0000K\u0000a\u0000r\u0000e\u0000e\u0000n\u0000a\u0000 \u0000D\u0000a\u0000v\u0000i\u0000e\u0000s\u0000\r\u0000" }

This is the actual CSV data

Retail Store,Store Retail Business Manager
105,Kate Fardell
106,Shona Marino
108,Shona Marino
111,Shona Marino
112,Lina Hannawe
113,Jennifer Hale
114,Stan Kakkasis
116,Stan Kakkasis
118,Stan Kakkasis
119,Shona Marino
127,Aydin Tebyanian
128,Cameron Palmer
197,Sharon Berger
201,Kareena Davies
202,Lina Hannawe
203,Sam Vitanza
206,Kareena Davies
208,Kareena Davies
214,Kareena Davies
217,Kylie Bradley
218,Kate Fardell
219,Cameron Palmer
223,Sam Vitanza
225,Kareena Davies
226,Kate Fardell
228,Kareena Davies
229,Kareena Davies
230,Kareena Davies
JamesMatson
  • 2,522
  • 2
  • 37
  • 86

1 Answers1

1

I'd assume it's perhaps encoding?

Make this line:

using (var reader = new StreamReader(records.Payload))

Like this:

using (var reader = new StreamReader(records.Payload, System.Text.Encoding.UTF8))
LaraRaraBoBara
  • 628
  • 1
  • 6
  • 15
  • 1
    Thanks for the suggestion, but I've tried both ways, and there is no change. I have been able to return the results (output) as CSV no problems (well, some problems, but different ones) but can't get the JSON output to work at all under any circumstances. I have posted a different SO question around my other issue. My other SO question posted here -> https://stackoverflow.com/questions/53093990/strange-results-from-using-aws-s3-select-to-get-csv-data-into-sql-table – JamesMatson Nov 02 '18 at 00:05
  • 1
    I'm having issues reading JSON with C# as well, diff problems though and yeah - there's WAY too little help online so far. Sorry this wasn't your problem. – LaraRaraBoBara Nov 02 '18 at 00:30
  • That's okay I'm just glad there's some of us out there struggling through this. Out of curiosity when you say issues reading JSON with C# - are you talking in the context of AWS S3 SELECT specifically? Or in general? If the latter I can probably assist. Here's hoping that the community builds out more information on S3 SELECT over time. I'm going to ditch using it for this, and instead move to something like GenericParser to work with the CSV data. – JamesMatson Nov 02 '18 at 00:33
  • 1
    I am using AWS S3 SELECT and can return the entire document, but if I change my SELECT * FROM S3Object at all, I either get an error or {} as the result. So either the JSON I'm querying isn't typical (one document is one json "object"), or it's my lack of understanding of the S3 SELECT commands to find the attribute value I'm looking for. I hack C#, my proficiency is SQL Server/ETL so I'm 99% sure it's me. – LaraRaraBoBara Nov 02 '18 at 00:38