1

Here's my issue with unsigned integers in our source database (MySQL RDS):

I use AWS DMS to do an initial load of the source table and the target is S3 (our Zone1 of our data lake) saved as parquet. I can then crawl it with Glue and query the table in Athena. All good here

I then created a Glue job to read that Zone1 data catalog and output to another bucket in S3 (our Zone2). However, the Glue job fails with: Parquet type not supported: INT64 (UINT_64)

Does anyone have a workaround that I can put into the Glue job to 'cast' this data type to something else?

Asclepius
  • 57,944
  • 17
  • 167
  • 143
  • 1
    Hey there Acumenus and thanks for responding. Yeah that was the answer that I figured out a few months ago and forgot to update my post here. In DMS, I used the "transformation" feature and converted it to INT (or maybe it was INT8). Anyway, it now lands in Zone2 (S3) as parquet that can then actually be processed by Glue. Works like a charm – Brad Sheridan Apr 25 '21 at 01:00
  • I wrote parquet files containing UInt8, UInt16, and UInt32 (but not yet UInt64) columns. I read these parquet files successfully using AWS Athena version 2. They were however read as signed integer columns. All [AWS Athena integer datatypes](https://docs.aws.amazon.com/athena/latest/ug/data-types.html) look to be signed, with no unsigned versions. It's possible that the issue you were experiencing with UInt columns was limited to AWS Athena v1, and that perhaps it is better handled in v2. – Asclepius Apr 27 '21 at 21:34
  • very good point @Acumenus. I didn't even think about the newer Athena engine and my pipeline was written last year before v2 was released. Side note: some of our data eventually lands in Redshift/Spectrum and Redshift doesn't like the Tinyint data type. However, instead of using 'transformations' in the DMS job when landing data in zone1 (like I did for the UINT columns), I cast them to smallint in the Glue job from zone1 to zone2 – Brad Sheridan Apr 30 '21 at 11:32

0 Answers0