2

I`m using AWS DMS to migrate data from a MySQL RDS to S3 with parquet format. After that I am running a GLUE crawler, creating a database and a table. Everything is working fine, but the timestamps.

As example , on AWS Athena query I'm getting the result: +51218-09-22 03:06:40.000 while the actual value is 2019-04-01-14.27.04 -0300. As you can see, the date is completely wrong. All other types were migrated fine

Has anyone had this problem?

rocampana
  • 73
  • 2
  • 7
  • 1
    I think, you have a similar problem as described in [this post](https://stackoverflow.com/questions/58853686/string-int96-to-datatime-amazon-athena-sql-ddl-dml) earlier today. What is precision of your timestamps? Is it up to microseconds? Also see comments in the aforementioned post – Ilya Kisil Nov 14 '19 at 18:51
  • thank you @IlyaKisil, the problem was this – rocampana Nov 14 '19 at 21:41

2 Answers2

1

Solution found here: https://forums.aws.amazon.com/thread.jspa?threadID=302156

Apparently, there is a mismatch between default timestamp that DMS migrate the data and how Athena map this timestamp (miliseconds/ nanoseconds).

"add parquetTimestampInMillisecond=true to the extra connection attributes of the target endpoint fixed this" then run DMS and Crawler again

rocampana
  • 73
  • 2
  • 7
0

TL;DR: Divide by 1000 and CAST to convert from Parquet v2 to Parquet v1 format

SELECT id,
   CAST("from_unixtime"(CAST(("to_unixtime"("created_at")  / 1000) AS bigint)) AS timestamp) "created_at",
FROM <YOUR_ATHENA_TABLE>

The date/timestamp issue comes from an incompatibility between what Athena expects and what is provided in the S3 parquet files (created by the RDS snapshot export or DMS).

Athena expects Parquet V1 files whereas RDS exports in Parquet V2 format. Parquet V1 timestamp are INT96 whereas V2 is INT64. Hence the issue. Took me a while to figure out but credits goes to him.

Thomas
  • 3,119
  • 2
  • 16
  • 22