I'm pushing data from an RDS MySQL database to S3. My S3 target endpoint has these settings:
{
"CsvRowDelimiter": "\\n",
"CsvDelimiter": ",",
"AddColumnName": true,
"CompressionType": "NONE",
"EnableStatistics": true,
"DatePartitionEnabled": true,
"DatePartitionSequence": "YYYYMMDD",
"DatePartitionDelimiter": "SLASH",
"EncryptionMode": "SSE_KMS",
"ServerSideEncryptionKmsKeyId": "XXXXX",
"TimestampColumnName": "TIMESTAMP",
"IncludeOpForFullLoad": true,
"CdcInsertsOnly": false
}
In the end, my folder structure looks like this:
-/
--/my_table_name
---LOAD00000001.csv
---/2023
----/01
-----/09
------20230109-165206632.csv
I have two questions about these settings that I haven't been able to find answers to:
- Can we set up a prefix on the date folders? I want to follow the Hive naming convention (e.g. /year=2023/month=01/day=09/).
- Is it possible to have the initial load CSV file placed inside the date folder structure, rather than at the root of the table folder?
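
To make the first question concrete, here is a rough sketch of the key mapping I'm after, written as a plain Python string transformation. It is only an illustration of the desired layout, not an existing endpoint setting: the function name `to_hive_key` and the regex are my own assumptions, and it assumes keys shaped like the tree above (table folder, then YYYY/MM/DD, then the file name).

```python
import re

# Assumed key shape, based on the folder tree above:
#   <table folder>/<YYYY>/<MM>/<DD>/<file name>
_DATE_KEY = re.compile(
    r"^(?P<prefix>.*?)/(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d{2})/(?P<name>[^/]+)$"
)


def to_hive_key(key: str) -> str:
    """Rewrite a date-partitioned key into Hive-style year=/month=/day= folders.

    Keys that do not contain the YYYY/MM/DD folders (for example the
    full-load file) are returned unchanged.
    """
    match = _DATE_KEY.match(key)
    if match is None:
        return key
    return "{prefix}/year={year}/month={month}/day={day}/{name}".format(
        **match.groupdict()
    )


if __name__ == "__main__":
    print(to_hive_key("my_table_name/2023/01/09/20230109-165206632.csv"))
    print(to_hive_key("my_table_name/LOAD00000001.csv"))
```

The second call shows the issue from my second question: the full-load file carries no date folders, so a mapping like this has nothing to work with for it.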