I am using AWS Data Wrangler (awswrangler) to write data from a pandas DataFrame to parquet files in an S3 bucket, and I am trying to get a Hive-style folder structure of:
prefix
- year=2022
-- month=08
--- day=01
--- day=02
--- day=03
In the following code example:
import awswrangler as wr
import pandas as pd

wr.s3.to_parquet(
    df=pd.DataFrame({
        'date': ['2022-08-01', '2022-08-02', '2022-08-03'],
        'col2': ['A', 'A', 'B']
    }),
    path='s3://bucket/prefix',
    dataset=True,
    partition_cols=['date'],
    database='default'
)
The resulting S3 folder structure is:
prefix
- date=2022-08-01
- date=2022-08-02
- date=2022-08-03
The SageMaker Feature Store ingest function (https://sagemaker.readthedocs.io/en/stable/api/prep_data/feature_store.html) more or less does this automatically: the event_time_feature_name column (a timestamp) is used to create the Hive-style folder structure in the offline store in S3.
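For reference, this is roughly what I mean (a minimal sketch; the feature group name, role ARN, and bucket are placeholders, and I may be glossing over details of the API):

import time
import pandas as pd
from sagemaker.session import Session
from sagemaker.feature_store.feature_group import FeatureGroup

df = pd.DataFrame({
    'id': [1, 2, 3],
    'date': ['2022-08-01T00:00:00Z', '2022-08-02T00:00:00Z', '2022-08-03T00:00:00Z'],
    'col2': ['A', 'A', 'B']
}).astype({'date': 'string', 'col2': 'string'})

fg = FeatureGroup(name='my-feature-group', sagemaker_session=Session())
fg.load_feature_definitions(data_frame=df)
fg.create(
    s3_uri='s3://bucket/prefix',
    record_identifier_name='id',
    event_time_feature_name='date',  # timestamp column that drives the offline-store partitioning
    role_arn='arn:aws:iam::123456789012:role/my-sagemaker-role',
    enable_online_store=False,
)

# Wait for the feature group to finish creating before ingesting
while fg.describe().get('FeatureGroupStatus') == 'Creating':
    time.sleep(5)

fg.ingest(data_frame=df, max_workers=1, wait=True)

The offline store then ends up partitioned by year/month/day/hour under the feature group's S3 prefix without me declaring those columns.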
How can I do this with Data Wrangler by passing in just the one date column and having the year/month/day partitions created automatically, rather than deriving three extra columns from that one column and declaring those as the partitions (the workaround sketched below)?
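For clarity, this is the workaround I would like to avoid (a minimal sketch; the path and column names follow the example above):

import awswrangler as wr
import pandas as pd

df = pd.DataFrame({
    'date': ['2022-08-01', '2022-08-02', '2022-08-03'],
    'col2': ['A', 'A', 'B']
})
df['date'] = pd.to_datetime(df['date'])

# Derive zero-padded partition columns so the keys come out as month=08 / day=01
df['year'] = df['date'].dt.strftime('%Y')
df['month'] = df['date'].dt.strftime('%m')
df['day'] = df['date'].dt.strftime('%d')

wr.s3.to_parquet(
    df=df,
    path='s3://bucket/prefix',
    dataset=True,
    partition_cols=['year', 'month', 'day']
)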