I want to use AWS S3 as external storage and use Snowflake external tables to query the data.
To not lose/overwrite any data, I have enabled versioning
on the S3 bucket. When multiple versions of a file exists, they will show as duplicates in Snowflake and I could find an option to hide them from external stages/external tables.
LIST @my_stage; -- shows duplicate files
SELECT $1, $2, $3 FROM @my_stage; -- returns duplicate records
CREATE OR REPLACE EXTERNAL TABLE my_external_table (
Name STRING AS (value:c1::STRING),
)
with location = @my_stage;
SELECT * FROM my_external_table; -- shows duplicate records
SELECT DISTINCT row1, row2, row3 from my_external_table; -- manually hide duplicates
Is there any way to only select rows from the latest version of a file without the need to use DISTINCT
?
Thanks in advance