We want to ingest our source tables from an S3 external stage into Snowflake. For this ingestion we have to handle new files arriving in the S3 bucket, updates to existing files, and, in some cases, row deletions.
So far we are evaluating three approaches:
- full drop & copy of all tables daily (straightforward but less performant; first sketch below)
- a COPY command to pick up new and updated files, followed by a MERGE query to deduplicate and, where needed, delete rows (could work, but we would have to maintain more complex merge & delete logic per use case; second sketch below)
- external tables on top of the external S3 stage, with materialized views on top of the external tables to boost query performance (not sure whether this is a recommended ingestion mechanism; third sketch below)
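
To make the comparison concrete, here are rough sketches of what we have in mind. All database, schema, stage, and column names (`my_db`, `@my_s3_stage`, the `orders` table, etc.) are placeholders, and we assume the source files are CSV. First, the daily full reload:

```sql
-- Approach 1: recreate the table and reload everything from the stage once a day
CREATE OR REPLACE TABLE my_db.raw.orders (
    order_id     NUMBER,
    customer_id  NUMBER,
    amount       NUMBER(10,2),
    updated_at   TIMESTAMP_NTZ
);

COPY INTO my_db.raw.orders
FROM @my_s3_stage/orders/
FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);
```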
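
For the incremental COPY + MERGE approach, we assume the source files carry a soft-delete flag (`is_deleted` below is hypothetical); if rows are instead hard-deleted from the files, we would need an extra comparison step against a full extract to detect them:

```sql
-- Approach 2: land changed data in a staging table, then MERGE into the target
-- (the target my_db.raw.orders is the table from the first sketch).
-- Note: COPY INTO keeps per-file load metadata, so files modified in place may be
-- skipped on reload; writing each batch to new file names (or FORCE = TRUE) avoids this.
CREATE OR REPLACE TRANSIENT TABLE my_db.staging.orders (
    order_id     NUMBER,
    customer_id  NUMBER,
    amount       NUMBER(10,2),
    updated_at   TIMESTAMP_NTZ,
    is_deleted   BOOLEAN          -- hypothetical soft-delete flag in the source files
);

COPY INTO my_db.staging.orders
FROM @my_s3_stage/orders/
FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);

MERGE INTO my_db.raw.orders AS tgt
USING (
    -- keep only the latest version of each key among the newly landed rows
    SELECT *
    FROM my_db.staging.orders
    QUALIFY ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY updated_at DESC) = 1
) AS src
ON tgt.order_id = src.order_id
WHEN MATCHED AND src.is_deleted THEN DELETE
WHEN MATCHED THEN UPDATE SET
    tgt.customer_id = src.customer_id,
    tgt.amount      = src.amount,
    tgt.updated_at  = src.updated_at
WHEN NOT MATCHED AND NOT src.is_deleted THEN
    INSERT (order_id, customer_id, amount, updated_at)
    VALUES (src.order_id, src.customer_id, src.amount, src.updated_at);
```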
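
And roughly what we mean by the third approach. `AUTO_REFRESH` depends on S3 event notifications being wired up for the stage (otherwise the external table has to be refreshed manually with `ALTER EXTERNAL TABLE ... REFRESH`), and materialized views require Enterprise Edition:

```sql
-- Approach 3: external table over the stage, materialized view on top for performance
CREATE OR REPLACE EXTERNAL TABLE my_db.raw.orders_ext (
    order_id     NUMBER         AS (VALUE:c1::NUMBER),
    customer_id  NUMBER         AS (VALUE:c2::NUMBER),
    amount       NUMBER(10,2)   AS (VALUE:c3::NUMBER(10,2)),
    updated_at   TIMESTAMP_NTZ  AS (VALUE:c4::TIMESTAMP_NTZ)
)
LOCATION = @my_s3_stage/orders/
AUTO_REFRESH = TRUE   -- register new/changed files automatically via S3 event notifications
FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);

-- Materialized view so queries don't re-scan the raw files in S3 every time
CREATE OR REPLACE MATERIALIZED VIEW my_db.analytics.orders_mv AS
SELECT order_id, customer_id, amount, updated_at
FROM my_db.raw.orders_ext;
```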
Of the three approaches, external tables with materialized views seems to be the one that keeps all tables up to date with the source in the most care-free, low-maintenance way. At the same time, we haven't seen it recommended as an ingestion mechanism for Snowflake, and it is not very clear to us how Snowflake handles the maintenance of materialized views, so we can't judge the cost implications (for example, how would it compare cost-wise to the COPY and MERGE queries of the second approach?).
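
To frame the cost question: our assumption is that the automatic MV refreshes run as serverless background maintenance and show up in ACCOUNT_USAGE, so something like the query below (column names to the best of our knowledge) would let us compare that cost against the warehouse time our COPY/MERGE jobs would consume.

```sql
-- Credits consumed by automatic materialized view refreshes over the last 30 days
SELECT SUM(credits_used) AS mv_refresh_credits_30d
FROM snowflake.account_usage.materialized_view_refresh_history
WHERE start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP());
```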
In general, are there any best practices for our ingestion use case? Any feedback on the third approach would also be very useful.