
We will have flat files loaded from AWS S3 into Snowflake tables in a staging schema. Now we need to perform simple transformations like aggregations, mappings, calculations, etc. I know we could use Informatica or other tools, but our transformations really aren't big enough to justify a third-party tool.

We have to load the flat files as-is from AWS S3 into Snowflake, so we can't use the transformations available in the COPY command.

What is the easiest way, and the best practice, to do simple, basic transformations in Snowflake?

Thanks

2 Answers


If you are loading CSV files, you can also apply some very simple transformations during your COPY command. According to the docs, the supported transformations are column reordering, column omission, and casts, all expressed via a SELECT statement.

See here: https://docs.snowflake.com/en/user-guide/data-load-transform.html
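For illustration, a minimal sketch of such a transforming COPY. The stage name @my_s3_stage, target table staging.orders, and the file's column layout are all hypothetical, so adjust them to your environment:

    -- Reorder columns and cast strings while loading. The SELECT list must
    -- line up with the target table's column order (here: order_id,
    -- order_date, amount).
    COPY INTO staging.orders
    FROM (
      SELECT $1,                        -- 1st file column -> order_id
             TO_DATE($3, 'YYYY-MM-DD'), -- 3rd file column -> order_date (cast)
             $2::NUMBER(12,2)           -- 2nd file column -> amount (reordered + cast)
      FROM @my_s3_stage
    )
    FILE_FORMAT = (TYPE = 'CSV' FIELD_OPTIONALLY_ENCLOSED_BY = '"');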

If you want to do simple aggregations, mappings and/or calculations, I can recommend two ways:

  1. Using Views: https://docs.snowflake.com/en/user-guide/views-introduction.html
  2. Using Stored Procedures: https://docs.snowflake.com/en/sql-reference/stored-procedures.html

The easiest way is to develop both (views and stored procedures) directly in the Snowflake web UI.
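A minimal sketch of each, assuming a staged table staging.orders with order_id, order_date, and amount columns (all names here are made up):

    -- A view for a simple daily aggregation over the staged rows:
    CREATE OR REPLACE VIEW reporting.daily_order_totals AS
    SELECT order_date,
           COUNT(*)    AS order_count,
           SUM(amount) AS total_amount
    FROM staging.orders
    GROUP BY order_date;

    -- A stored procedure (Snowflake Scripting) that materializes the same
    -- aggregation into a table whenever you call it:
    CREATE OR REPLACE PROCEDURE reporting.refresh_daily_totals()
    RETURNS STRING
    LANGUAGE SQL
    AS
    $$
    BEGIN
      CREATE OR REPLACE TABLE reporting.daily_order_totals_t AS
        SELECT order_date, COUNT(*) AS order_count, SUM(amount) AS total_amount
        FROM staging.orders
        GROUP BY order_date;
      RETURN 'refreshed';
    END;
    $$;

The view costs nothing until it is queried; the procedure gives you a materialized result you can refresh on your own schedule, e.g. with CALL reporting.refresh_daily_totals();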

Marcel
  • Answers that are mostly links are not helpful as links go dead over time. You need to form your own answer and only use the links as a reference. – Rob Jan 14 '21 at 13:07

Take a look at Snowflake's Tasks and Streams. Together they let you move incremental data from staging tables to target tables automatically every time a new set of files is loaded. This might be useful for you.

Tasks: https://docs.snowflake.com/en/user-guide/tasks-intro.html

Streams: https://docs.snowflake.com/en/user-guide/streams.html

(Links are to Snowflake documentation and won't "go dead")

The general idea is that a stream lets you see what has changed in a table. A task can then monitor the stream and, when there are new changes, execute a MERGE or a stored procedure to take that incremental data, transform it, and load it into the completed target table, all fully automated. That could fit the simple transformation scenario you have described.
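A sketch of that pattern, with hypothetical object names throughout (staging.orders as the landing table, target.orders as the completed table, my_wh as the warehouse):

    -- 1. A stream captures new rows as files are loaded into staging:
    CREATE OR REPLACE STREAM staging.orders_stream ON TABLE staging.orders;

    -- 2. A task wakes on a schedule but only runs when the stream has data,
    --    merging the incremental rows into the target table:
    CREATE OR REPLACE TASK staging.merge_orders_into_target
      WAREHOUSE = my_wh
      SCHEDULE = '5 MINUTE'
      WHEN SYSTEM$STREAM_HAS_DATA('staging.orders_stream')
    AS
      MERGE INTO target.orders t
      USING staging.orders_stream s
        ON t.order_id = s.order_id
      WHEN MATCHED THEN UPDATE SET t.order_date = s.order_date, t.amount = s.amount
      WHEN NOT MATCHED THEN INSERT (order_id, order_date, amount)
        VALUES (s.order_id, s.order_date, s.amount);

    -- Tasks are created suspended; resume to start the schedule:
    ALTER TASK staging.merge_orders_into_target RESUME;

Consuming the stream in the MERGE advances its offset, so each task run only sees rows that arrived since the previous run.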

Mike Walton
  • I have a not-so-trivial use case for transformations and this approach looks like the way to go. Any time a table changes, it will be captured by Streams, which can feed into Tasks running Snowflake Scripting logic. Cool! ^.^ – ankush981 May 22 '22 at 05:00