1

Context :

Let's suppose we have multiple datamarts (Ex : HR, Accounting, Marketing ...) and all of them use the Star Schema as dimensional modeling (Kimball approach ) .

Question :

Since Snowflake cloud data warehouse architecture eliminate the need to spin off separate physical data marts / databases in order to maintain performance. So, what's the best approach to build the multiple datamarts on Snowflake ?

Create database for each datamart ? create one database (EDW )with multiple schema and each schema refer to a datamart ?

Thank you !

3 Answers3

2

Ron is correct - the answer depends on a few things:

  1. If there are conformed dimensions, then one database and schema might be the way to go
  2. If they are completely non-integrated data marts I would go with separate schemas or even separate databases. They are all logical containers in Snowflake (rather than physical) with full role based access control available to segregate users.

So really - how do you do it today? Does that work for you or are there things you need or want to do that you cannot do today with your current physical setup. How is security set up with your BI tools? Do they reference a database name or just a schema name? If you can, minimize changes to your data pipeline and reporting so you have fewer things that might need refactoring (at least for your first POC or migration).

One thing to note is that with Snowflake you have the ability to easily do cross database joins (i.e., database.schema.table) - all you need is SELECT access, so even if you separate the marts by database oyu can still do cross mart reporting if needed.

Hope that helps.

  • Hello Kent, Thank you for your answer! Actually, we are buuilding our BI plateforme/data pipeline from scratch and our needs is to simplify the segregation of users / access to data. I was opting for the option 1 one database (DW) which contain all datamarts tables (fact and dim) of all the busines units (HR, marketing, accounting ...) and those datamarts are separated by schema for the logical grouping of each mart objects. Thank you – Amine Belkacemi Jul 31 '18 at 07:45
  • One small reason that you might avoid cross-database joins for your star schemas is that it limits your discovery in a BI tool. When you're selecting tables through the tool UI, you're generally restricted to one database and/or schema. – Ron Dunn Aug 01 '18 at 00:26
  • Afterthought: You could work around the discoverability problem with views that were implemented cross-database. – Ron Dunn Aug 01 '18 at 00:53
1

There is no specific need to separate star schemas at all.

If you're using shared / conformed dimensions across your marts, separation would actually be an anti-pattern.

If your issue is simplifying the segregation of users, schema per mart works well.

All of the approaches you've suggested (DB/mart, DW/schema,...) will work, I'm just not clear on the need.

Ron Dunn
  • 2,971
  • 20
  • 27
  • Hello @Ron Dunn, Thank you for your answer! So, actually, my needs is to simplify the segregation of users / access to data and how can i use the best aproach on Snowflake cloud warehouse to achieve it. So, the best way would be one database (DW) which contain all datamarts tables (fact and dim) of all the busines units (HR, marketing, accounting ...) and those datamarts are separated by schema for the logical grouping of each mart objects Best regards, Amine – Amine Belkacemi Jul 31 '18 at 07:24
1

The goal of having separate data marts is more related to governance, to keep data organized and where it is expected to be found (i.e. sales transactions in the "sales data mart"), and less related to performance issues.

The advantage of having a single database acting as a data warehouse is that all your data for analytics will be stored in one place, making it more accessible and easier to find. In this case, you can use schemas to implement (logically) separate data marts. You can also use schemas within a database to keep development data separate from production data, for each data mart.

Snowflake is different from traditional relational databases; given its technical architecture, it has no issues with joining large tables between different databases/schemas so you can certainly build different data marts in separate databases and join their facts or dimensions with some other Snowflake database/data mart.

In you specific case, if you have a large number of data marts (e.g. 10 or more) and you're not using Snowflake for much more than data warehouseing, I think the best path would be to implement each data mart in its own database and use schemas to manage prod/dev data within each schema. This will help keep data organized, as opposed to quickly reaching a point where you'll have hundreds of tables (every data mart, and its dev/prod versions) in one database, which won't be a great development or maintenance experience.

But, from a performance perspective, there's really no noticeable difference.

jmng
  • 2,479
  • 1
  • 25
  • 38