2

I have a dbt project which gets the data from MySQL loads in RAW db in snowflake, transforms and loads in ANALYTICS db in snowflake. Now I have a requirement to create a table in a third database PROD in snowflake which aggregates data from ANALYTICS and loads in PROD.

.

If I change the DBT_DATABASE variable to PROD, I get an error since existing dbt models fail. How do I create another variable which points target database to PROD

1 Answers1

3

There are several ways to configure the target database for dbt-snowflake, since Snowflake databases are just logical concepts and share a single connection.

Let's say your data in RAW is configured as a source. You can add a .yml file with the source info, specifying the database:

version: 2

sources:
  - name: mysql_replica
    database: RAW

    tables:
      - name: mysql_table

Now we can add a model that references that source and lands the transformed data into the ANALYTICS database. We can do this by setting the target's default database to ANALYTICS in our profiles.yml:

my-snowflake-db:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: [account id]

      # User/password auth
      user: [username]
      password: [password]

      role: [user role]
      database: ANALYTICS # or "{{ env_var('DBT_DATABASE') }}"
      warehouse: [warehouse name]
      schema: [dbt schema]
      threads: [1 or more]

Now we can write our model, without config, to use the database we specified in our target:

-- my_model.sql
select * from {{ source("mysql_replica", "mysql_table") }}

You can also specify a "custom" database in the model config (docs). Unlike custom schemas, which have some complex behavior, models with custom databases just use the database you specify. So in another model, we can specify the database in a config block in the model file:

-- my_prod_model.sql
{{ config(database="PROD") }}
select * from {{ ref('my_model') }}

If you prefer (in recent versions of dbt), you can use the config: key in a properties file instead:

# my_prod_model.yml
version: 2
models:
  - name: my_prod_model
    config:
      database: PROD

Or you can configure a custom database for a single model, or directory of models, in your dbt_project.yml file:

# dbt_project.yml
profile: my-snowflake-db
# other stuff up here
...
models:
  # materialize models in the prod directory as tables in PROD database
  prod:
    +database: PROD
    +materialized: table

FINALLY, you can use "dev" schemas in your databases to separate dev from prod, but if you don't want any dev runs of dbt to write to your PROD database, you can use a little jinja in your .yml files:

# dbt_project.yml
profile: my-snowflake-db
# other stuff up here
...
models:
  # materialize models in the prod directory as tables in PROD database
  # if target is prod; otherwise, write to the DEV database
  prod:
    +database: "{{ 'PROD' if target.name == 'prod' else 'DEV' }}"
    +materialized: table
tconbeer
  • 4,570
  • 1
  • 9
  • 21
  • 3
    this is a fantastic answer! just to add, because the author tagged/mentioned Prefect is that you could leverage different custom blocks to point to different profiles, as described in this blog post series https://medium.com/the-prefect-blog/how-to-build-a-modular-data-stack-data-platform-with-prefect-dbt-and-snowflake-89f928974e85 – Anna Geller Jan 18 '23 at 19:43
  • 1
    Thanks, @AnnaGeller. I missed that this question was about Prefect. – tconbeer Jan 18 '23 at 22:07
  • you never know, prefect was only mentioned in the tag :D your response is excellent – Anna Geller Jan 19 '23 at 00:23
  • 1
    Thank you so much @tconbeer I think this answer helped me understand the dbt sources and project variables more clearly. I think I should elaborate on what I am trying to achieve here. Yes, I am using prefect to create the data pipeline, which runs the dbt model of the format { select count(col) from ref{{analytics_table}} } this table stored in PROD database, analytics_table model uses RAW database. prefect library is allowing me to use just 2 db variables source and target, if I change target to PROD, my analytics_table model throws an error. – user21038394 Jan 20 '23 at 19:44