Is it possible to build a seed dataset/table over multiple files in dbt?

I have two data files like the ones below in my dbt project:

(screenshot: two seed CSV files, locations_1.csv and locations_2.csv, in the project's seeds folder)

Building a seed dataset/table from an individual file works perfectly fine.

However, what I am looking for is a single seed dataset/table, locations, containing the combined data from both files.

1 Answer

No, you can't do this directly using dbt seed. The easiest approach is to keep them as separate seed files, resulting in two tables, and then create a model that combines them. Note that seeds are referenced with ref(), so the model is as simple as:

{{ dbt_utils.union_relations(relations=[
    ref('locations_1'),
    ref('locations_2')
]) }}
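If the number of seed files grows, the relation list can also be built in a loop rather than hard-coded. A minimal sketch, assuming the seed files share a locations_ name prefix and using dbt's graph context variable (note that graph is only fully populated at execution time, and dynamically built ref() calls are not registered by dbt's dependency parser):

{# Collect every seed whose name starts with 'locations_' #}
{% set locations_seeds = [] %}
{% for node in graph.nodes.values()
    if node.resource_type == 'seed' and node.name.startswith('locations_') %}
    {% do locations_seeds.append(ref(node.name)) %}
{% endfor %}

{{ dbt_utils.union_relations(relations=locations_seeds) }}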
Adam Kipnis
  • Thanks Adam! The thing is, in my actual scenario I have around 200 static files which all have the same structure, and I don't want to end up creating 200 different tables and keep editing the code whenever a new file is added. – Pravin Singh Aug 10 '23 at 08:37
  • You can code it to loop through the sources, but your first point about not wanting 200 tables makes sense. I'm also assuming that these files aren't completely static and/or the number of files changes frequently enough so that you can't simply merge them into a single file beforehand and use that as the seed? If not, then I would recommend doing the ingestion outside of dbt. Pretty much any destination can trigger a write once a file is written to cloud storage (i.e., write to S3 -> trigger Snowpipe to write to Snowflake if using AWS/Snowflake). – Adam Kipnis Aug 10 '23 at 16:12
  • The files themselves are pretty small, two columns with 10-20 rows, and rarely need to be changed. We cannot set it up as ingestion, as it's a manual file sent out by business once or twice a month. The best solution for us would then be to keep these files in ADLS storage and build an external table on top of that location in Databricks, which we are already doing for properly ingested data from different source systems. – Pravin Singh Aug 11 '23 at 07:16
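For the ADLS-plus-Databricks route described in the last comment, a minimal sketch of an external table over a folder of identically structured CSV files; the container, storage account, and path below are hypothetical:

-- External table over every CSV file in the folder; files added to the
-- same location later show up in the table without any code changes.
CREATE TABLE IF NOT EXISTS locations
USING CSV
OPTIONS (header 'true', inferSchema 'true')
LOCATION 'abfss://container@storageaccount.dfs.core.windows.net/locations/';

Since Spark treats the directory as the table's data, dropping in the next manual file from the business requires no change to the table definition.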