I am getting the error "AnalysisException: Cannot redefine dataset" in my DLT pipeline. I use a for loop to generate multiple flows, loading different sources into the same target with dlt.create_target_table and dlt.apply_changes. As a result, the pipeline tries to define the same target table once for each input.

My inputs are [{Source: src_A, Target: tgt}, {Source: src_B, Target: tgt}]. As mentioned in the Databricks cookbook, is a union the only way to combine multiple sources into one target? Can anyone help with this?

Alex Ott
BobGally

1 Answer

Yes, in DLT a target name can be defined only once per pipeline. If multiple sources write into a single target, you need to use a union to combine the sources inside that one definition.

Programmatically, it could be done with something like this:

import functools as ft
import itertools as it

import dlt

def generate_target_table(tgt_name, sources):
  # Define the target once; all of its sources are unioned inside this one definition.
  @dlt.table(name=tgt_name)
  def union_streams():
    source_tables = [dlt.read_stream(t) for t in sources]
    return ft.reduce(lambda x, y: x.union(y), source_tables)

tables = [{"Source": "src_A", "Target": "tgt"}, {"Source": "src_B", "Target": "tgt"}]
kf = lambda x: x.get("Target")
# groupby only groups adjacent items, so sort by the same key first.
grouped = it.groupby(sorted(tables, key=kf), kf)
for tgt, lst in grouped:
  sources = [i["Source"] for i in lst]
  generate_target_table(tgt, sources)
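
The grouping step above can be checked on its own, outside a pipeline, since it is plain Python. The following is a minimal sketch; the helper name group_sources_by_target is hypothetical and not part of DLT. It shows that each target name comes out exactly once, paired with all of its sources, which is what avoids the "Cannot redefine dataset" error.

```python
import itertools as it


def group_sources_by_target(mappings):
    """Return {target: [sources]} for a list of Source/Target dicts.

    Hypothetical helper for illustration only; it is not a DLT API.
    """
    key = lambda m: m["Target"]
    # groupby merges only adjacent items, so the list is sorted by target first.
    # sorted() is stable, so sources keep their original relative order.
    return {
        tgt: [m["Source"] for m in group]
        for tgt, group in it.groupby(sorted(mappings, key=key), key)
    }


tables = [
    {"Source": "src_A", "Target": "tgt"},
    {"Source": "src_B", "Target": "tgt"},
]
print(group_sources_by_target(tables))  # {'tgt': ['src_A', 'src_B']}
```

Each key of the resulting dict would then be passed once to generate_target_table together with its list of sources.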
Alex Ott
  • Was wondering if there's been any changes to this @Alex? My concern is if stream 1 has 10 new records, but stream 2 has 0, how does that work? Each stream would also contain only a partial set of fields that make up the target table. Would the target schema get overwritten? – Josh Aug 25 '23 at 20:48
  • You can do a union by name with missing fields allowed. But really, the flow API is coming, and it will simplify such things – Alex Ott Aug 26 '23 at 17:35
  • I haven't heard about this, is there something I can read about it and is there an ETA? Thanks for the responses. – Josh Aug 28 '23 at 14:50
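
The union-by-name suggestion in the comments could be sketched roughly as follows, assuming the inputs are PySpark DataFrames on Spark 3.1 or later, where DataFrame.unionByName accepts allowMissingColumns. The helper name union_sources_by_name is hypothetical. With allowMissingColumns=True, columns missing from one input are filled with nulls instead of raising an error, so sources that each carry only part of the target's fields can still be combined.

```python
import functools as ft


def union_sources_by_name(frames):
    """Union DataFrames by column name, filling missing columns with nulls.

    Hypothetical helper; assumes each item is a PySpark DataFrame
    (Spark 3.1+), which provides unionByName(other, allowMissingColumns=...).
    """
    frames = list(frames)
    if not frames:
        raise ValueError("at least one source is required")
    # Fold the list pairwise into a single DataFrame.
    return ft.reduce(
        lambda left, right: left.unionByName(right, allowMissingColumns=True),
        frames,
    )
```

Inside the answer's union_streams function, this would stand in for the reduce over x.union(y), which matches columns by position rather than by name.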