I tried the following code, but running the DLT pipeline results in an error:

    if kwargs.get("df_tableoperation", None) is not None :
    ^
SyntaxError: invalid syntax

The idea is to dynamically generate a series of tables from configurable metadata tables, where I would ultimately keep rule sets that are applied conditionally.

I cannot picture how to accomplish this. Would I need to explicitly declare all of the @dlt.expect_<*> decorators in the framework and pass in empty dicts if I did not want to apply any rules?

def outer_create_bz_table( **kwargs )

    @dlt.table(name = bz_tablename, comment="test")
    

    if kwargs.get("df_tableoperation", None) is not None :

        @dlt.expect_all_or_drop({"Null Location" :  "LOCATION is null"}) 

I was expecting this to run as a DLT pipeline.

Ultimately I want to dynamically call any number of @dlt.expect_<suffix> decorators to operate on any number of columns for the respective tables generated with outer_create_bz_table(). I got some confirmation that a generalized framework is feasible when I tried the same kind of dynamic call with if kwargs.get() : dlt.apply_changes(someDynamicArg), which worked. There are only two differences between that and the approach above:

  1. Expectations make use of the @ decorator
  2. Expectations occur before the function def that returns a df

(Edit) After looking further into decorators, I also tried the following. The result was still a syntax error, this time pointing at x = a_function_to_decorate():

def expect_decor(a_function_to_decorate):

    def the_wrapper_around_the_original_function():

        @dlt.expect_all_or_drop({"Null Location" :  "LOCATION is null"}) 
        x = a_function_to_decorate()
        return x
    
    return the_wrapper_around_the_original_function

# And then 

def outer_create_bz_table( **kwargs )

    @dlt.table(name = bz_tablename, comment="test")
    @expect_decor
    def : return df # some df

I do not see any reference to accomplishing this in the documentation on multiple expectations:

https://learn.microsoft.com/en-us/azure/databricks/delta-live-tables/expectations#--multiple-expectations

1 Answer

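The error occurs because no statement may follow the @dlt.table decorator. In Python, a decorator line must be followed directly by another decorator or by the def (here, a function that returns a DataFrame), so an if statement in that position is a syntax error. To apply expectations conditionally, you can use one of the approaches below.

Create the DLT table inside the if condition.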

import pandas as pd
import dlt

def outer_create_bz_table(**kwargs):
    # Define the table (with its expectation) only when the option is supplied.
    if kwargs.get("df_tableoperation") is not None:
        @dlt.table(name="pd_data", comment="test")
        @dlt.expect_all_or_drop({"Null Location": "LOCATION is null"})
        def get_sample():
            dt = [
                {"LOCATION": "Bengaluru", "id": "123"},
                {"LOCATION": None, "id": "456"},
            ]
            # `spark` is the session provided by the Databricks runtime.
            return spark.createDataFrame(pd.DataFrame(dt))

Then call the function:

outer_create_bz_table(df_tableoperation="yes")

Or, build a dictionary of rules based on the condition and pass that dictionary to the expect_all_* decorator.

import pandas as pd
import dlt

def outer_create_bz_table(**kwargs):
    # Start with no rules and add them only when the option is supplied.
    rules = {}
    if kwargs.get("df_tableoperation") is not None:
        rules["Null Location"] = "LOCATION is null"

    @dlt.table(name="pd_data", comment="test")
    @dlt.expect_all_or_drop(rules)
    def get_sample():
        dt = [
            {"LOCATION": "Bengaluru", "id": "123"},
            {"LOCATION": None, "id": "456"},
        ]
        # `spark` is the session provided by the Databricks runtime.
        return spark.createDataFrame(pd.DataFrame(dt))
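
If different rules need different actions on the same table (warn, drop, or fail), the dictionary approach could probably be extended by keeping one dictionary per action and always stacking all three decorators. The sketch below is only an illustration under assumptions: the names group_rules, create_table, source_fn and the "warn"/"drop"/"fail" labels are hypothetical, the rules are assumed to arrive as (action, name, constraint) rows from a metadata table, and it relies on expect_all_* accepting an empty dict, as reported in the comments on this answer.

```python
# Hypothetical action labels; mapping them to decorators is an assumption of this sketch.
ACTIONS = ("warn", "drop", "fail")

def group_rules(rows):
    """Group (action, rule_name, constraint) rows into one dict per action."""
    grouped = {action: {} for action in ACTIONS}
    for action, name, constraint in rows:
        if action not in grouped:
            raise ValueError(f"unknown action: {action!r}")
        grouped[action][name] = constraint
    return grouped

def create_table(table_name, rows, source_fn):
    """Register one DLT table with every rule group attached (hypothetical helper)."""
    import dlt  # only importable inside a Delta Live Tables pipeline

    rules = group_rules(rows)

    # Groups with no rules are passed as empty dicts.
    @dlt.table(name=table_name, comment="test")
    @dlt.expect_all(rules["warn"])
    @dlt.expect_all_or_drop(rules["drop"])
    @dlt.expect_all_or_fail(rules["fail"])
    def _table():
        # source_fn is a caller-supplied function returning a DataFrame.
        return source_fn()
```

With this shape the function body is written once, and the metadata alone decides which rules land in which decorator.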


JayashankarGS
  • Thanks for the response. I can see the first option working out, although it would lead to repetitive coding. Where you have get_sample(), I would want to execute this under all circumstances (df_tableop is None or is not None). So I would need to repeatedly paste in get_sample(), or have get_sample() call another function that is defined in an external module. This becomes even more of an issue when I think about extending this idea to dynamically choose between expect_all, expect_all_or_drop, expect_all_or_fail... – stewardCopeland Aug 23 '23 at 12:46
  • Also, regarding your comment "The error is because you can't give any statements after dlt.table decorator. You need to give the def functions which returns dataframe." Is this coming from your knowledge of decorators in python? Maybe my question is more broad than databricks... essentially I want to dynamically add in another layer. E.g. conditionally add in "@decorator_b" between "@decorator_a" and def. I will look for any SO posts regarding that. – stewardCopeland Aug 23 '23 at 12:50
  • I agree with you about the first option if the function has to run under all circumstances. Have you tried the second option? – JayashankarGS Aug 23 '23 at 12:58
  • There you create the rules based on conditions and pass those rules to the expect functions. – JayashankarGS Aug 23 '23 at 12:59
  • After thinking about this further, I might need to rephrase my question.... For a given generic table, it is not OK to preclude using expect_all_or_drop() if I use expect_all(). E.g. there could be a Table_A that has some cols evaluated as _or_drop, while other cols/rules evaluate to _or_fail... @Jayashankargs Your suggestion gets very burdensome in that situation. – stewardCopeland Aug 23 '23 at 16:56
  • Can you describe your scenario in detail? It seems the second option, creating rules based on conditions, should work properly. – JayashankarGS Aug 24 '23 at 12:38
  • Do you know, will expect_all_* work even if the dict is an empty dict? – stewardCopeland Aug 24 '23 at 16:09
  • Yes. If you pass {} to expect_all_* it works. I have tried it. – JayashankarGS Aug 24 '23 at 16:27
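
On the more general question raised in the comments (conditionally adding "@decorator_b" between "@decorator_a" and the def): the @ syntax is shorthand for calling the decorator on the function, so a decorator can be applied with an ordinary call inside an if. The sketch below uses hypothetical stand-in decorators (make_marker, decorator_a, decorator_b) so that it runs outside Databricks. The dlt decorators should be usable the same way in principle, but that is an assumption and was not verified here.

```python
def make_marker(label):
    """Hypothetical stand-in for a decorator factory such as dlt.expect_all_or_drop(...)."""
    def decorator(fn):
        # Record the order in which decorators were applied.
        fn.applied = getattr(fn, "applied", []) + [label]
        return fn
    return decorator

decorator_a = make_marker("a")
decorator_b = make_marker("b")

def build(use_b):
    def get_sample():
        return "some df"

    # Equivalent to writing "@decorator_a" above "@decorator_b" above the def:
    # the decorator nearest the def is applied first.
    if use_b:
        get_sample = decorator_b(get_sample)
    get_sample = decorator_a(get_sample)
    return get_sample

print(build(True).applied)   # ['b', 'a']
print(build(False).applied)  # ['a']
```

The function body is written once, and only the wrapping step is conditional.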