Here's the problem I am trying to solve, and where I run into issues with XComArgs. The general idea is to have a pipeline that extracts data from Postgres databases and dumps it into BigQuery.

The pipeline is composed of DAGs linked together via a Dataset.

I have a DAG that inspects a list of Postgres databases, identifies all tables with their columns and types, and dumps that information into a GCS bucket. This updates the Dataset.

The downstream DAG is triggered by the Dataset update.

As a first step, this DAG loads the metadata from GCS with a task, get_export_metadata, that returns a dict with the shape {"databases": Dict[str, List[str]], "schemas": Dict[str, List[Dict[str, str]]]}. Example:

{
    "databases": {
        "test": [
            "table_1",
            "table_2"
        ]
    },
    "schemas": {
        "table_1": [
            {
                "mode": "NULLABLE",
                "name": "id",
                "type": "STRING"
            },
            {
                "mode": "NULLABLE",
                "name": "creation_date",
                "type": "STRING"
            }
        ]
    }
}
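
In Python type terms, the shape can be sketched roughly as below. The ExportMetadata name is only for illustration and is not part of the actual DAG; the sample values mirror the example above.

```python
from typing import Dict, List, TypedDict


class ExportMetadata(TypedDict):
    # Hypothetical name, used here only to document the shape.
    databases: Dict[str, List[str]]          # database name -> table names
    schemas: Dict[str, List[Dict[str, str]]]  # table name -> column definitions


example: ExportMetadata = {
    "databases": {"test": ["table_1", "table_2"]},
    "schemas": {
        "table_1": [
            {"mode": "NULLABLE", "name": "id", "type": "STRING"},
            {"mode": "NULLABLE", "name": "creation_date", "type": "STRING"},
        ]
    },
}

# The two lookups the downstream DAG relies on.
tables = example["databases"]["test"]
first_column = example["schemas"]["table_1"][0]
```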

My DAG looks something like:

export_metadata = get_export_metadata()
# get_table_names is a task that does export_metadata["databases"]["test"]
table_names = get_table_names(export_metadata["databases"], "test")

@task
def build_args(schema_map, table_name, instance_name, instance_pool, db_name):
    return {
        "instance_name": instance_name,
        "instance_pool": instance_pool,
        "schema_fields": schema_map[table_name],
        "db_name": db_name,
        "table_name": table_name,
    }

kw_args_list = build_args.partial(
    schema_map=export_metadata["schemas"],
    instance_name=instance_name,
    instance_pool=instance_pool,
    db_name=db_name,
).expand(table_name=table_names)

extract_data_csv_dump.expand_kwargs(kw_args_list)

But this does not work. I get the following error: TypeError: Object of type MappedArgument is not JSON serializable

The object's repr is MappedArgument(_input=ListOfDictsExpandInput(value=XComArg(<Mapped(_PythonDecoratedOperator): build_args>)), _key='schema_fields')

Can somebody explain whether there's a way to achieve what I am trying to do? I tried a few different things, but they usually fail when trying to access export_metadata["schemas"][table_name].
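
To make the intent concrete, here is the per-table lookup written as plain Python, with no Airflow involved. This is only a sketch of the result I am after, not a confirmed fix for the error: build_kwargs_list is a hypothetical helper name, the instance and pool values are made up, and the sample metadata is extended so that every table has a schema entry.

```python
import json
from typing import Any, Dict, List


def build_kwargs_list(
    export_metadata: Dict[str, Any],
    db_name: str,
    instance_name: str,
    instance_pool: str,
) -> List[Dict[str, Any]]:
    """Return one kwargs dict per table of db_name (hypothetical helper)."""
    schema_map = export_metadata["schemas"]
    return [
        {
            "instance_name": instance_name,
            "instance_pool": instance_pool,
            "schema_fields": schema_map[table_name],
            "db_name": db_name,
            "table_name": table_name,
        }
        for table_name in export_metadata["databases"][db_name]
    ]


# Sample metadata (assumed values; both tables have a schema here).
sample_metadata = {
    "databases": {"test": ["table_1", "table_2"]},
    "schemas": {
        "table_1": [{"mode": "NULLABLE", "name": "id", "type": "STRING"}],
        "table_2": [{"mode": "NULLABLE", "name": "id", "type": "STRING"}],
    },
}

kwargs_list = build_kwargs_list(sample_metadata, "test", "instance-a", "pool-a")

# Each entry is made of plain dicts, lists and strings, so it serializes to JSON.
serialized = json.dumps(kwargs_list)
```

In the DAG, the goal is for extract_data_csv_dump to receive one such dict per table, with schema_fields already resolved to a plain list rather than a lazy reference.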

QuantumLicht