Here's the problem I'm trying to solve, and where I run into issues with XComArg. The general idea is to have a pipeline that extracts data from Postgres databases and dumps it into BigQuery.
The pipeline is composed of two DAGs linked together via a Dataset.
The first DAG inspects a list of Postgres databases, identifies all tables with their columns and types, and dumps that information into a GCS bucket. This updates the Dataset.
The downstream DAG is triggered by the Dataset update.
As a first step, this DAG loads the metadata from GCS with a task, get_export_metadata, that returns a dict with the shape {"databases": Dict[str, List[str]], "schemas": Dict[str, List[Dict[str, str]]]}.
Example:
{
    "databases": {
        "test": [
            "table_1",
            "table_2"
        ]
    },
    "schemas": {
        "table_1": [
            {
                "mode": "NULLABLE",
                "name": "id",
                "type": "STRING"
            },
            {
                "mode": "NULLABLE",
                "name": "creation_date",
                "type": "STRING"
            }
        ]
    }
}
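For reference, the shape above can be sanity-checked in plain Python before any tasks are mapped over it. This is only a sketch: missing_schemas is a hypothetical helper, not part of Airflow, and the metadata literal simply repeats the example. In the example as posted, table_2 is listed under "databases" but has no entry under "schemas", so a lookup like schema_map[table_name] would raise a KeyError for it (the example may just be abbreviated).

```python
from typing import Dict, List

# Example metadata, copied from the sample above.
metadata = {
    "databases": {"test": ["table_1", "table_2"]},
    "schemas": {
        "table_1": [
            {"mode": "NULLABLE", "name": "id", "type": "STRING"},
            {"mode": "NULLABLE", "name": "creation_date", "type": "STRING"},
        ]
    },
}


def missing_schemas(meta: dict) -> Dict[str, List[str]]:
    """Return, per database, the tables that have no entry under "schemas"."""
    missing: Dict[str, List[str]] = {}
    for db_name, tables in meta["databases"].items():
        absent = [t for t in tables if t not in meta["schemas"]]
        if absent:
            missing[db_name] = absent
    return missing


# Lists the tables for which schema_map[table_name] would fail.
print(missing_schemas(metadata))
```

Running a check like this inside get_export_metadata (or right after it) would separate data problems from mapping problems.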
My DAG looks something like:
export_metadata = get_export_metadata()
# get_table_names is a task that returns export_metadata["databases"]["test"]
table_names = get_table_names(export_metadata["databases"], "test")

@task
def build_args(schema_map, table_name, instance_name, instance_pool, db_name):
    # Build the kwargs for one mapped extract_data_csv_dump task instance
    return {
        "instance_name": instance_name,
        "instance_pool": instance_pool,
        "schema_fields": schema_map[table_name],
        "db_name": db_name,
        "table_name": table_name,
    }

# One build_args task instance per table name
kw_args_list = build_args.partial(
    schema_map=export_metadata["schemas"],
    instance_name=instance_name,
    instance_pool=instance_pool,
    db_name=db_name
).expand(table_name=table_names)

extract_data_csv_dump.expand_kwargs(kw_args_list)
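To make the intent concrete, here is what the mapping is meant to produce, written as plain Python with no Airflow involved. It is a sketch only: build_args_plain, the sample values for instance_name, instance_pool and db_name, and the one-column schemas are all made up for illustration. Each element of the resulting list is the keyword-argument dict that one mapped extract_data_csv_dump task instance should receive.

```python
# Hypothetical stand-ins for the values the DAG gets from XCom and config.
schema_map = {
    "table_1": [{"mode": "NULLABLE", "name": "id", "type": "STRING"}],
    "table_2": [{"mode": "NULLABLE", "name": "id", "type": "STRING"}],
}
table_names = ["table_1", "table_2"]


def build_args_plain(schema_map, table_name, instance_name, instance_pool, db_name):
    """Same body as build_args, minus the @task decorator."""
    return {
        "instance_name": instance_name,
        "instance_pool": instance_pool,
        "schema_fields": schema_map[table_name],
        "db_name": db_name,
        "table_name": table_name,
    }


# Plain-Python equivalent of .partial(...).expand(table_name=table_names):
# the fixed arguments are repeated, and table_name varies per call.
kw_args_list = [
    build_args_plain(schema_map, name, "my-instance", "my-pool", "test")
    for name in table_names
]

for kwargs in kw_args_list:
    print(kwargs["table_name"], len(kwargs["schema_fields"]))
```

In the DAG, the same list only exists at run time as XCom values, which is why it is passed around as an XComArg rather than as a Python list.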
But this does not work; I get the following error:
TypeError: Object of type MappedArgument is not JSON serializable
The object's repr is MappedArgument(_input=ListOfDictsExpandInput(value=XComArg(<Mapped(_PythonDecoratedOperator): build_args>)), _key='schema_fields')
Can somebody explain if there's a way to achieve what I am trying to do?
I tried a few different things, but they usually fail when trying to access export_metadata["schemas"][table_name].