
I am trying to fetch some data in Airflow from a Databricks Delta table using `DatabricksSqlOperator`:

select = DatabricksSqlOperator(
    databricks_conn_id=databricks_id,
    http_path=http_path,
    task_id="select_data",
    sql="select * from schema.table_name",
)

The connection is established fine, but I get an error while fetching the data:

[2023-05-08, 18:32:43 UTC] {xcom.py:599} ERROR - Could not serialize the XCom value into JSON. If you are using pickle instead of JSON for XCom, then you need to enable pickle support for XCom in your *** config.
[2023-05-08, 12:44:02 UTC] {taskinstance.py:1851} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/session.py", line 72, in wrapper
    return func(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 2378, in xcom_push
    XCom.set(
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/session.py", line 72, in wrapper
    return func(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/xcom.py", line 206, in set
    value = cls.serialize_value(
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/xcom.py", line 597, in serialize_value
    return json.dumps(value).encode('UTF-8')
  File "/usr/local/lib/python3.8/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/usr/local/lib/python3.8/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/local/lib/python3.8/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/usr/local/lib/python3.8/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type datetime is not JSON serializable

One of the columns in the table is a date, and two columns are timestamps. I tried this on a table without any date columns and it worked fine. Does anyone know what I am doing wrong?

vaibhav

1 Answer


The default JSON serializer does not support serializing `datetime` objects, which is why the XCom push fails.
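The failure is easy to reproduce in isolation, without Airflow or Databricks (the row dict below is just an illustrative stand-in for a fetched result):

```python
import json
import datetime

# A stand-in for a row returned by the query, with a timestamp column.
row = {"id": 1, "created_at": datetime.datetime(2023, 5, 8, 18, 32, 43)}

try:
    json.dumps(row)
except TypeError as err:
    print(err)  # Object of type datetime is not JSON serializable
```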

First, create a custom JSON encoder:

import json
import datetime

class CustomJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, (datetime.date, datetime.datetime)):
            return obj.isoformat()
        return super().default(obj)
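Used on its own, the encoder converts date and datetime values to ISO 8601 strings (a quick check, independent of Airflow; the sample row is illustrative):

```python
import json
import datetime

class CustomJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, (datetime.date, datetime.datetime)):
            return obj.isoformat()
        return super().default(obj)

# Dates and timestamps now serialize as ISO 8601 strings.
row = {"day": datetime.date(2023, 5, 8),
       "ts": datetime.datetime(2023, 5, 8, 18, 32, 43)}
encoded = json.dumps(row, cls=CustomJSONEncoder)
print(encoded)  # {"day": "2023-05-08", "ts": "2023-05-08T18:32:43"}
```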

Next, patch `airflow.models.xcom.XCom.serialize_value` to use the encoder:

from airflow.models.xcom import XCom

def custom_serialize_value(value, **kwargs):
    # Newer Airflow versions pass extra keyword arguments (e.g. key);
    # accept and ignore them.
    return json.dumps(value, cls=CustomJSONEncoder).encode('UTF-8')

XCom.serialize_value = staticmethod(custom_serialize_value)
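The patching pattern itself can be verified without an Airflow installation by applying it to a stand-in class (`FakeXCom` below is hypothetical and only mimics the relevant static method):

```python
import json
import datetime

class CustomJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, (datetime.date, datetime.datetime)):
            return obj.isoformat()
        return super().default(obj)

class FakeXCom:
    # Mimics the stock serialize_value, which uses the default encoder.
    @staticmethod
    def serialize_value(value):
        return json.dumps(value).encode("UTF-8")

def custom_serialize_value(value, **kwargs):
    return json.dumps(value, cls=CustomJSONEncoder).encode("UTF-8")

# Replace the method on the class, exactly as done with the real XCom.
FakeXCom.serialize_value = staticmethod(custom_serialize_value)

result = FakeXCom.serialize_value({"day": datetime.date(2023, 5, 8)})
print(result)  # b'{"day": "2023-05-08"}'
```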

Here is the full code; replace `your_databricks_connection_id` and `your_http_path` with your own values.

import json
import datetime
from airflow import DAG
from airflow.providers.databricks.operators.databricks_sql import DatabricksSqlOperator
from airflow.models.xcom import XCom
from airflow.utils.dates import days_ago

class CustomJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, (datetime.date, datetime.datetime)):
            return obj.isoformat()
        return super().default(obj)

def custom_serialize_value(value, **kwargs):
    # Newer Airflow versions pass extra keyword arguments (e.g. key);
    # accept and ignore them.
    return json.dumps(value, cls=CustomJSONEncoder).encode('UTF-8')

XCom.serialize_value = staticmethod(custom_serialize_value)

databricks_id = 'your_databricks_connection_id'
http_path = 'your_http_path'

default_args = {
    'owner': 'airflow',
    'start_date': days_ago(1),
}

dag = DAG(
    'example_databricks_sql_operator',
    default_args=default_args,
    description='Example DAG using DatabricksSqlOperator',
    schedule_interval=None,
)

select = DatabricksSqlOperator(
    databricks_conn_id=databricks_id,
    http_path=http_path,
    task_id="select_data",
    sql="select * from schema.table_name",
    dag=dag,
)
Saxtheowl
  • Thanks for the answer but I am getting this error : `[2023-05-09, 08:31:54 UTC] {taskinstance.py:1851} ERROR - Task failed with exception Traceback (most recent call last): File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/session.py", line 72, in wrapper return func(*args, **kwargs) File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 2378, in xcom_push XCom.set( File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/session.py", line 72, in wrapper return func(*args, **kwargs) ` – vaibhav May 09 '23 at 08:33
  • `File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/xcom.py", line 206, in set value = cls.serialize_value( TypeError: custom_serialize_value() got an unexpected keyword argument 'key' ` – vaibhav May 09 '23 at 08:36
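The `TypeError: custom_serialize_value() got an unexpected keyword argument 'key'` in the comments arises because newer Airflow versions call `serialize_value` with extra keyword arguments such as `key`. Any replacement function must accept and discard them; a minimal sketch, verified without Airflow:

```python
import json
import datetime

class CustomJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, (datetime.date, datetime.datetime)):
            return obj.isoformat()
        return super().default(obj)

# **kwargs swallows the metadata keywords (key, etc.) that newer
# Airflow versions pass along with the value.
def custom_serialize_value(value, **kwargs):
    return json.dumps(value, cls=CustomJSONEncoder).encode("UTF-8")

# The call shape that previously crashed now works:
out = custom_serialize_value(
    {"ts": datetime.datetime(2023, 5, 9, 8, 31, 54)}, key="return_value"
)
print(out)  # b'{"ts": "2023-05-09T08:31:54"}'
```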