
I have a Kafka topic with Avro-formatted messages that I want to read as a stream in PySpark, but every column in the output is null. My data looks like this:

{
  "ID": 559,
  "DueDate": 1676362642000,
  "Number": 1,
  "__deleted": "false"
}

and the schema in the schema registry is:

{
  "type": "record",
  "name": "Value",
  "namespace": "test",
  "fields": [
    {
      "name": "ID",
      "type": "long"
    },
    {
      "name": "DueDate",
      "type": {
        "type": "long",
        "connect.version": 1,
        "connect.name": "io.debezium.time.Timestamp"
      }
    },
    {
      "name": "Number",
      "type": "long"
    },
    {
      "name": "StartDate",
      "type": [
        "null",
        {
          "type": "long",
          "connect.version": 1,
          "connect.name": "io.debezium.time.Timestamp"
        }
      ],
      "default": null
    },
    {
      "name": "__deleted",
      "type": [
        "null",
        "string"
      ],
      "default": null
    }
  ],
  "connect.name": "test.Value"
}
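For reference, the `connect.name` of `io.debezium.time.Timestamp` on `DueDate` and `StartDate` means Debezium encodes those fields as milliseconds since the Unix epoch in an Avro `long`. A quick stdlib check (using the `DueDate` value from the sample record) shows what the raw number represents:

```python
from datetime import datetime, timezone

# Debezium's io.debezium.time.Timestamp stores a timestamp as
# milliseconds since the Unix epoch, carried in an Avro long.
due_date_ms = 1676362642000  # the DueDate value from the sample record

due_date = datetime.fromtimestamp(due_date_ms / 1000, tz=timezone.utc)
print(due_date.isoformat())  # → 2023-02-14T08:17:22+00:00
```

So a `LongType` column in Spark will hold the raw epoch-millis value, not a timestamp, unless it is converted explicitly.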

and the schema I defined in PySpark is:

from pyspark.sql.types import StructType, StructField, LongType, StringType

schema = StructType([
    StructField("ID", LongType(), False),
    StructField("DueDate", LongType(), False),
    StructField("Number", LongType(), False),
    StructField("StartDate", LongType(), True),
    StructField("__deleted", StringType(), True),
])

The resulting DataFrame, however, is all null.

I expect the values in the DataFrame to match the records in the Kafka topic, but every column comes back null.
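A minimal sketch of the relevant read path (the bootstrap servers, topic name, and helper names are assumptions, not code from the question). Two details are worth noting: `from_avro` from `pyspark.sql.avro.functions` takes the Avro schema as a JSON string, not a Spark `StructType`, and messages serialized through Confluent Schema Registry carry a 5-byte wire-format prefix (magic byte `0` plus a 4-byte schema id) that plain Avro deserializers do not expect; in `PERMISSIVE` mode, records that fail to deserialize come back as all-null rows, which matches the symptom here.

```python
def strip_confluent_header(raw: bytes) -> bytes:
    """Drop the 5-byte Confluent wire-format prefix (magic byte + schema id).

    Shown as a plain-Python illustration of the prefix layout; in Spark the
    same trim is done with a substring expression on the binary column.
    """
    if len(raw) < 5 or raw[0] != 0:
        raise ValueError("not Confluent wire format")
    return raw[5:]


def read_avro_stream(spark, avro_schema_json: str):
    # pyspark is imported here so the helper above stays usable without Spark
    from pyspark.sql.functions import expr
    from pyspark.sql.avro.functions import from_avro

    df = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")  # assumption
        .option("subscribe", "test-topic")                    # assumption
        .load()
    )
    # Spark SQL substring is 1-based: skip the 5-byte Confluent prefix
    # before handing the payload to from_avro.
    payload = expr("substring(value, 6, length(value) - 5)")
    parsed = df.select(
        from_avro(payload, avro_schema_json, {"mode": "PERMISSIVE"}).alias("v")
    )
    return parsed.select("v.*")
```

`avro_schema_json` would be the registry schema shown above, passed as a JSON string (e.g. fetched from the Schema Registry or pasted literally).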
