I have a Kafka topic with Avro-formatted messages that I want to read as a stream in PySpark, but the output is null. My data looks like this:
{
  "ID": 559,
  "DueDate": 1676362642000,
  "Number": 1,
  "__deleted": "false"
}
and the schema in the schema registry is:
{
  "type": "record",
  "name": "Value",
  "namespace": "test",
  "fields": [
    {
      "name": "ID",
      "type": "long"
    },
    {
      "name": "DueDate",
      "type": {
        "type": "long",
        "connect.version": 1,
        "connect.name": "io.debezium.time.Timestamp"
      }
    },
    {
      "name": "Number",
      "type": "long"
    },
    {
      "name": "StartDate",
      "type": [
        "null",
        {
          "type": "long",
          "connect.version": 1,
          "connect.name": "io.debezium.time.Timestamp"
        }
      ],
      "default": null
    },
    {
      "name": "__deleted",
      "type": [
        "null",
        "string"
      ],
      "default": null
    }
  ],
  "connect.name": "test.Value"
}
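One detail that may matter: since this schema comes from a schema registry (and the connect.* attributes suggest a Debezium/Connect pipeline), I believe the messages are written in the Confluent wire format, i.e. each value is prefixed with a 5-byte header (one magic byte 0x00 plus a 4-byte big-endian schema ID) before the Avro body. A minimal sketch of that layout in plain Python (function name and sample bytes are my own, for illustration):

```python
import struct

def split_confluent_header(message: bytes) -> tuple:
    """Split a Confluent wire-format message into (schema_id, avro_payload).

    Layout: 1 magic byte (0x00) + 4-byte big-endian schema ID + Avro body.
    """
    if len(message) < 5 or message[0] != 0:
        raise ValueError("not a Confluent wire-format message")
    schema_id = struct.unpack(">I", message[1:5])[0]
    return schema_id, message[5:]

# Fake message: schema ID 42, then a one-byte Avro payload.
sid, payload = split_confluent_header(b"\x00\x00\x00\x00\x2a\x02")
```

If the deserializer is handed the raw value including this header, the Avro decode fails and (in permissive mode) every column comes back null, which matches what I see.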
and the schema I defined in PySpark is:
from pyspark.sql.types import StructType, StructField, LongType, StringType

schema = StructType([
    StructField("ID", LongType(), False),
    StructField("DueDate", LongType(), False),
    StructField("Number", LongType(), False),
    StructField("StartDate", LongType(), True),
    StructField("__deleted", StringType(), True),
])
In the resulting DataFrame, every column is null. I expect the values in the DataFrame to match the records in the Kafka topic, but all columns come back null.
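For reference, the read I am attempting looks roughly like this (a sketch, not verified: the broker address, topic name, and the substring workaround for the 5-byte Confluent header are placeholders/assumptions on my part):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import expr
from pyspark.sql.avro.functions import from_avro

spark = SparkSession.builder.appName("avro-stream").getOrCreate()

# Placeholder: the registry schema serialized as a JSON string.
avro_schema_str = "..."

df = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")  # placeholder broker
    .option("subscribe", "test-topic")                    # placeholder topic
    .load()
    # Skip the 5-byte Confluent header (magic byte + schema ID).
    # SQL substring is 1-based, so the Avro body starts at position 6.
    .withColumn("avro_value", expr("substring(value, 6, length(value) - 5)"))
    .select(from_avro("avro_value", avro_schema_str).alias("data"))
    .select("data.*")
)
```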