
I'm streaming a topic with kafka_2.12-3.0.0 on Ubuntu in standalone mode to PostgreSQL and getting a deserialization error.

Using the confluent_kafka pip package to produce the Kafka stream in Python (works OK). A sample message:

{"pmu_id": 2, "time": 1644329854.08, "stream_id": 2, "stat": "ok", "ph_i1_r": 27.682000117654074, "ph_i1_j": -1.546410917622178, "ph_i2_r": 25.055846468243697, "ph_i2_j": 2.6658974347348012, "ph_i3_r": 25.470616978816988, "ph_i3_j": 0.5585993153435624, "ph_v4_r": 3338.6901623241415, "ph_v4_j": -1.6109426103444193, "ph_v5_r": 3149.0595421490525, "ph_v5_j": 2.5863594222073076, "ph_v6_r": 3071.4231229187553, "ph_v6_j": 0.4872377558335442, "ph_7_r": 0.0, "ph_7_j": 0.0, "ph_8_r": 3186.040175515683, "ph_8_j": -1.6065850592620299, "analog": [], "digital": 0, "frequency": 50.014, "rocof": 1}

Configuration for storing in PostgreSQL

In my kafka_2.12-3.0.0/config/connect-standalone.properties I've added the connector and converter plugin paths:

plugin.path=/home/user/kafkaConnectors/confluentinc-kafka-connect-jdbc-10.3.2,/home/user/kafkaConverters/confluentinc-kafka-connect-json-schema-converter-7.0.1

I'm executing with:

bin/connect-standalone.sh config/connect-standalone.properties config/sink-postgres.properties

My full config/sink-postgres.properties:

name=sinkIRIpostgre
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
connection.url=jdbc:postgresql://localhost:5432/pgdb
topics=pmu214
key.converter=io.confluent.connect.json.JsonSchemaConverter
key.converter.schema.registry.url=http://localhost:8081
value.converter=io.confluent.connect.json.JsonSchemaConverter
value.converter.schema.registry.url=http://localhost:8081
connection.user=pguser
connection.password=pgpass
auto.create=true
auto.evolve=true
insert.mode=insert
pk.mode=record_key
pk.fields=MESSAGE_KEY

I'm getting this error:

ERROR [sinkIRIpostgre|task-0] WorkerSinkTask{id=sinkIRIpostgre-0} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:193)
org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler
        at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:206)
        at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:132)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:493)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:473)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:328)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:232)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:201)
        at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:186)
        at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:241)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.kafka.connect.errors.DataException: Converting byte[] to Kafka Connect data failed due to serialization error of topic pmu214:
        at io.confluent.connect.json.JsonSchemaConverter.toConnectData(JsonSchemaConverter.java:119)
        at org.apache.kafka.connect.storage.Converter.toConnectData(Converter.java:87)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.convertKey(WorkerSinkTask.java:530)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$1(WorkerSinkTask.java:493)
        at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:156)
        at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:190)
        ... 13 more
Caused by: org.apache.kafka.common.errors.SerializationException: Error deserializing JSON message for id -1
        at io.confluent.kafka.serializers.json.AbstractKafkaJsonSchemaDeserializer.deserialize(AbstractKafkaJsonSchemaDeserializer.java:177)
        at io.confluent.kafka.serializers.json.AbstractKafkaJsonSchemaDeserializer.deserializeWithSchemaAndVersion(AbstractKafkaJsonSchemaDeserializer.java:235)
        at io.confluent.connect.json.JsonSchemaConverter$Deserializer.deserialize(JsonSchemaConverter.java:165)
        at io.confluent.connect.json.JsonSchemaConverter.toConnectData(JsonSchemaConverter.java:108)
        ... 18 more
Caused by: org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
        at io.confluent.kafka.serializers.AbstractKafkaSchemaSerDe.getByteBuffer(AbstractKafkaSchemaSerDe.java:250)
        at io.confluent.kafka.serializers.json.AbstractKafkaJsonSchemaDeserializer.deserialize(AbstractKafkaJsonSchemaDeserializer.java:112)

EDIT (Python code)

Here is the Python code used for creating the Kafka producer:

from confluent_kafka import Producer
..
p = Producer({'bootstrap.servers': self.kafka_bootstrap_servers})
...
record_key = str(uuid.uuid4())
record_value = self.createKafkaJSON(base_message)
p.produce(self.kafka_topic, key=record_key, value=record_value)
p.poll(0)

The function createKafkaJSON returns json.dumps(kafkaDictFinal).encode('utf-8'), where kafkaDictFinal is a Python dictionary.
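For reference, a minimal sketch of what createKafkaJSON amounts to, based only on the description above (the helper itself isn't shown in the question, so its body here is an assumption):

import json

def createKafkaJSON(self, base_message):
    # Builds the final dictionary and serializes it as bare UTF-8 JSON bytes.
    # Note there is no Schema Registry framing here, which matters for the error.
    kafkaDictFinal = dict(base_message)  # assumed: final dict derived from base_message
    return json.dumps(kafkaDictFinal).encode('utf-8')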

The producer is called in main with:

  KafkaPMUProducer(pdc_id=2, pmu_ip="x.x.x.x", pmu_port=4712, kafka_bootstrap_servers ="localhost:9092", kafka_topic="pmu214").kafka_producer()

1 Answer


If you're writing straight JSON from your Python app then you'll need to use the org.apache.kafka.connect.json.JsonConverter converter, and your messages will need schema and payload attributes.
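For example, a minimal sketch of that envelope on the producer side, shortened to three of the fields from the sample message (the struct name is hypothetical). On the Connect side this would pair with value.converter=org.apache.kafka.connect.json.JsonConverter and value.converter.schemas.enable=true:

import json

# Wrap the payload in the schema/payload envelope that JsonConverter
# expects when schemas.enable=true. Field list shortened for brevity.
enveloped = {
    "schema": {
        "type": "struct",
        "name": "pmu_reading",  # hypothetical name
        "fields": [
            {"field": "pmu_id", "type": "int32"},
            {"field": "time", "type": "double"},
            {"field": "frequency", "type": "double"},
        ],
    },
    "payload": {"pmu_id": 2, "time": 1644329854.08, "frequency": 50.014},
}
record_value = json.dumps(enveloped).encode('utf-8')

Note that your stack trace fails in convertKey, so the same decision applies to the key side; since your keys are plain UUID strings, org.apache.kafka.connect.storage.StringConverter may be the better fit for key.converter.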

io.confluent.connect.json.JsonSchemaConverter relies on the Schema Registry wire format which includes a "magic byte" (hence the error).
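That wire format prefixes every message with a zero "magic byte" and a 4-byte big-endian schema ID, while bare JSON starts with '{' (0x7b), so the converter bails out immediately. A small diagnostic sketch (not part of any library) to check which framing a raw consumed message has:

import struct

def describe_framing(raw: bytes) -> str:
    # Confluent wire format: 0x00 magic byte, 4-byte big-endian schema ID,
    # then the serialized payload.
    if len(raw) > 5 and raw[0] == 0:
        schema_id = struct.unpack('>I', raw[1:5])[0]
        return f"Schema Registry framing, schema id {schema_id}"
    return "bare bytes (e.g. plain JSON) - no Schema Registry framing"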

You can learn more in this deep-dive article about serialisation and Kafka Connect, and see how Python can produce JSON data with a schema using SerializingProducer.
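A sketch of the SerializingProducer approach, assuming the Schema Registry from your config at http://localhost:8081 and a hypothetical cut-down schema. JSONSerializer registers the schema and adds the wire-format framing for you, so your existing io.confluent.connect.json.JsonSchemaConverter config would then be able to read the messages:

import uuid
from confluent_kafka import SerializingProducer
from confluent_kafka.serialization import StringSerializer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.json_schema import JSONSerializer

# Hypothetical JSON Schema covering a few fields from the question.
schema_str = """
{
  "title": "PmuReading",
  "type": "object",
  "properties": {
    "pmu_id":    {"type": "integer"},
    "time":      {"type": "number"},
    "frequency": {"type": "number"}
  }
}
"""

registry = SchemaRegistryClient({'url': 'http://localhost:8081'})

p = SerializingProducer({
    'bootstrap.servers': 'localhost:9092',
    'key.serializer': StringSerializer('utf_8'),
    'value.serializer': JSONSerializer(schema_str, registry),
})

p.produce('pmu214', key=str(uuid.uuid4()),
          value={"pmu_id": 2, "time": 1644329854.08, "frequency": 50.014})
p.flush()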

  • Thanks, the only thing that worries me is the time field: `"time": 1644329854.08`. Should I just convert it to something PostgreSQL supports and mark it 'timestamp'? I should be able to do calculations and alerting based on it later on in ksql. It's not clear to me what data types are supported with JSON Schema? I see types clearly defined for Avro but I'm not finding the same thing for JSON Schema. I assume since JSON is a text dict that's why there is no clear spec, so I can use what is supported on the other end (PostgreSQL) and it should work? – Hrvoje Feb 09 '22 at 16:03
  • I'd suggest you post that as a new question, either here or at https://forum.confluent.io/ – Robin Moffatt Feb 09 '22 at 16:35
  • @Hrvoje The valid types spec is here: https://json-schema.org/draft/2020-12/json-schema-validation.html#rfc.section.6.1.1 – OneCricketeer Feb 09 '22 at 19:29
  • You might wanna change the Avro producer example: https://github.com/confluentinc/confluent-kafka-python/blob/master/examples/avro_producer.py according to this bug: https://github.com/confluentinc/confluent-kafka-python/issues/1078. `schema_registry_client` and `schema_str` should switch places in the function call. – Hrvoje Feb 10 '22 at 16:50
  • @Hrvoje thanks, I'll check it out – Robin Moffatt Feb 13 '22 at 21:57