When running Pulsar standalone in Docker, we are hitting a strange issue when deserializing messages in one specific case. We are using version 2.7.1.
We have a script that creates the topics and functions, after which a schema with type JSON gets created for the troublesome topic. The schema definition itself is correct, but the type is not. All of this happens before any messages are sent.
We also enabled set-is-allow-auto-update-schema.
This topic, let's call it trouble-topic, is populated from two sources: ValidationFunction and a Spring Boot microservice.
ValidationFunction validates each message. If the message is valid, it sends the mapped message to a topic consumed by the Spring Boot microservice, which applies some logic and then sends the result to trouble-topic. If validation fails, it sends the message directly to trouble-topic.
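The two paths into trouble-topic can be sketched as a small plain-Java model. It uses no Pulsar APIs; the class name and the intermediate topic name are hypothetical, since the post does not name that topic.

```java
// Hypothetical model of the two paths into trouble-topic; no Pulsar APIs are used.
class TroubleTopicRoutingSketch {

    // Assumed name: the post does not say what the intermediate topic is called.
    static final String INTERMEDIATE_TOPIC = "intermediate-topic";
    static final String TROUBLE_TOPIC = "trouble-topic";

    // Returns the component that ends up producing to trouble-topic.
    static String producerFor(boolean valid) {
        return valid ? "Spring Boot microservice" : "ValidationFunction";
    }

    // Returns the topics a message passes through, in order.
    static String[] path(boolean valid) {
        return valid
            ? new String[] { INTERMEDIATE_TOPIC, TROUBLE_TOPIC }
            : new String[] { TROUBLE_TOPIC };
    }

    public static void main(String[] args) {
        System.out.println(String.join(" -> ", path(true)) + " via " + producerFor(true));
        System.out.println(String.join(" -> ", path(false)) + " via " + producerFor(false));
    }
}
```

The point of the model is only that trouble-topic has two different writers, and which one writes first depends on whether the first message passes validation.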
When the Spring Boot microservice calls sendAsync on the following producer, the schema gets updated to type AVRO, and TroubleFunction, which reads trouble-topic, works fine afterwards:
pulsarClient
    .newProducer(AvroSchema.of(OurClass.class))
    .topic(troubleTopicName)
    .create();
But if some messages fail validation and are sent directly to trouble-topic before the above producer is used, we get a parsing exception. We send the message from the function in the following way:
context.newOutputMessage(troubleTopicName, AvroSchema.of(OurClass.class))
.value(value)
.sendAsync();
For some reason this does not update the schema type, which stays JSON. I verified the schema type at each step using the pulsar-admin CLI. When this happens before the microservice producer has updated the schema type for the first time, TroubleFunction, which reads trouble-topic, fails with the following error:
11:43:49.322 [tenant/namespace/TroubleFunction-0] ERROR org.apache.pulsar.functions.instance.JavaInstanceRunnable - [tenant/namespace/TroubleFunction:0] Uncaught exception in Java Instance
org.apache.pulsar.client.api.SchemaSerializationException: com.fasterxml.jackson.core.JsonParseException: Illegal character ((CTRL-CHAR, code 2)): only regular white space (\r, \n, \t) is allowed between tokens
at [Source: (byte[])avro-serialized-msg-i-have-to-hide Parsing exception: cvc-complex-type.2.4.a: Invalid content was found starting with element 'ElementName'. One of '{"foo:bar":ElementName}' is expected."; line: 1, column: 2]
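The `CTRL-CHAR, code 2` detail is consistent with a JSON parser being handed Avro binary data. Avro's binary encoding writes ints, longs, and union branch indexes as zigzag variable-length integers, so the value 1 (for example, branch index 1 of a nullable `["null", T]` union) is written as the single byte 0x02. The sketch below reproduces that arithmetic in plain Java. The class and method names are hypothetical, and the field layout of OurClass is not shown here, so this is an illustration under that assumption rather than a confirmed diagnosis.

```java
// Hypothetical illustration: how Avro's zigzag varint encoding can yield a leading 0x02 byte.
class AvroLeadingByteSketch {

    // Zigzag-encode a signed long so that small magnitudes map to small unsigned values.
    static long zigZag(long n) {
        return (n << 1) ^ (n >> 63);
    }

    // Encode a long as a zigzag variable-length integer, 7 bits per byte, low bits first.
    static byte[] encodeVarLong(long n) {
        long z = zigZag(n);
        byte[] buf = new byte[10];
        int len = 0;
        while ((z & ~0x7FL) != 0) {
            buf[len++] = (byte) ((z & 0x7F) | 0x80); // continuation bit set
            z >>>= 7;
        }
        buf[len++] = (byte) z;
        byte[] out = new byte[len];
        System.arraycopy(buf, 0, out, 0, len);
        return out;
    }

    // JSON permits only space, tab, line feed, and carriage return between tokens.
    static boolean isJsonWhitespace(int b) {
        return b == ' ' || b == '\t' || b == '\n' || b == '\r';
    }

    // True for control characters that a JSON parser rejects between tokens.
    static boolean isIllegalJsonControlChar(int b) {
        return b >= 0 && b < 0x20 && !isJsonWhitespace(b);
    }

    public static void main(String[] args) {
        byte[] encodedOne = encodeVarLong(1); // e.g. union branch index 1
        System.out.println("first byte = " + encodedOne[0]);
        System.out.println("illegal for JSON = " + isIllegalJsonControlChar(encodedOne[0]));
    }
}
```

If the payload really is Avro-encoded and begins with such a value, a reader that selects its decoder from the registered JSON schema type would fail on the very first byte, which matches `line: 1, column: 2` in the trace.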
So my question is: what is the difference between these two, and why does sending the message from a function not update the schema type correctly? Is it not using the same producer underneath? Also, is there a way to fix this so that the schema type is set on initialization, or at least updated when the message is sent from a function?