
I have a Java client consumer that receives Pulsar (v2.10.0) AVRO messages (Employees), like this:

import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.PulsarClientException;
import org.apache.pulsar.client.api.Schema;
import example.Employee;

public class TestConsumer {

    public static void main(String[] args) throws PulsarClientException, InterruptedException {

        final String broker = "pulsar://localhost:6650";
        final String topic  = "persistent://public/default/avrotopic";

        PulsarClient client = PulsarClient.builder().serviceUrl(broker).build();
        Consumer<Employee> consumer = client.newConsumer(Schema.AVRO(Employee.class)).topic(topic)
                .subscriptionName("first-subscription")
                .subscribe();
        Message<Employee> message = consumer.receive();
        Employee employeeObj = message.getValue();

        System.out.println("Received Employee: " + employeeObj.getName() );

        consumer.acknowledge(message);
        consumer.close();
        client.close();
    }

}

The topic's AVRO schema is:

{
    "version": 0,
    "type": "AVRO",
    "timestamp": 0,
    "data": "{\"type\":\"record\",\"name\":\"Employee\",\"namespace\":\"example\",\"fields\":[{\"name\":\"name\",\"type\":\"string\"}]}",
    "properties": {
        "__jsr310ConversionEnabled": "false",
        "__alwaysAllowNull": "true"
    }
}

When I produce messages with a corresponding Java client producer, everything works fine: messages get deserialized into Employee objects. Now I'm trying to get the same result when producing messages via the WebSocket API or the REST API instead.


For the WebSocket API producer, I tried:

ws://localhost:8080/ws/v2/producer/persistent/public/default/avrotopic

with message:

{
    "payload":"CEpvaG4="
}

"CEpvaG4=" is the base64-encoded AVRO binary data for an Employee whose name is "John".
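Where "CEpvaG4=" comes from: Avro's binary encoding of a record is just its fields in schema order, and a string is written as its UTF-8 byte length (a zigzag varint long) followed by the bytes themselves. The following is a stdlib-only sketch, assuming the one-field Employee schema above with a plain (non-nullable) string field; EmployeeAvroEncoder and its methods are hypothetical names, and a real application would normally use the Avro library's encoders instead.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical stdlib-only helper; assumes the one-field Employee schema
// ({"name": string}, no null union) shown in the question.
class EmployeeAvroEncoder {

    // Avro long: zigzag-map the value, then emit 7 bits per byte,
    // setting the high bit on every byte except the last.
    static void writeLong(ByteArrayOutputStream out, long value) {
        long z = (value << 1) ^ (value >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Avro string: UTF-8 byte length as an Avro long, then the raw bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // Avro record: fields in schema order, with no field names or delimiters.
    static byte[] encodeEmployee(String name) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, name);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] avro = encodeEmployee("John"); // 0x08 'J' 'o' 'h' 'n'
        System.out.println(Base64.getEncoder().encodeToString(avro)); // CEpvaG4=
    }
}
```

The leading 0x08 is the zigzag encoding of the length 4, which is why the base64 text starts with "C".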

The message is accepted and received by the consumer, but message.getValue() throws an exception:

Exception in thread "main" org.apache.pulsar.shade.com.google.common.util.concurrent.UncheckedExecutionException: org.apache.pulsar.shade.org.apache.commons.lang3.SerializationException: Failed at fetching schema info for EMPTY
    at org.apache.pulsar.shade.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2050)
    at org.apache.pulsar.shade.com.google.common.cache.LocalCache.get(LocalCache.java:3951)
    at org.apache.pulsar.shade.com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3973)
    at org.apache.pulsar.shade.com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4957)
    at org.apache.pulsar.client.impl.schema.StructSchema.decode(StructSchema.java:107)
    at org.apache.pulsar.client.impl.MessageImpl.getValue(MessageImpl.java:301)
    at com.delti.esb.example.example_consumer.TestConsumer.main(TestConsumer.java:23)
Caused by: org.apache.pulsar.shade.org.apache.commons.lang3.SerializationException: Failed at fetching schema info for EMPTY
    at org.apache.pulsar.client.impl.schema.StructSchema.getSchemaInfoByVersion(StructSchema.java:220)
    at org.apache.pulsar.client.impl.schema.AvroSchema.loadReader(AvroSchema.java:93)
    at org.apache.pulsar.client.impl.schema.StructSchema$1.load(StructSchema.java:75)
    at org.apache.pulsar.client.impl.schema.StructSchema$1.load(StructSchema.java:72)
    at org.apache.pulsar.shade.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527)
    at org.apache.pulsar.shade.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2276)
    at org.apache.pulsar.shade.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2154)
    at org.apache.pulsar.shade.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2044)
    ... 6 more

Since, according to the feature list, the WebSocket API does not support the AVRO schema registry, I guess this is not surprising.


For the REST API producer, I tried:

curl --location --request POST 'http://localhost:8080/topics/persistent/public/default/avrotopic' \
--header 'Content-Type: application/json' \
--data-raw '{
    "valueSchema":"{\"schema\":\"eyJuYW1lc3BhY2UiOiJleGFtcGxlIiwiZmllbGRzIjpbeyJuYW1lIjoibmFtZSIsInR5cGUiOiJzdHJpbmcifV0sInR5cGUiOiJyZWNvcmQiLCJuYW1lIjoiRW1wbG95ZWUifQ==\",\"properties\":{\"__jsr310ConversionEnabled\":\"false\",\"__alwaysAllowNull\":\"true\"},\"schemaDefinition\":\"{\\\"namespace\\\":\\\"example\\\",\\\"fields\\\":[{\\\"name\\\":\\\"name\\\",\\\"type\\\":\\\"string\\\"}],\\\"type\\\":\\\"record\\\",\\\"name\\\":\\\"Employee\\\"}\",\"name\":\"avrotopic\",\"type\":\"AVRO\"}",
    "messages":[
        {"payload":"CEpvaG4="}
    ]
}'

Response:

{
    "messagePublishResults": [
        {
            "messageId": "10:2:-1",
            "errorCode": 0,
            "schemaVersion": 0
        }
    ],
    "schemaVersion": 0
}

So the message is accepted and also received by the consumer, but the payload always seems to be empty when consumed. I modeled the request on the JSON example documented at https://pulsar.apache.org/docs/client-libraries-rest/, but I'm clearly missing something.
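The two schema fields inside that valueSchema carry the same information: "schema" is the base64 encoding of the Avro definition spelled out in "schemaDefinition". A small stdlib check makes this visible and at least rules out an inconsistency between the two fields as the cause of the empty payload. ValueSchemaCheck is a hypothetical class name; both constants are copied from the curl body above with the JSON escaping removed.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical check: compares the two schema fields of the REST valueSchema.
class ValueSchemaCheck {

    // "schemaDefinition" from the curl body, with the JSON escaping removed.
    static final String SCHEMA_DEFINITION =
            "{\"namespace\":\"example\",\"fields\":[{\"name\":\"name\",\"type\":\"string\"}],"
            + "\"type\":\"record\",\"name\":\"Employee\"}";

    // "schema" from the same curl body.
    static final String SCHEMA_FIELD =
            "eyJuYW1lc3BhY2UiOiJleGFtcGxlIiwiZmllbGRzIjpbeyJuYW1lIjoibmFtZSIsInR5cGUiOiJzdHJpbmcifV0s"
            + "InR5cGUiOiJyZWNvcmQiLCJuYW1lIjoiRW1wbG95ZWUifQ==";

    // Base64 text -> bytes -> UTF-8 string.
    static String decodeSchemaField(String base64) {
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String decoded = decodeSchemaField(SCHEMA_FIELD);
        System.out.println(decoded);
        System.out.println("matches schemaDefinition: " + decoded.equals(SCHEMA_DEFINITION));
    }
}
```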


Is there any way to get this working?

If not, I guess I'll have to send base64-encoded AVRO without using the schema registry and do the deserialization in the application.
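If it does come to that, application-side decoding is small for this schema. A stdlib-only sketch, assuming the payload arrives as base64 text and the schema is exactly the one-field Employee record above; EmployeeAvroDecoder and decodeEmployeeName are hypothetical names, and a real application would more likely pass the bytes to the Avro library, for example a SpecificDatumReader<Employee> reading from DecoderFactory.get().binaryDecoder(...).

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical stdlib-only reader for the one-field Employee record.
class EmployeeAvroDecoder {

    private final byte[] buf;
    private int pos;

    EmployeeAvroDecoder(byte[] buf) {
        this.buf = buf;
    }

    // Reads a zigzag-encoded varint (Avro int/long).
    long readLong() {
        long raw = 0;
        int shift = 0;
        int b;
        do {
            if (pos >= buf.length || shift > 63) {
                throw new IllegalArgumentException("truncated or oversized varint");
            }
            b = buf[pos++] & 0xFF;
            raw |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (raw >>> 1) ^ -(raw & 1); // undo the zigzag mapping
    }

    // Reads an Avro string: length, then that many UTF-8 bytes.
    String readString() {
        long len = readLong();
        if (len < 0 || len > buf.length - pos) {
            throw new IllegalArgumentException("invalid string length " + len);
        }
        String s = new String(buf, pos, (int) len, StandardCharsets.UTF_8);
        pos += (int) len;
        return s;
    }

    // Base64 text from the message -> Avro bytes -> Employee name.
    static String decodeEmployeeName(String base64Payload) {
        return new EmployeeAvroDecoder(Base64.getDecoder().decode(base64Payload)).readString();
    }

    public static void main(String[] args) {
        System.out.println(decodeEmployeeName("CEpvaG4=")); // John
    }
}
```

Because the record has a single field, decoding the record is the same as reading one string; more fields would simply be read one after another in schema order.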

MPleus

1 Answer


Currently, there isn't a way to specify the schema when creating a WS producer/consumer.

The best option is to specify the AVRO schema on the topic itself and then set the topic's schema compatibility strategy to ALWAYS_COMPATIBLE.

This will allow the WS producer to publish the raw bytes (which are really in Avro format) to the topic. Then the Java Avro consumer will be able to deserialize Avro messages as expected.
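On the client side, this suggestion leaves the WebSocket frame unchanged; the fix is on the topic (registered AVRO schema plus ALWAYS_COMPATIBLE, configured beforehand and not shown here). For reference, this is a minimal java.net.http sketch of such a producer. It assumes that the payload field carries base64 text, as the Pulsar WebSocket API expects, and that the broker's reply fits in a single text frame. WsAvroProducer, producerMessage and sendOnce are hypothetical names.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.WebSocket;
import java.util.Base64;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.TimeUnit;

// Hypothetical WebSocket producer sketch using only the JDK (Java 11+).
class WsAvroProducer {

    // Wraps already-Avro-encoded bytes in a producer frame; "payload" is base64 text.
    static String producerMessage(byte[] avroBytes) {
        return "{\"payload\":\"" + Base64.getEncoder().encodeToString(avroBytes) + "\"}";
    }

    // Sends one frame and returns the broker's first text reply
    // (assumes the reply arrives as a single frame).
    static String sendOnce(URI producerUri, String frame) throws Exception {
        CompletableFuture<String> reply = new CompletableFuture<>();
        WebSocket ws = HttpClient.newHttpClient().newWebSocketBuilder()
                .buildAsync(producerUri, new WebSocket.Listener() {
                    @Override
                    public CompletionStage<?> onText(WebSocket socket, CharSequence data, boolean last) {
                        reply.complete(data.toString());
                        return null;
                    }
                })
                .get(10, TimeUnit.SECONDS);
        ws.sendText(frame, true).get(10, TimeUnit.SECONDS);
        String answer = reply.get(10, TimeUnit.SECONDS);
        ws.sendClose(WebSocket.NORMAL_CLOSURE, "done").get(10, TimeUnit.SECONDS);
        return answer;
    }

    public static void main(String[] args) throws Exception {
        byte[] avroJohn = Base64.getDecoder().decode("CEpvaG4="); // Employee{name="John"}
        String frame = producerMessage(avroJohn);
        System.out.println(frame);
        if (args.length > 0) {
            // e.g. ws://localhost:8080/ws/v2/producer/persistent/public/default/avrotopic
            System.out.println("Broker replied: " + sendOnce(URI.create(args[0]), frame));
        }
    }
}
```

Run without arguments, it only prints the frame {"payload":"CEpvaG4="}; passing the producer URL from the question as the first argument also publishes it.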

David Kjerrumgaard
  • Thank you very much for your reply! When I send the WS message like {"payload": ""}, Pulsar complains about "Illegal unquoted character". When I send raw bytes via the REST API, though, they are accepted and can be deserialized by the Java consumer... BUT in this case INT values > 63 are not deserialized correctly. – MPleus Jun 10 '22 at 09:50
  • Does the payload contain the raw bytes or is the AVRO base64 encoded? – David Kjerrumgaard Jun 11 '22 at 15:17
  • It's raw AVRO-encoded bytes that went into the payload, skipping the base64 encoding as you suggested. – MPleus Jun 13 '22 at 06:35
  • I am actually suggesting that you try sending the payload as base64-encoded bytes. This encoding is designed to let binary data survive transport layers that are not 8-bit clean, which seems to be the case here since your issue is with int numbers > 63. – David Kjerrumgaard Jun 13 '22 at 20:30
  • Thanks for clarifying. So that would mean sending the message exactly like in my first example above but with the ALWAYS_COMPATIBLE compatibility setting. I’m like 90% sure that I already tried this, but I will confirm again as soon as I'm back in the office. – MPleus Jun 15 '22 at 07:32
  • I can confirm now that the Java client still throws the "Failed at fetching schema info..." exception. – MPleus Jun 27 '22 at 06:55
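The "INT numbers > 63" symptom from the comments is consistent with how Avro encodes numbers. An int is zigzag-mapped (0, -1, 1, -2, ... become 0, 1, 2, 3, ...) and then written 7 bits per byte, with the high bit marking a continuation. Values from -64 to 63 fit in a single byte below 0x80. Anything outside that range produces a byte of 0x80 or more, which is not valid UTF-8 on its own. The sketch below assumes, without confirmation from the thread, that the raw bytes were treated as UTF-8 text somewhere between the request and the broker, and simulates that round trip. AvroVarintTextDemo and its methods are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo: Avro varint bytes vs. a UTF-8 text round trip.
class AvroVarintTextDemo {

    // Avro int/long encoding: zigzag-map, then 7 bits per byte, high bit = "more follows".
    static byte[] avroLong(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long z = (value << 1) ^ (value >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
        return out.toByteArray();
    }

    // Simulates raw bytes being decoded as UTF-8 text and re-encoded
    // (assumed transport behavior; malformed bytes become U+FFFD).
    static boolean survivesTextRoundTrip(byte[] raw) {
        String asText = new String(raw, StandardCharsets.UTF_8);
        return Arrays.equals(raw, asText.getBytes(StandardCharsets.UTF_8));
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        for (long v : new long[] {63, 64, -64, -65}) {
            byte[] enc = avroLong(v);
            System.out.println(v + " -> " + hex(enc) + ", survives UTF-8 round trip: " + survivesTextRoundTrip(enc));
        }
    }
}
```

This prints 7E (true) for 63, 80 01 (false) for 64, 7F (true) for -64 and 81 01 (false) for -65. Base64 sidesteps the problem because its output is plain ASCII.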