Edit: I found an older question (How to populate the cache in CachedSchemaRegistryClient without making a call to register a new schema?). It mentions that CachedSchemaRegistryClient only caches a schema once it has been registered against the actual registry, and that there is no known workaround yet. I'm leaving my question open, but wanted to flag that as well.
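To make that limitation concrete, this is roughly the only cache-warming path the linked question found (a sketch; the subject name is a placeholder, and register's exact signature varies by client version):

import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient;
import org.apache.avro.Schema;

public class RegisterToWarmCache {
    static int warm(CachedSchemaRegistryClient client, Schema schema) throws Exception {
        // Populates the client's schema/id caches, but only by actually
        // registering the schema with the registry:
        return client.register("some-subject-value", schema);
    }
}

In other words, warming the cache means writing to the registry, which I don't want to do.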
I am working on a program that:

- pulls a byte array from Kafka,
- decrypts it (so the data is secure while on Kafka),
- converts the bytes to a string and parses that JSON string into a JSON object,
- looks up the schema in the schema registry (via CachedSchemaRegistryClient),
- converts the JSON body to a GenericRecord using the schema from the retrieved metadata, and
- serializes that GenericRecord into Avro bytes.
After running some tests, it seems that the CachedSchemaRegistryClient is the major performance drain, but from what I can tell this is the best way to get the schema metadata. Have I implemented something poorly, or is there another approach that works for my use case?
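Since the client appears to re-fetch the latest metadata from the registry on every call, the workaround I'm considering is memoizing the lookup myself with a short TTL. A minimal sketch (the class name and TTL handling are mine, not from the Confluent library):

import io.confluent.kafka.schemaregistry.client.SchemaMetadata;
import io.confluent.kafka.schemaregistry.client.SchemaRegistryClient;
import java.util.concurrent.ConcurrentHashMap;

public class SchemaMetadataCache {

    private static class Entry {
        final SchemaMetadata metadata;
        final long fetchedAt;
        Entry(SchemaMetadata metadata, long fetchedAt) {
            this.metadata = metadata;
            this.fetchedAt = fetchedAt;
        }
    }

    private final SchemaRegistryClient client;
    private final long ttlMillis;
    private final ConcurrentHashMap<String, Entry> cache = new ConcurrentHashMap<>();

    public SchemaMetadataCache(SchemaRegistryClient client, long ttlMillis) {
        this.client = client;
        this.ttlMillis = ttlMillis;
    }

    // Returns the cached latest metadata for a subject, hitting the registry
    // only when the entry is missing or older than the TTL. Under concurrency
    // a stale entry may be fetched twice, which is harmless here.
    public SchemaMetadata getLatest(String subject) throws Exception {
        long now = System.currentTimeMillis();
        Entry entry = cache.get(subject);
        if (entry == null || now - entry.fetchedAt > ttlMillis) {
            entry = new Entry(client.getLatestSchemaMetadata(subject), now);
            cache.put(subject, entry);
        }
        return entry.metadata;
    }
}

The obvious trade-off is that schema updates are only picked up after the TTL expires, so I'm not sure this is the right direction.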
Here is the code that handles everything after the decryption:
package org.apache.flink;
import avro.fullNested.FinalMessage;
import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient;
import io.confluent.kafka.schemaregistry.client.SchemaMetadata;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.util.Collector;
import org.json.simple.JSONObject;
import org.json.simple.parser.JSONParser;
import serializers.AvroFinishedMessageSerializer;
import tech.allegro.schema.json2avro.converter.JsonAvroConverter;
public class JsonToAvroBytesParser implements FlatMapFunction<String, byte[]> {

    private transient CachedSchemaRegistryClient schemaRegistryClient;
    private transient AvroFinishedMessageSerializer avroFinishedMessageSerializer;
    private String schemaUrl;
    private Integer identityMaxCount;

    public JsonToAvroBytesParser(String passedSchemaUrl, int passedImc) {
        schemaUrl = passedSchemaUrl;
        identityMaxCount = passedImc;
    }

    private void ensureInitialized() {
        if (schemaUrl.equals("")) {
            schemaUrl = "https://myschemaurl.com/";
        }
        if (identityMaxCount == null) {
            identityMaxCount = 5;
        }
        if (schemaRegistryClient == null) {
            schemaRegistryClient = new CachedSchemaRegistryClient(schemaUrl, identityMaxCount);
        }
        if (avroFinishedMessageSerializer == null) {
            avroFinishedMessageSerializer = new AvroFinishedMessageSerializer(FinalMessage.class);
        }
    }
    @Override
    public void flatMap(String s, Collector<byte[]> collector) throws Exception {
        ensureInitialized();

        Object obj = new JSONParser().parse(s);
        JSONObject jsonObject = (JSONObject) obj;

        try {
            String headers = jsonObject.get("headers").toString();
            JSONObject body = (JSONObject) jsonObject.get("requestBody");

            if (headers != null && body != null) {
                String kafkaTopicFromHeaders = "hard_coded_name-value";

                //NOTE: this schema lookup has serious performance issues.
                SchemaMetadata schemaMetadata = schemaRegistryClient.getLatestSchemaMetadata(kafkaTopicFromHeaders);
                //TODO: need to implement recovery method if schema cannot be reached.

                JsonAvroConverter converter = new JsonAvroConverter();
                GenericRecord specificRecord = converter.convertToGenericDataRecord(
                        body.toJSONString().getBytes(),
                        new Schema.Parser().parse(schemaMetadata.getSchema()));

                byte[] bytesToReturn = avroFinishedMessageSerializer.serializeWithSchemaId(schemaMetadata, specificRecord);
                collector.collect(bytesToReturn);
            } else {
                System.out.println("json is incorrect.");
            }
        } catch (Exception e) {
            System.out.println("json conversion exception caught");
        }
    }
}
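For what it's worth, I also noticed two smaller per-record costs in my flatMap: I build a new JsonAvroConverter and re-parse the Schema on every message. A sketch of hoisting those out of the hot path, keyed by the registry's schema id (class and method names are mine):

import java.util.HashMap;
import java.util.Map;
import io.confluent.kafka.schemaregistry.client.SchemaMetadata;
import org.apache.avro.Schema;
import tech.allegro.schema.json2avro.converter.JsonAvroConverter;

public class ParsedSchemaCache {

    // One reusable converter instead of one per record.
    private final JsonAvroConverter converter = new JsonAvroConverter();

    // Parsed Schema objects keyed by registry schema id, so
    // new Schema.Parser().parse(...) runs once per schema, not per record.
    // A plain HashMap is fine here because each Flink task instance of the
    // FlatMapFunction is used by a single thread.
    private final Map<Integer, Schema> parsedSchemas = new HashMap<>();

    public JsonAvroConverter converter() {
        return converter;
    }

    public Schema schemaFor(SchemaMetadata metadata) {
        return parsedSchemas.computeIfAbsent(
                metadata.getId(),
                id -> new Schema.Parser().parse(metadata.getSchema()));
    }
}

Even with those changes, in my tests the getLatestSchemaMetadata call itself still dominates, which is why I'm asking about the client.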
Thanks for any help in advance!