
I am writing a gRPC application that exposes an API for schema validation. Client applications are allowed to send any object for validation. Since schema validation is needed, whatever encoding is used to transmit the data must preserve the original object's structure and data types.

Accepting the object as a byte array ([]byte) will not preserve this information unless the object is encoded in a format such as proto or Avro. Using these formats requires access to the object definitions (messages in .proto files, an Avro schema file, etc.). How can this be done for objects that are not known at application startup?

I was thinking of the following strategies:

Approach 1

  1. For any new object type, the client writes a .proto file, compiles it using protoc, and sends the .proto file to the server. Using the generated .pb stubs in their respective languages keeps clients independent of the server language.
  2. The server uses protoc to generate a .pb descriptor file for itself and stores this new .pb file locally (a sketch of loading such a file at runtime follows this list).
  3. The API is exposed via proto and accepts data as Any (google/protobuf/any.proto).
    message Document {
        DocumentMeta meta = 1;
        google.protobuf.Any data = 2;
    }
  4. Clients send their message, encoded as a byte array, in this Any field.
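
To make step 2 concrete, here is a minimal Go sketch of how the server could load a client-supplied descriptor set at runtime. It assumes the client produced the file with something like protoc --include_imports --descriptor_set_out=schema.pb, so the bytes are a serialized FileDescriptorSet; the function name loadDescriptorSet is just illustrative:

    package schema

    import (
        "google.golang.org/protobuf/proto"
        "google.golang.org/protobuf/reflect/protodesc"
        "google.golang.org/protobuf/reflect/protoregistry"
        "google.golang.org/protobuf/types/descriptorpb"
    )

    // loadDescriptorSet parses the output of
    // `protoc --include_imports --descriptor_set_out=schema.pb ...`
    // into a registry the server can query at runtime.
    func loadDescriptorSet(pbBytes []byte) (*protoregistry.Files, error) {
        var fds descriptorpb.FileDescriptorSet
        if err := proto.Unmarshal(pbBytes, &fds); err != nil {
            return nil, err
        }
        // NewFiles resolves imports, which is why --include_imports matters.
        return protodesc.NewFiles(&fds)
    }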

Approach 2

  1. Accept the data as a simple JSON-encoded []byte. This does not retain type information.
  2. On the server, unmarshal the []byte into a map[string]interface{} (JSON objects decode to string-keyed maps in Go).
  3. Traverse this map and match the values against the stored schema of the object (see the sketch after this list).
  4. The object schema would be stored in proto-encoded form.
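
Assuming Go (which the map type above suggests) and that the stored schema is available as a protoreflect.MessageDescriptor, the traversal in steps 2–3 could look like the sketch below. validateKeys is a hypothetical helper; it only checks field names, not value types:

    package schema

    import (
        "encoding/json"
        "fmt"

        "google.golang.org/protobuf/reflect/protoreflect"
    )

    // decode unmarshals the payload; JSON objects always become
    // string-keyed maps in Go.
    func decode(payload []byte) (map[string]interface{}, error) {
        var obj map[string]interface{}
        if err := json.Unmarshal(payload, &obj); err != nil {
            return nil, err
        }
        return obj, nil
    }

    // validateKeys checks that every key in the decoded JSON object matches
    // a field in the stored message descriptor. It validates names only;
    // extending it to check value kinds (string, int64, ...) is the next step.
    func validateKeys(obj map[string]interface{}, md protoreflect.MessageDescriptor) error {
        fields := md.Fields()
        for key, val := range obj {
            fd := fields.ByJSONName(key)
            if fd == nil {
                fd = fields.ByName(protoreflect.Name(key))
            }
            if fd == nil {
                return fmt.Errorf("unknown field %q in %s", key, md.FullName())
            }
            // Recurse into nested objects when the schema says the field is a message.
            if nested, ok := val.(map[string]interface{}); ok && fd.Kind() == protoreflect.MessageKind {
                if err := validateKeys(nested, fd.Message()); err != nil {
                    return err
                }
            }
        }
        return nil
    }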

Now the challenges:

  1. In Approach 1 above, how will the server use the newly stored .pb file to unmarshal the message from Any to a specific type? I want to keep this dynamic and avoid code changes every time a new object type is introduced (see the sketch after this list).
  2. Are there any general recommendations on how to accept data from clients when you want to retain, or be able to derive, the schema of the original object?
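
For challenge 1, the dynamicpb package in google.golang.org/protobuf handles exactly this case: resolve the Any's type URL against the registry built from the client's descriptor set, create a dynamic message for that descriptor, and unmarshal into it, with no generated code on the server. A sketch, where unmarshalAny is an illustrative name:

    package schema

    import (
        "fmt"

        "google.golang.org/protobuf/proto"
        "google.golang.org/protobuf/reflect/protoreflect"
        "google.golang.org/protobuf/reflect/protoregistry"
        "google.golang.org/protobuf/types/dynamicpb"
        "google.golang.org/protobuf/types/known/anypb"
    )

    // unmarshalAny resolves the Any's type URL against the descriptors the
    // client registered earlier and decodes the payload without generated code.
    func unmarshalAny(files *protoregistry.Files, a *anypb.Any) (*dynamicpb.Message, error) {
        // MessageName strips the "type.googleapis.com/" prefix from the type URL.
        desc, err := files.FindDescriptorByName(a.MessageName())
        if err != nil {
            return nil, err
        }
        md, ok := desc.(protoreflect.MessageDescriptor)
        if !ok {
            return nil, fmt.Errorf("%s is not a message", a.MessageName())
        }
        msg := dynamicpb.NewMessage(md)
        if err := proto.Unmarshal(a.GetValue(), msg); err != nil {
            return nil, err
        }
        // msg can now be walked via the protoreflect API for validation.
        return msg, nil
    }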
  • Can you please explain how your "schema validation" will work? (an example might help) Generally, in order to validate data you will need some understanding of its structure (in approach 2 you mention a "stored schema"). Does your validation code really require the data in its original structure, or would a map from field name -> data (which could be one of many types) suffice? – Brits Aug 14 '22 at 03:14
  • You can consider validations like those done by Confluent Schema Registry. You submit a Java class/proto message and it stores the schema. When you then publish any message for that object type, it compares what you submitted against the expected structure. In the second approach, I'll use the proto file submitted by the client and compare the keys of the []byte payload against the message from the .pb file. – uzumas Aug 15 '22 at 14:59
