I am writing a gRPC application which exposes an API for schema validation. Client applications are allowed to send any object for validation. Since schema validation is needed, whatever encoding is used to transport the data must preserve the original object's structure and data types.
Accepting the object as a byte array ([]byte) will not preserve this information unless the object is encoded with a format like proto, Avro, etc. Using these formats requires access to the object definitions (messages in .proto files, an Avro schema file for Avro, etc.). How can this be done for objects which are not known at application startup?
I was thinking of the following strategies:
Approach 1
- For any new object type, the client generates a .proto file, compiles it with protoc, and sends the .proto file to the server. Generating the .pb stubs in their respective languages keeps clients independent of the server's language.
- The server uses protoc to generate its own .pb file from the received .proto and stores this new .pb file locally.
- The API is exposed via proto and accepts data as Any (google/protobuf/any.proto):
message Document {
  DocumentMeta meta = 1;
  google.protobuf.Any data = 2;
}
- Clients will send a message encoded as a byte array into this Any field.
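
For illustration, this is roughly how I picture a client packing an arbitrary message into the Any field. This is a Go sketch; pb is a hypothetical package generated from the Document proto above, and MyObject stands in for whatever client-defined type needs validating:

package main

import (
	"log"

	"google.golang.org/protobuf/proto"
	"google.golang.org/protobuf/types/known/anypb"

	pb "example.com/validation/gen" // hypothetical generated package
)

func main() {
	// MyObject is a stand-in for any client-defined message.
	obj := &pb.MyObject{Name: "order-42"}

	// anypb.New marshals obj and records its type URL inside the Any.
	packed, err := anypb.New(obj)
	if err != nil {
		log.Fatal(err)
	}

	doc := &pb.Document{
		Meta: &pb.DocumentMeta{},
		Data: packed,
	}

	// These bytes would be sent to the validation API.
	if _, err := proto.Marshal(doc); err != nil {
		log.Fatal(err)
	}
}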
Approach 2
- Accept data as a simple JSON-encoded []byte. This will not retain any type information.
- On the server, I can unmarshal the []byte into a map[string]interface{}.
- Do a traversal of this map and match the values against the stored schema of the object (see the sketch after this list).
- The object's schema would be stored as a proto encoding.
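
A minimal sketch of that traversal in Go, assuming for simplicity that the stored schema has been reduced to field-name to expected-type pairs (in reality the lookup would come from the stored proto schema):

package main

import (
	"encoding/json"
	"fmt"
)

// validate walks the decoded JSON document and checks each value's Go type
// against the expected type recorded in a (hypothetical) stored schema.
func validate(data []byte, schema map[string]string) error {
	var doc map[string]interface{}
	if err := json.Unmarshal(data, &doc); err != nil {
		return err
	}
	for field, want := range schema {
		val, ok := doc[field]
		if !ok {
			return fmt.Errorf("missing field %q", field)
		}
		// encoding/json decodes every JSON number as float64, so the
		// original int/float distinction is already lost at this point.
		got := fmt.Sprintf("%T", val)
		if got != want {
			return fmt.Errorf("field %q: got %s, want %s", field, got, want)
		}
	}
	return nil
}

func main() {
	schema := map[string]string{"name": "string", "amount": "float64"}
	err := validate([]byte(`{"name":"order-42","amount":100}`), schema)
	fmt.Println(err)
}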
Now the challenges:
- In the above approach, how will the server use the newly stored .pb file to unmarshal the message from Any into a specific type? I want to keep this dynamic and avoid making code changes every time a new object type is introduced (a rough sketch of what I have in mind is at the end of this post).
- Any general recommendations on how to accept data from clients when you want to retain, or somehow be able to derive, the schema of the original object?
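
For the first challenge, what I imagine (but have not got working) is roughly the following: the server keeps the descriptor set emitted by protoc (e.g. via --descriptor_set_out) as the stored ".pb file", and resolves the Any against it at runtime with dynamicpb, so nothing has to be regenerated or recompiled for new object types. This is an untested sketch and the exact API usage is my assumption:

package server

import (
	"fmt"
	"os"

	"google.golang.org/protobuf/proto"
	"google.golang.org/protobuf/reflect/protodesc"
	"google.golang.org/protobuf/reflect/protoreflect"
	"google.golang.org/protobuf/types/descriptorpb"
	"google.golang.org/protobuf/types/dynamicpb"
	"google.golang.org/protobuf/types/known/anypb"
)

// UnpackAny resolves an Any against descriptors loaded at runtime from a
// FileDescriptorSet file (produced by `protoc --descriptor_set_out=...`),
// so no Go code has to be generated or recompiled for new object types.
func UnpackAny(descriptorSetPath string, data *anypb.Any) (*dynamicpb.Message, error) {
	raw, err := os.ReadFile(descriptorSetPath)
	if err != nil {
		return nil, err
	}
	var fds descriptorpb.FileDescriptorSet
	if err := proto.Unmarshal(raw, &fds); err != nil {
		return nil, err
	}
	files, err := protodesc.NewFiles(&fds)
	if err != nil {
		return nil, err
	}
	// The Any type URL ends with the fully qualified message name.
	desc, err := files.FindDescriptorByName(data.MessageName())
	if err != nil {
		return nil, err
	}
	md, ok := desc.(protoreflect.MessageDescriptor)
	if !ok {
		return nil, fmt.Errorf("%s is not a message", data.MessageName())
	}
	// Build a dynamic message from the descriptor and unmarshal into it,
	// then the validation logic can walk its fields via reflection.
	msg := dynamicpb.NewMessage(md)
	if err := proto.Unmarshal(data.GetValue(), msg); err != nil {
		return nil, err
	}
	return msg, nil
}

Is something along these lines the recommended way to handle types that are only known at runtime, or is there a better pattern?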