Let's say I have a .proto structured (simplified) like this:
message DataItem {
    required string name = 1;
    required int32 value = 2;
}
message DataItemStream {
    repeated DataItem items = 1;
}
The server builds the DataItemStream and writes it to disk. We load this file and everything works without issue.
This has worked pretty well for us, but our client base has grown, and so has the use of the software that generates these stream files.
The problem is that the repeated items field can hold tens of thousands of items, but we're only interested in a subset of them. We've dug around a little and have only seen solutions that follow Google's streaming advice: add a size prefix to each stored DataItem and then parse each message individually, or use a CodedInputStream/CodedOutputStream. Another option would be to base64-encode the binary wire format and separate the messages with newlines; then we'd be able to very easily get just the subsets we're interested in.
Any of these would work for us, but they all require changes in production code to alter the way the files are saved (server-side code that hasn't been changed in a long time and is deemed virtually untouchable by their management; in their minds, if it isn't broken, don't fix it).
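For reference, the size-prefix option amounts to roughly the following. This is only a minimal Python sketch of the idea, assuming each DataItem has already been serialized to bytes; `write_delimited` and `read_delimited` are hypothetical helper names, not part of any protobuf library.

```python
import io


def encode_varint(n):
    """Encode a non-negative int as a base-128 varint (the protobuf length format)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def read_varint(stream):
    """Read one varint from a binary stream; return None at a clean end of file."""
    result, shift = 0, 0
    while True:
        b = stream.read(1)
        if not b:
            if shift == 0:
                return None
            raise EOFError("truncated varint")
        result |= (b[0] & 0x7F) << shift
        if not (b[0] & 0x80):
            return result
        shift += 7


def write_delimited(stream, payload):
    """Write one already-serialized message, preceded by its length."""
    stream.write(encode_varint(len(payload)))
    stream.write(payload)


def read_delimited(stream):
    """Yield each length-prefixed payload in turn."""
    while True:
        size = read_varint(stream)
        if size is None:
            return
        payload = stream.read(size)
        if len(payload) != size:
            raise EOFError("truncated record")
        yield payload


# Round trip through an in-memory buffer standing in for the file on disk.
demo = io.BytesIO()
write_delimited(demo, b"first item")
write_delimited(demo, b"second item")
demo.seek(0)
records = list(read_delimited(demo))
```

With this layout a reader can decide per record whether to parse the payload, which is exactly the property the current single-message file seems to lack.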
We've already re-created the server module so that it streams the messages differently, but we are receiving flak from its maintainers about pushing our changes. It's much easier (politically) for us to change our own code as needed, since we have full control over its development cycle.
Is there a way to keep using this original stream of messages but be intelligent about loading only subsets of the messages? (We really don't care what language we have to work in, if that matters; we have experience in C++, Python, Java and .NET, in that order of experience.)
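To make the question concrete, the kind of thing we imagine is walking the file's wire format ourselves. As far as we understand the encoding, each entry of a repeated message field is stored as a tag, a varint length, and then the item's bytes, so a reader could in principle seek past the items it doesn't want. Below is a rough Python sketch under that assumption. The names `iter_items` and `keep` are made up for illustration, selection is by position only, and the file must be opened in binary mode and be seekable.

```python
import io


def read_varint(stream):
    """Read one base-128 varint; return None at a clean end of file."""
    result, shift = 0, 0
    while True:
        b = stream.read(1)
        if not b:
            if shift == 0:
                return None
            raise EOFError("truncated varint")
        result |= (b[0] & 0x7F) << shift
        if not (b[0] & 0x80):
            return result
        shift += 7


def iter_items(stream, keep=lambda index: True):
    """Yield (index, raw_bytes) for each wanted DataItem in a DataItemStream file.

    Assumes the top-level message only contains length-delimited fields,
    which holds for the simplified DataItemStream shown in the question.
    """
    index = 0
    while True:
        key = read_varint(stream)
        if key is None:
            return
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type != 2:  # 2 = length-delimited
            raise ValueError("unexpected wire type %d" % wire_type)
        size = read_varint(stream)
        if size is None:
            raise EOFError("missing length after tag")
        if field_number == 1 and keep(index):
            payload = stream.read(size)
            if len(payload) != size:
                raise EOFError("truncated item")
            yield index, payload
        else:
            stream.seek(size, io.SEEK_CUR)  # skip without reading the bytes
        if field_number == 1:
            index += 1


# Two hand-encoded DataItems: {name: "a", value: 1} and {name: "b", value: 2}.
item_a = b"\x0a\x01a\x10\x01"
item_b = b"\x0a\x01b\x10\x02"
data = b"\x0a\x05" + item_a + b"\x0a\x05" + item_b
second_only = list(iter_items(io.BytesIO(data), keep=lambda i: i == 1))
```

Each yielded payload could then presumably be handed to the generated class (for example `DataItem.FromString(payload)` in Python), so only the wanted items are ever fully parsed. Whether relying on the wire layout like this is considered safe is part of what we're asking.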