I hope this isn't too opinionated for SO; it may not have a good answer.
In a portion of a library I'm writing, I have a byte array that gets populated with values supplied by the user. These values might be of type Float, Double, Int (of different sizes), etc., with the binary representations you might expect from, say, C. That is all we can say about the values.
I have an opportunity for an optimization: I can initialize my byte array with the byte `MAGIC`, and then whenever no byte of a user-supplied value is equal to `MAGIC` I can take a fast path; otherwise I need to take the slow path.
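To make the shape of the check concrete, here is a minimal sketch in C. The value `0xA5` and the name `needs_slow_path` are placeholders of my own, not anything fixed in the library; which byte to pick is exactly what I'm asking about.

```c
#include <stdio.h>
#include <stddef.h>

/* Placeholder sentinel; which value to pick here is exactly the question. */
#define MAGIC 0xA5

/* Returns 1 if any byte of the value's binary representation equals MAGIC,
 * meaning the slow path is required; 0 means the fast path is safe. */
static int needs_slow_path(const void *value, size_t size)
{
    const unsigned char *bytes = value;
    for (size_t i = 0; i < size; i++)
        if (bytes[i] == MAGIC)
            return 1;
    return 0;
}

int main(void)
{
    double d = 3.14159;
    printf("slow path needed: %d\n", needs_slow_path(&d, sizeof d));
    return 0;
}
```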
So my question is: what is a principled way to choose my magic byte so that it is reasonably unlikely to appear in the (variously encoded and distributed) data I receive?
Part of my question, I suppose, is whether there is something like Benford's law that can tell me about the distribution of bytes in many sorts of data.
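Failing such a law, the only thing I can think of is to measure: take a corpus of values resembling what users actually store, histogram the bytes, and pick one that (almost) never occurs. A rough sketch of what I mean, with made-up sample data rather than anything representative:

```c
#include <stdio.h>
#include <stddef.h>

/* Tally how often each byte value 0..255 appears in a buffer. */
static void byte_histogram(const void *data, size_t size, size_t counts[256])
{
    const unsigned char *bytes = data;
    for (size_t i = 0; i < size; i++)
        counts[bytes[i]]++;
}

int main(void)
{
    /* Made-up sample: small doubles. A real measurement would need data
     * representative of what users actually supply. */
    double samples[1000];
    for (int i = 0; i < 1000; i++)
        samples[i] = i * 0.001;

    size_t counts[256] = {0};
    byte_histogram(samples, sizeof samples, counts);

    /* Report byte values that never occur in this sample; any of them
     * would work as a sentinel for this particular distribution. */
    for (int b = 0; b < 256; b++)
        if (counts[b] == 0)
            printf("0x%02X never appears\n", b);
    return 0;
}
```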