Consider an algorithm that receives a potentially very long stream of keys. As each key is processed, it generates a value between 0 and 1 for that key, to be looked up later. The input set is large enough that we can't afford to store one value per key. The value-generating rule is independent across keys.
Now, assume that we can tolerate error in the later lookup, but we still want to minimize the difference between the retrieved and original values (e.g., the average absolute error over many random retrievals).
For example, if the original value for a given key was 0.008, retrieving 0.06 is much better than retrieving 0.6.
What data structures or algorithms can we use to address this problem?
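To make the requirements concrete, here is roughly the interface I have in mind (the names `add` and `query` are just placeholders I made up for this question):

```python
from abc import ABC, abstractmethod

class ApproximateValueStore(ABC):
    """Illustrative interface only; not a known published structure.

    Stores one float in [0, 1] per key using sublinear space, and returns
    an approximation on lookup. The goal is to minimize the expected
    |retrieved - original| over many random lookups.
    """

    @abstractmethod
    def add(self, key: str, value: float) -> None:
        """Record `value` (in [0, 1]) for `key` while streaming."""

    @abstractmethod
    def query(self, key: str) -> float:
        """Return an estimate of the value originally stored for `key`."""
```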
A Bloom filter is the closest data structure I can think of. One could quantize the output range into buckets, use a Bloom filter per bucket, and somehow combine their answers at retrieval time to estimate the most likely value (rough sketch below). Before I go down this path and reinvent the wheel: are there any known data structures, algorithms, or theoretical or practical approaches that address this problem?
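A minimal sketch of what I mean, assuming fixed-size plain Bloom filters and averaging the midpoints of all buckets that report a hit (all parameter values and names are placeholders, not tuned):

```python
import hashlib

class BucketedBloomStore:
    """Sketch of the bucketed-Bloom-filter idea described above.

    The [0, 1] range is quantized into `num_buckets` buckets, each backed by
    a plain Bloom filter (a bit array probed by `num_hashes` hash functions).
    At query time, the midpoints of all buckets whose filter reports a hit
    are averaged; false positives turn the answer into an estimate.
    """

    def __init__(self, num_buckets=16, bits_per_filter=1 << 20, num_hashes=4):
        self.num_buckets = num_buckets
        self.bits = bits_per_filter
        self.num_hashes = num_hashes
        self.filters = [bytearray(bits_per_filter // 8) for _ in range(num_buckets)]

    def _positions(self, key: str):
        # Derive `num_hashes` bit positions from one digest of the key.
        digest = hashlib.sha256(key.encode()).digest()
        for i in range(self.num_hashes):
            yield int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.bits

    def add(self, key: str, value: float) -> None:
        # Map the value to its bucket and set that bucket's filter bits.
        bucket = min(int(value * self.num_buckets), self.num_buckets - 1)
        f = self.filters[bucket]
        for pos in self._positions(key):
            f[pos // 8] |= 1 << (pos % 8)

    def query(self, key: str) -> float:
        # Average the midpoints of every bucket whose filter claims membership.
        hits = [
            (b + 0.5) / self.num_buckets
            for b, f in enumerate(self.filters)
            if all((f[pos // 8] >> (pos % 8)) & 1 for pos in self._positions(key))
        ]
        return sum(hits) / len(hits) if hits else 0.5
```

The space/error tradeoff here would come from `num_buckets` (quantization error) and `bits_per_filter` / `num_hashes` (false-positive rate), but I don't know how to combine the colliding buckets in a principled way, which is why I'm asking.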
Ideally, I am looking for a solution that lets me parameterize the tradeoff between space and error rate.