It depends on what your data looks like. Are there repeating entries, or do they change only slowly, or what?
If so, you can try to compress chunks of your data and decompress them when needed. The bigger the chunks, the more memory you can save and the worse the speed gets. IMHO, no good deal. You could also store the data compressed and decompress it in memory.
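If you want to try that route anyway, here is a minimal sketch of chunk-wise compression with java.util.zip; the one-byte-per-value input layout and all names are assumptions on my side:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class ChunkCompressor {
    // Compress one chunk of values (stored one byte per value here).
    static byte[] compressChunk(byte[] chunk) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(bos)) {
            dos.write(chunk);
        }
        return bos.toByteArray();
    }

    // Decompress a chunk only when its values are actually needed.
    static byte[] decompressChunk(byte[] compressed) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (InflaterInputStream iis =
                 new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = iis.read(buf)) > 0) {
                bos.write(buf, 0, n);
            }
        }
        return bos.toByteArray();
    }
}
```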
Otherwise, i.e., in case of no regularities, you'll need at least log(9) / log(2) = 3.17 bits per entry, and there's nothing that could improve on that.
You can come pretty close to this value by packing 5 numbers into a short. As 9**5 = 59049 < 65536 = 2**16, it fits nearly perfectly. You'll need 3.2 bits per number, no big win. Packing five numbers is done via the formula
a + 9 * (b + 9 * (c + 9 * (d + 9 * e)))
and unpacking is trivial via a precomputed table.
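A minimal sketch of that packing and of the precomputed unpacking table (class and method names are mine):

```java
public class Base9Packer {
    // Pack five values a..e (each in 0..8) into one short using base-9 positional encoding.
    static short pack(int a, int b, int c, int d, int e) {
        return (short) (a + 9 * (b + 9 * (c + 9 * (d + 9 * e))));
    }

    // Precomputed table: for every possible packed value, the five original digits.
    static final byte[][] UNPACK = new byte[59049][5];
    static {
        for (int packed = 0; packed < 59049; packed++) {
            int x = packed;
            for (int i = 0; i < 5; i++) {
                UNPACK[packed][i] = (byte) (x % 9);
                x /= 9;
            }
        }
    }

    // Unpacking: mask with 0xFFFF since the short is used as an unsigned index.
    static byte[] unpack(short packed) {
        return UNPACK[packed & 0xFFFF];
    }
}
```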
UPDATE after question update
Further information: I just did a little experimenting, and it turns out that on average 2.14 consecutive numbers have the same value. There are 1 zero, 154 ones, 10373 twos, 385990 threes, 8146188 fours, 85008968 fives, 265638366 sixes, 70791576 sevens and 80 eights. So more than half of the numbers are 6s.
The fact that on average about 2.14 consecutive numbers are equal could lead to some compression, but in this case it tells us nothing: there are nearly only fives and sixes, so encountering two equal consecutive numbers is more or less implied.
Given these facts, you can forget my above optimization. There are practically only 8 values, as you can treat the single zero separately. So you need just 3 bits per value plus a single index for the zero.
You can even create a HashMap for all values below four or above seven, store there 1+154+10373+385990+80 entries, and use only 2 bits per value. But this is still far from ideal.
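A rough sketch of that idea, using a plain java.util.HashMap for the rare positions and 2-bit codes for the frequent values 4..7 (all names are my assumptions):

```java
import java.util.HashMap;
import java.util.Map;

public class TwoBitEncoder {
    private final byte[] packed;                              // 4 values per byte
    private final Map<Integer, Byte> rare = new HashMap<>();  // position -> actual value

    TwoBitEncoder(byte[] values) {
        packed = new byte[(values.length + 3) / 4];
        for (int i = 0; i < values.length; i++) {
            int v = values[i];
            int code;
            if (v >= 4 && v <= 7) {
                code = v - 4;            // frequent value, stored as a 2-bit code
            } else {
                code = 0;                // placeholder, real value kept in the map
                rare.put(i, (byte) v);
            }
            packed[i / 4] |= code << (2 * (i % 4));
        }
    }

    int get(int i) {
        Byte r = rare.get(i);
        if (r != null) {
            return r;                    // one of the roughly 400k rare values
        }
        return ((packed[i / 4] >> (2 * (i % 4))) & 3) + 4;
    }
}
```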
Assuming no regularities, you'd need 1.44 bits per value, as this is the entropy. You could go over all 5-tuples, compute their histogram, and use 1 byte for encoding the 255 most frequent tuples. All the remaining tuples would map to the 256th value, telling you that you have to look up the rare tuple's value in a HashMap.
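A sketch of the encoding side of that scheme: build a histogram of 5-tuples, give the 255 most frequent ones a byte code of their own, and send everything else through an escape code plus a map. The helper names are mine, not from any particular library:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TupleEncoder {
    static final int ESCAPE = 255;   // 256th code: "look this tuple up in the rare map"

    // Turn one 5-tuple of digits 0..8 into its base-9 index (0..59048).
    static int tupleIndex(byte[] values, int start) {
        int index = 0;
        for (int i = 4; i >= 0; i--) {
            index = index * 9 + values[start + i];
        }
        return index;
    }

    // Build the code table: the 255 most frequent tuple indexes get codes 0..254.
    static Map<Integer, Integer> buildCodes(byte[] values) {
        int[] histogram = new int[59049];
        for (int i = 0; i + 5 <= values.length; i += 5) {
            histogram[tupleIndex(values, i)]++;
        }
        List<Integer> indexes = new ArrayList<>();
        for (int i = 0; i < histogram.length; i++) {
            if (histogram[i] > 0) indexes.add(i);
        }
        indexes.sort((a, b) -> histogram[b] - histogram[a]);   // most frequent first
        Map<Integer, Integer> codes = new HashMap<>();
        for (int i = 0; i < Math.min(255, indexes.size()); i++) {
            codes.put(indexes.get(i), i);
        }
        return codes;
    }

    // Encode: one byte per tuple; rare tuples get the escape code and go into rareTuples.
    static byte[] encode(byte[] values, Map<Integer, Integer> codes,
                         Map<Integer, Integer> rareTuples) {
        byte[] out = new byte[values.length / 5];
        for (int i = 0; i + 5 <= values.length; i += 5) {
            int index = tupleIndex(values, i);
            Integer code = codes.get(index);
            if (code == null) {
                out[i / 5] = (byte) ESCAPE;
                rareTuples.put(i / 5, index);   // tuple position -> its base-9 index
            } else {
                out[i / 5] = code.byteValue();
            }
        }
        return out;
    }
}
```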
Some evaluation
I was curious whether this could work. Packing 5 numbers into one byte needs 85996340 bytes. There are nearly 5 million tuples which don't fit, so my idea was to use a hash map for them. Assuming open addressing (rehashing) rather than chaining, it makes sense to keep it maybe 50% full, so we need 10 million entries. Assuming a TIntShortHashMap (mapping indexes to tuples), each entry takes 6 bytes, leading to 60 MB. Too bad.
Packing only 4 numbers into one byte consumes 107495425 bytes and leaves 159531 tuples which don't fit. This looks better; however, I'm sure the denser packing could be improved a lot.
The results as produced by this little program:
*** Packing 5 numbers in a byte. ***
Normal packed size: 85996340.
Number of tuples in need of special handling: 4813535.
*** Packing 4 numbers in a byte. ***
Normal packed size: 107495425.
Number of tuples in need of special handling: 159531.
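For reference, a sketch of how such counts could be reproduced; the actual program isn't included here, so the digit source and the exact rounding of the packed size are assumptions:

```java
import java.util.Arrays;

public class PackingEvaluation {
    // Count how many k-tuples fall outside the 255 most frequent ones
    // (those would need the HashMap fallback).
    static long evaluate(byte[] values, int k) {
        int size = 1;
        for (int i = 0; i < k; i++) size *= 9;   // number of possible base-9 k-tuples
        long[] histogram = new long[size];
        long tuples = 0;
        for (int i = 0; i + k <= values.length; i += k) {
            int index = 0;
            for (int j = k - 1; j >= 0; j--) {
                index = index * 9 + values[i + j];
            }
            histogram[index]++;
            tuples++;
        }
        // The 255 most frequent tuples get a byte code of their own;
        // everything else needs special handling.
        long[] sorted = histogram.clone();
        Arrays.sort(sorted);
        long covered = 0;
        for (int i = 0; i < 255 && i < sorted.length; i++) {
            covered += sorted[sorted.length - 1 - i];
        }
        System.out.println("*** Packing " + k + " numbers in a byte. ***");
        System.out.println("Normal packed size: " + (values.length + k - 1) / k + ".");
        System.out.println("Number of tuples in need of special handling: " + (tuples - covered) + ".");
        return tuples - covered;
    }
}
```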