I am confused about how to interpret the minimum description length (MDL) of a string over a two-symbol alphabet.
To be more concrete, suppose that we want to encode a binary string where 1's occur with probability 0.80; for instance, here is a string of length 40, with 32 1's and 8 0's:
1 1 0 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 0 0 1
Following standard MDL analysis, we can encode this string using a prefix code (such as a Huffman code), and the code length for this string would be -log2(0.8) * 32 - log2(0.2) * 8 ≈ 28.9 bits, which is lower than the 40 bits needed to write the string out verbatim.
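(For reference, here is a small Python sketch of that calculation, using the counts and probabilities from my example above; the variable names are just my own.)

    from math import log2

    n_ones, n_zeros = 32, 8
    p_one, p_zero = 0.8, 0.2

    # Ideal (Shannon) code length in bits for the whole string
    ideal_bits = -n_ones * log2(p_one) - n_zeros * log2(p_zero)

    # Writing the string verbatim: one bit per symbol
    naive_bits = n_ones + n_zeros

    print(f"ideal code length: {ideal_bits:.2f} bits")  # about 28.88
    print(f"verbatim string:   {naive_bits} bits")      # 40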
Intuitively, it is "cheaper" to encode this string than one where 1's and 0's occur with equal probability. In practice, though, I don't see why this would be the case: at the very least, we need one bit per symbol to distinguish a 1 from a 0, so I don't see how a prefix code could do better than just writing out the binary string without any encoding.
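To make my intuition concrete, here is a rough sketch of what I mean (a toy per-symbol code of my own, not a real Huffman implementation): with only two symbols, any per-symbol prefix code must assign each symbol a codeword of at least one bit, so encoding symbol by symbol never beats the raw 40 bits.

    # The string from my example, written as a Python string
    s = "1101110110111111111111110111101111011001"

    # Shortest possible per-symbol binary prefix code: one bit each
    code = {"1": "0", "0": "1"}
    encoded = "".join(code[c] for c in s)

    print(len(encoded))  # 40 -- no savings over the raw string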
Can someone help me clarify this, please?