
I have a function that compresses a given bitstream in Python using Run Length Encoding (RLE). Now I want to be able to decompress that compressed bitstream. This is my code to compress the bitstream:

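```python
def RLEC(inputstream):
    count = ""        # no run yet, so the first flush below emits nothing
    result = ""
    prev_char = ""
    for i in range(len(inputstream)):
        if inputstream[i] != prev_char:
            # A new run starts: flush the previous run as <count><bit>
            result = result + str(count) + prev_char
            prev_char = inputstream[i]
            count = 1
        else:
            count += 1
    # Flush the final run
    result += str(count) + prev_char
    return result
```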

If I compress a bitstream such as 0111111111111111111111111110, it is compressed as 1026110. How can I decompress that to get back the original bitstream? Or at least split the numbers into sections, each telling me how many bits there are and which bit it is? If this format is wrong, what format should I use to maximise bitstream efficiency while still being able to decompress it or split it into separate sections?

Some


As pointed out in the comments, your format is fundamentally flawed, and cannot be uniquely decompressed. Your example 1026110 could be one zero, 26 ones, and one zero, or it could be 1026 ones and one zero. Or just reading sequentially, one zero, 261 ones, followed by something else. Or one zero and then 2611 zeros.
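To see the ambiguity concretely, here is a small sketch that enumerates every way to split the question's output back into (count, bit) pairs. It assumes, as with RLEC's output, that each count is a positive decimal number with no leading zero, followed by a single bit. The function name `parses` is hypothetical.

```python
def parses(s):
    """Return every way to split s into (count, bit) pairs.

    Hypothetical helper. Assumes each count is a positive decimal number
    with no leading zero, immediately followed by a single '0' or '1'.
    """
    if not s:
        return [[]]                      # one parse of the empty string: no pairs
    results = []
    for i in range(1, len(s)):
        count, bit = s[:i], s[i]         # try the first i characters as the count
        if not count.isdigit() or count[0] == "0" or bit not in "01":
            continue
        for rest in parses(s[i + 1:]):   # parse whatever follows the bit
            results.append([(int(count), bit)] + rest)
    return results


for p in parses("1026110"):
    print(p)
# [(1, '0'), (26, '1'), (1, '0')]
# [(1, '0'), (2611, '0')]
# [(1026, '1'), (1, '0')]
# [(102611, '0')]
```

Even after discarding the second parse, which has two adjacent runs of zeros that RLEC would never produce, three valid readings remain.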

Since ones and zeros must alternate, you don't need to put those in the output at all. Instead consider terminating the decimal numbers with, say, a period to make them unambiguous. Then you would have 1.26.1., for one zero, 26 ones, and one zero. The convention would be to start with a zero, so if the stream starts with a one, the compressed version would start with a zero count. E.g. 0.26.1. for zero zeros, 26 ones, and one zero.
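A minimal sketch of that period-terminated format, assuming the input is a Python string of '0' and '1' characters. `rle_encode` and `rle_decode` are hypothetical names, not from the question.

```python
def rle_encode(bits):
    """Encode a '0'/'1' string as period-terminated decimal run lengths.

    Runs alternate, starting with a run of zeros; if the input begins
    with '1', the first count is 0.
    """
    if set(bits) - {"0", "1"}:
        raise ValueError("input must contain only '0' and '1'")
    counts = []
    current = "0"          # convention: the first run is zeros
    run = 0
    for b in bits:
        if b == current:
            run += 1
        else:              # bit flipped: close the current run
            counts.append(run)
            current = b
            run = 1
    counts.append(run)     # close the final run
    return "".join(f"{n}." for n in counts)


def rle_decode(encoded):
    """Invert rle_encode: expand each count, alternating bits from '0'."""
    if encoded and not encoded.endswith("."):
        raise ValueError("every count must be terminated by '.'")
    out = []
    bit = "0"
    for field in encoded.split(".")[:-1]:   # drop the empty piece after the last '.'
        out.append(bit * int(field))
        bit = "1" if bit == "0" else "0"
    return "".join(out)


print(rle_encode("0" + "1" * 26 + "0"))  # 1.26.1.
print(rle_encode("1" * 26 + "0"))        # 0.26.1.
```

Because every count ends with a period and the bits alternate starting from zero, the decoder never has to guess where a count ends or which bit it stands for.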

Lastly, what you are decompressing is in no way a "bitstream". It is a string of ASCII characters: decimal digits and, as suggested, terminators. This is rather space-inefficient, by a factor of at least 2.4: each decimal digit carries only log2(10) ≈ 3.32 bits of information but occupies an 8-bit character, and 8 / 3.32 ≈ 2.4. That is a real problem when the whole point is compression. You should instead design a true bitstream format, which encodes the run lengths in bits instead of decimal digits.
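One possible true bitstream format, as a minimal sketch: it assumes Elias gamma codes for the run lengths (a standard variable-length code for positive integers, swapped in here and not specified above) and a leading bit that records the value of the first bit, in place of the zero-count convention. The names `gamma_encode`, `pack_runs` and `unpack_runs` are hypothetical. For readability the output is a string of '0'/'1' characters; a real implementation would pack those bits into bytes.

```python
def gamma_encode(n):
    """Elias gamma code for a positive integer n, as a '0'/'1' string."""
    if n < 1:
        raise ValueError("Elias gamma codes only positive integers")
    binary = bin(n)[2:]                        # e.g. 26 -> '11010'
    return "0" * (len(binary) - 1) + binary    # 26 -> '0000' + '11010'


def pack_runs(bits):
    """Encode a '0'/'1' string as: first bit, then gamma-coded run lengths."""
    if set(bits) - {"0", "1"}:
        raise ValueError("input must contain only '0' and '1'")
    if not bits:
        return ""
    out = [bits[0]]                            # which bit the first run is made of
    run = 1
    for prev, cur in zip(bits, bits[1:]):
        if cur == prev:
            run += 1
        else:                                  # bit flipped: emit the finished run
            out.append(gamma_encode(run))
            run = 1
    out.append(gamma_encode(run))              # emit the final run
    return "".join(out)


def unpack_runs(code):
    """Invert pack_runs."""
    if not code:
        return ""
    bit, pos, out = code[0], 1, []
    while pos < len(code):
        zeros = 0
        while pos < len(code) and code[pos] == "0":   # prefix zeros = length - 1
            zeros += 1
            pos += 1
        if pos + zeros + 1 > len(code):
            raise ValueError("truncated gamma code")
        n = int(code[pos:pos + zeros + 1], 2)
        pos += zeros + 1
        out.append(bit * n)
        bit = "1" if bit == "0" else "0"       # runs alternate
    return "".join(out)


print(pack_runs("0" + "1" * 26 + "0"))  # 010000110101
```

With this scheme the question's 28-bit example packs into 12 bits, compared with 56 bits for the seven ASCII characters of 1.26.1. Gamma codes favour long runs, though: a run of 2 costs 3 bits, so a stream of many short runs can come out larger than the input.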

Mark Adler