
When implementing run-length encoding (RLE), can I assume that the run lengths will always fit in one byte?

In other words, will there never be a run like this:

WWWBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB...

where there are 256 B's? That length cannot be represented in one byte, whereas the W's can be represented as 3W.
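To make the arithmetic concrete: an unsigned byte holds the values 0 to 255, so a count of 3 fits but a count of 256 does not. A minimal Python check (the helper name `fits_in_one_byte` is hypothetical, chosen only for illustration):

```python
# One unsigned byte holds values 0..255, so 255 is the largest count it can store.
MAX_BYTE = 2**8 - 1


def fits_in_one_byte(run_length: int) -> bool:
    """Return True if run_length can be stored as a single unsigned byte."""
    return 0 <= run_length <= MAX_BYTE


print(fits_in_one_byte(3))    # True: the three W's
print(fits_in_one_byte(256))  # False: the 256 B's

# Python refuses to pack 256 into a single byte.
try:
    (256).to_bytes(1, "big")
except OverflowError as exc:
    print("cannot encode 256 in one byte:", exc)
```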

If not, should the run be split into two runs? How should this situation be handled? I couldn't find any information about this case.

  • I don't understand what you mean by assuming that a run would be shorter than one byte, but whatever it means, it would depend on your implementation of RLE. RLE is a type of compression with many ways to implement it. – glenebob May 29 '18 at 00:08
  • Yeah, this is not very clear. I mean, you ask a question, then in case the answer is "no" you ask a completely different, unrelated question. Please be more specific. – TheGeneral May 29 '18 at 00:09
  • Much clearer after the edit, and as I predicted, it totally depends on how you choose to implement it. However, if you limit the length value to one byte, then of course a run can be at most as long as the largest length a byte can describe. You could also use some form of variable-length length encoding to enable longer runs. In any case, a run longer than your length encoding can describe will need to be split. – glenebob May 29 '18 at 00:15
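A variable-length length field, as suggested in the comment above, could look like the sketch below. It assumes an unsigned LEB128-style varint (7 value bits per byte, high bit meaning "more bytes follow"), which is only one of several possible schemes; all function names are hypothetical.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative int as an unsigned LEB128-style varint.

    Each byte carries 7 bits of the value, least significant group first;
    the high bit is set on every byte except the last.
    """
    if n < 0:
        raise ValueError("run lengths must be non-negative")
    out = bytearray()
    while True:
        group = n & 0x7F
        n >>= 7
        if n:
            out.append(group | 0x80)  # more bytes follow
        else:
            out.append(group)         # final byte
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode a varint starting at data[pos]; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def rle_encode_varint(data: bytes) -> bytes:
    """Pair RLE where each count is a varint, so no run ever has to be split."""
    out = bytearray()
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1
        out += encode_varint(j - i)  # run length
        out.append(data[i])          # repeated byte
        i = j
    return bytes(out)


def rle_decode_varint(encoded: bytes) -> bytes:
    """Expand (varint count, value) pairs back into the original bytes."""
    out = bytearray()
    pos = 0
    while pos < len(encoded):
        count, pos = decode_varint(encoded, pos)
        out += bytes([encoded[pos]]) * count
        pos += 1
    return bytes(out)
```

With this scheme, the 3 W's and 256 B's from the question encode as `03 57 80 02 42` (5 bytes) without any split. The trade-off is that counts of 128 or more cost two or more bytes.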
  • @glenebob you should write this up as an answer! – Sam May 29 '18 at 00:25
  • It depends... the most classic code/token-based RLE algorithms I've seen actually use that single byte both to indicate whether it's a 'copy following data' or a 'repeat single value' command and to indicate a length. That means you get a 1-bit command + 7 bits of length, so the maximum run length you can store in one command is only 127. – Nyerguds Jun 13 '18 at 10:22
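The 1-bit command + 7-bit length layout described in this comment could be sketched as follows. The details are assumptions for illustration: here a set high bit means "repeat the next byte N times", a clear high bit means "copy the next N bytes literally", runs shorter than 3 are stored as literals, and the names are hypothetical. It is similar in spirit to token-based schemes such as PackBits but is not meant to match any specific file format.

```python
MAX_LEN = 0x7F      # 7 bits of length -> at most 127 bytes per command
REPEAT_FLAG = 0x80  # high bit of the control byte selects "repeat"


def _run_length(data: bytes, i: int) -> int:
    """Length of the run of data[i] starting at i, capped at MAX_LEN."""
    j = i
    while j < len(data) and j - i < MAX_LEN and data[j] == data[i]:
        j += 1
    return j - i


def token_rle_encode(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        run = _run_length(data, i)
        if run >= 3:
            # Repeat command: flag bit + count, then the byte to repeat.
            out.append(REPEAT_FLAG | run)
            out.append(data[i])
            i += run
            continue
        # Copy command: gather literals until a run of 3+ starts or MAX_LEN is hit.
        start = i
        while i < len(data) and i - start < MAX_LEN and _run_length(data, i) < 3:
            i += 1
        out.append(i - start)  # high bit clear, so this is a copy command
        out += data[start:i]
    return bytes(out)


def token_rle_decode(encoded: bytes) -> bytes:
    out = bytearray()
    pos = 0
    while pos < len(encoded):
        control = encoded[pos]
        count = control & MAX_LEN
        if control & REPEAT_FLAG:
            out += bytes([encoded[pos + 1]]) * count
            pos += 2
        else:
            out += encoded[pos + 1 : pos + 1 + count]
            pos += 1 + count
    return bytes(out)
```

For the run from the question (3 W's followed by 256 B's) this produces `83 57 FF 42 FF 42 02 42 42`: the B's are split into 127 + 127 + 2, and the last two are stored as a literal copy.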

1 Answer


To my understanding, you understand the situation correctly. The word length used for counting the repetitions of a character is usually a byte, and each individual character is usually also encoded as a byte. If the input contains, e.g., a run of 300 b's, the encoding will be as follows.

255 (number of repetitions of the next character)
 98 (ASCII value for b)
 45 (number of repetitions of the next character)
 98 (ASCII value for b)

In general, a run longer than 255 has to be split into several runs: two for lengths up to 510, more beyond that. That being said, the actual encoding depends on the specific implementation; it is also possible to use types other than bytes for counting the repetitions of characters.
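The splitting described in this answer can be sketched in Python as below, assuming plain (count, value) byte pairs with no escape or flag bits; the function names are hypothetical.

```python
MAX_RUN = 255  # largest count a single unsigned byte can hold


def pair_rle_encode(data: bytes) -> bytes:
    """Encode data as (count, value) byte pairs, splitting runs longer than 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        value = data[i]
        j = i
        while j < len(data) and data[j] == value:
            j += 1
        run = j - i
        # Emit full 255-long chunks first, then the remainder.
        while run > 0:
            chunk = min(run, MAX_RUN)
            out += bytes([chunk, value])
            run -= chunk
        i = j
    return bytes(out)


def pair_rle_decode(encoded: bytes) -> bytes:
    """Expand (count, value) pairs back into the original bytes."""
    if len(encoded) % 2:
        raise ValueError("pair RLE data must have an even length")
    out = bytearray()
    for k in range(0, len(encoded), 2):
        count, value = encoded[k], encoded[k + 1]
        out += bytes([value]) * count
    return bytes(out)


print(list(pair_rle_encode(b"b" * 300)))  # [255, 98, 45, 98]
```

A run of 600 b's would become three pairs, `[255, 98, 255, 98, 90, 98]`, so the encoder emits as many chunks as the run needs rather than always two.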

Codor
  • Unfortunately, naïve pair-RLE like that will quickly run into [the "abracadabra" problem](http://www.shikadi.net/moddingwiki/RLE_Compression); incompressible data will immediately double in size. – Nyerguds Jun 30 '18 at 20:46
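The size blow-up described in this comment is easy to check. The sketch below (hypothetical helper name, the same one-byte-count pair scheme as in the answer) encodes the word "abracadabra", which has no repeated adjacent letters, so every letter becomes its own (1, letter) pair:

```python
from itertools import groupby


def pair_rle_encode(data: bytes) -> bytes:
    """Naive (count, value) pair RLE with one-byte counts."""
    out = bytearray()
    for value, group in groupby(data):
        run = len(list(group))
        # Split runs that do not fit in one byte.
        while run > 0:
            chunk = min(run, 255)
            out += bytes([chunk, value])
            run -= chunk
    return bytes(out)


sample = b"abracadabra"
encoded = pair_rle_encode(sample)
print(len(sample), len(encoded))  # 11 22
```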