Could you give me some explain of LZ77 Algorithm?

Question

I'm trying to learn LZ77 algorithm with my friend, and some case give us a confusion.

for example)

init

search buffer size: 7
look-ahead buffer size: 8
original string: abcabbcabbcabca

current window: abcabbc view: abbcabca

What I thought the LLD tuple is : Literal: 'a' Length: 4 Distance: 4

What my friend thought the LLD tuple is : Literal: 'c' Length: 6 Distance: 4

I thought the algorithm to find longest match string only checks in search buffer, but he says there is no limit to find match string.

Who's answer is correct?

My Answer didn't answer your question, so I deleted it. You might be able to see what I wrote [here](https://stackoverflow.com/a/71709604/589924). I edited it so that it highlights the problem. — ikegami, Apr 01 '22 at 16:06

ikegami · Accepted Answer · 2022-04-01T17:06:57.103

I can't give you a definitive answer, but I can show there's no reason LZ77 couldn't support "overly long" tuples.

I believe you're asking if the length component of a tuple can be larger than its distance component.

In other words, I believe you're asking which of the following streams of tuples would be produced:

( Length: 0, Distance: 0, Char: 'a' )
(0,0,'b')
(0,0,'c')
(2,3,'b')
(4,4,'c')^[1] Length capped to distance.
(2,4,'c')
(0,0,'a')

or

(0,0,'a')
(0,0,'b')
(0,0,'c')
(2,3,'b')
(7,4,'c')^[2] Length larger than distance.
(0,0,'a')

Obviously, the latter would produce better compression.

I can't give you a definitive answer, but I can show that the decoder can handle "overly long" tuples just as easily as the encoder can produce them. As such, there's no reason LZ77 couldn't support them.

Reconstructing the input from the first stream

(0,0,'a'): "" + "a" ⇒ "a"
(0,0,'b'): "" + "b" ⇒ "ab"
(0,0,'c'): "" + "c" ⇒ "abc"
(2,3,'b'): "ab" + "b" ⇒ "abcabb"
(4,4,'c'): "cabb" + "c" ⇒ "abcabbcabbc"
(2,4,'c'): "ab" + "c" ⇒ "abcabbcabbcabc"
(0,0,'a'): "" + "a" ⇒ "abcabbcabbcabca"

Simple.

Reconstructing the input from the second stream

(0,0,'a'): "" + "a" ⇒ "a"
(0,0,'b'): "" + "b" ⇒ "ab"
(0,0,'c'): "" + "c" ⇒ "abc"
(2,3,'b'): "ab" + "b" ⇒ "abcabb"

(7,4,'c')

So far, we've produced abcabb. The tuple says the next 7 characters start 4 characters from the end of abcabb.

a b c c a b b _ _ _ _ _ _ _ c
      | | | | ^ ^ ^ ^ ^ ^ ^
      | | | | | | | | | | |
      +-)-)-)-+ | | | ? ? ?
        +-)-)---+ | |
          +-)-----+ |
            +-------+

We're missing three characters! Are are we? If it's all one buffer and we copy from left to right, we just have to keep going.

                  +-------+
                +-)-----+ |
              +-)-)---+ | |
              | | |   | | |
              | | |   v v v
a b c c a b b _ _ _ _ _ _ _ c
      | | | | ^ ^ ^ ^
      | | | | | | | |
      +-)-)-)-+ | | |
        +-)-)---+ | |
          +-)-----+ |
            +-------+

So this works with no extra effort!

(7,4,'c'): "cabbcab" + "c" ⇒ "abcabbcabbcabc"

(0,0,'a'): "" + "a" ⇒ "abcabbcabbcabca"

You incorrectly said (4,4,'a').
You incorrectly said (6,4,'c').

Thanks! We decided to use latter one and it works well! – sqix Apr 16 '22 at 16:50 — sqix, Apr 16 '22 at 16:50

score 0 · Answer 2 · answered Apr 01 '22 at 23:24

0

Your friend is right. A simpler example is that LZ77 accomplishes run-length coding with a distance 1 match, with length n-1 for a run of n copies of the same byte. (The first occurrence of the byte is a literal.)

answered Apr 01 '22 at 23:24

Mark Adler

101,978
13
118
158

Could you give me some explain of LZ77 Algorithm?

2 Answers2