I can't give you a definitive answer, but I can show there's no reason LZ77 couldn't support "overly long" tuples.
I believe you're asking if the length component of a tuple can be larger than its distance component.
In other words, I believe you're asking which of the following streams of tuples would be produced:
- ( Length: 0, Distance: 0, Char: 'a' )
- (0,0,'b')
- (0,0,'c')
- (2,3,'b')
- (4,4,'c')[1] Length capped to distance.
- (2,4,'c')
- (0,0,'a')
or
- (0,0,'a')
- (0,0,'b')
- (0,0,'c')
- (2,3,'b')
- (7,4,'c')[2] Length larger than distance.
- (0,0,'a')
Obviously, the latter would produce better compression.
I can't give you a definitive answer, but I can show that the decoder can handle "overly long" tuples just as easily as the encoder can produce them. As such, there's no reason LZ77 couldn't support them.
Reconstructing the input from the first stream
- (0,0,'a'): "" + "a" ⇒ "a"
- (0,0,'b'): "" + "b" ⇒ "ab"
- (0,0,'c'): "" + "c" ⇒ "abc"
- (2,3,'b'): "ab" + "b" ⇒ "abcabb"
- (4,4,'c'): "cabb" + "c" ⇒ "abcabbcabbc"
- (2,4,'c'): "ab" + "c" ⇒ "abcabbcabbcabc"
- (0,0,'a'): "" + "a" ⇒ "abcabbcabbcabca"
Simple.
Reconstructing the input from the second stream
(0,0,'a'): "" + "a" ⇒ "a"
(0,0,'b'): "" + "b" ⇒ "ab"
(0,0,'c'): "" + "c" ⇒ "abc"
(2,3,'b'): "ab" + "b" ⇒ "abcabb"
(7,4,'c')
So far, we've produced abcabb
. The tuple says the next 7 characters start 4 characters from the end of abcabb
.
a b c c a b b _ _ _ _ _ _ _ c
| | | | ^ ^ ^ ^ ^ ^ ^
| | | | | | | | | | |
+-)-)-)-+ | | | ? ? ?
+-)-)---+ | |
+-)-----+ |
+-------+
We're missing three characters! Are are we? If it's all one buffer and we copy from left to right, we just have to keep going.
+-------+
+-)-----+ |
+-)-)---+ | |
| | | | | |
| | | v v v
a b c c a b b _ _ _ _ _ _ _ c
| | | | ^ ^ ^ ^
| | | | | | | |
+-)-)-)-+ | | |
+-)-)---+ | |
+-)-----+ |
+-------+
So this works with no extra effort!
(7,4,'c'): "cabbcab" + "c" ⇒ "abcabbcabbcabc"
(0,0,'a'): "" + "a" ⇒ "abcabbcabbcabca"
- You incorrectly said (4,4,'a').
- You incorrectly said (6,4,'c').