
An Android app that I am writing acquires data that has been compressed using LZString and sent out as Base64. I am using this implementation of LZString in Java along with this one in PHP. Both of these implementations are the top recommendations listed here by the author of the original JavaScript port of LZW.

I have had a really tough time understanding why the LZString data sent out by PHP throws exceptions in Java. After much experimentation (and frayed nerves) I have eventually worked out that the issue comes down to apparent padding that the Java implementation expects and that is missing from the data sent out by PHP. Take the following as an example:

Original string being compressed:

Betty bought a bit of butter but it was bitter so she bought some better butter to make the bitter butter better

This is a sentence I use for testing since, with its multiple repetitions, it is likely to compress well.

The PHP implementation of LZString spits out the following byte array (the ASCII codes of its Base64 output):

69 73 85 119 76 109 67 101 65 69 66 71 68 50 66 88 65 53 103 67 122 78 65 
104 110 65 108 104 43 65 90 110 73 104 67 65 69 55 69 90 55 81 68 117 109 65 
122 114 113 82 102 102 78 80 97 105 72 69 109 104 113 119 76 90 100 89 52 77 
79 85 113 105 75 89 78 118 48 119 66 114 76 109 69 53 77 74 52 115 99 79 90 
65

while the Java implementation generates the following byte array:

69 73 85 119 76 109 67 101 65 69 66 71 68 50 66 88 65 53 103 67 122 78 65 
104 110 65 108 104 43 65 90 110 73 104 67 65 69 55 69 90 55 81 68 117 109 65 
122 114 113 82 102 102 78 80 97 105 72 69 109 104 113 119 76 90 100 89 52 77 
79 85 113 105 75 89 78 118 48 119 66 114 76 109 69 53 77 74 52 115 99 79 90 
65 **65 65 61 61**

You will note that the Java implementation tacks on an extra **AA==**.

At a pinch I can understand why there is an **==**: padding to bring the output up to the required length multiple. However, I cannot understand why the **AA** is there or where it comes from.
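Standard Base64 uses `=` only to bring the total length up to a multiple of 4. Counting the values above, the PHP output is 92 characters long, which is already a multiple of 4, so it needs no padding at all. Appending the two characters `AA` gives 94, and that is what makes the trailing `==` necessary. A minimal sketch of that arithmetic (the class and the helper `paddingNeeded` are hypothetical names used only for illustration):

```java
final class Base64Padding {
    // Number of '=' characters needed to bring an encoded length up to a
    // multiple of 4. (A remainder of 1 never occurs in valid standard Base64.)
    static int paddingNeeded(int encodedLength) {
        return (4 - encodedLength % 4) % 4;
    }

    public static void main(String[] args) {
        int phpLength = 92;           // length of the PHP output in the question
        int withAA = phpLength + 2;   // after the Java port adds "AA"
        System.out.println(paddingNeeded(phpLength)); // prints 0
        System.out.println(paddingNeeded(withAA));    // prints 2
    }
}
```

In other words, the `==` is only a consequence of the `AA`; the open question is why the Java port emits the `AA` at all.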

I tested LZString.decompressFromBase64 in Java on the PHP output after appending an additional AA== and found that it works. On the other hand, appending just == threw an exception. Further experimentation revealed that appending ==== also worked, and so did BB==, which suggests that these four characters are used purely as padding and carry no data.
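Because every character in both outputs comes from the standard Base64 alphabet, `java.util.Base64` can decode them for a side-by-side comparison of the raw data. This is a sketch, assuming the Java output is exactly the PHP output followed by `AA==` as shown in the byte arrays above; the class and method names are illustrative only, and it does not use either LZString port.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

final class CompareOutputs {
    // Character codes of the PHP output, copied from the question
    static final int[] PHP_CODES = {
        69, 73, 85, 119, 76, 109, 67, 101, 65, 69, 66, 71, 68, 50, 66, 88, 65, 53, 103, 67, 122, 78, 65,
        104, 110, 65, 108, 104, 43, 65, 90, 110, 73, 104, 67, 65, 69, 55, 69, 90, 55, 81, 68, 117, 109, 65,
        122, 114, 113, 82, 102, 102, 78, 80, 97, 105, 72, 69, 109, 104, 113, 119, 76, 90, 100, 89, 52, 77,
        79, 85, 113, 105, 75, 89, 78, 118, 48, 119, 66, 114, 76, 109, 69, 53, 77, 74, 52, 115, 99, 79, 90,
        65
    };

    // Turn the list of ASCII codes back into the Base64 text it represents
    static String fromCodes(int[] codes) {
        byte[] ascii = new byte[codes.length];
        for (int i = 0; i < codes.length; i++) {
            ascii[i] = (byte) codes[i];
        }
        return new String(ascii, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String phpOut = fromCodes(PHP_CODES);
        String javaOut = phpOut + "AA=="; // assumption: Java output = PHP output + "AA=="

        byte[] phpRaw = Base64.getDecoder().decode(phpOut);
        byte[] javaRaw = Base64.getDecoder().decode(javaOut);

        System.out.println("PHP:  " + phpOut.length() + " chars -> " + phpRaw.length + " bytes");
        System.out.println("Java: " + javaOut.length() + " chars -> " + javaRaw.length + " bytes");

        // The PHP string is a whole number of 4-character groups, so the
        // leading bytes of both decodings should be identical.
        boolean samePrefix = Arrays.equals(phpRaw, Arrays.copyOf(javaRaw, phpRaw.length));
        System.out.println("Shared prefix identical: " + samePrefix);
        System.out.println("Extra trailing bytes: "
                + Arrays.toString(Arrays.copyOfRange(javaRaw, phpRaw.length, javaRaw.length)));
    }
}
```

Run against the data in the question, this should report 92 characters (69 bytes) for PHP and 96 characters (70 bytes) for Java, with the only difference being a single trailing zero byte. In LZString's own terms, where each Base64 character is a 6-bit unit, the `AA` amounts to 12 extra zero bits appended to the bitstream.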

At this point I could quite simply append padding as appropriate in Java before calling LZString.decompressFromBase64. However, I fear that would be a "solution" implemented without a full understanding of what is happening here. Perhaps someone here can shed some light?

DroidOS
  • The output you're comparing is the base64 encoded output. What do the raw bytes look like? Are they also different? I'm not familiar with Java, but it seems odd they've implemented their own base64 encoding instead of using a built-in or standard library. Showing the code you used to generate the data would also be helpful. – miken32 May 10 '19 at 19:29

0 Answers