I need to translate these LZMA options from Python:

import lzma

def lzma_compression(data):
    # Raw LZMA1 stream: no container format, so no header is written
    return lzma.compress(
        data,
        format=lzma.FORMAT_RAW,
        filters=[
            {
                'id': lzma.FILTER_LZMA1,
                'lc': 3,
                'lp': 0,
                'pb': 2,
                'dict_size': 128 * 1024,
            }
        ],
    )

Into C#.

So far I have got this:

        static int dictionary = 128 * 1024;
        static bool eos = false;

        static CoderPropID[] propIDs =
                {
                    CoderPropID.DictionarySize,
                    CoderPropID.PosStateBits,
                    CoderPropID.LitContextBits,
                    CoderPropID.LitPosBits,
                    CoderPropID.Algorithm,
                    CoderPropID.NumFastBytes,
                    CoderPropID.MatchFinder,
                    CoderPropID.EndMarker
                };

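        // these are the default properties (same order as propIDs):
        static object[] properties =
                {
                    (System.Int32)(dictionary), // DictionarySize
                    (System.Int32)(2),          // PosStateBits (pb)
                    (System.Int32)(3),          // LitContextBits (lc)
                    (System.Int32)(1),          // LitPosBits (lp)
                    (System.Int32)(2),          // Algorithm
                    (System.Int32)(128),        // NumFastBytes
                    "bt4",                      // MatchFinder
                    eos                         // EndMarker
                };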

        public static byte[] Compress(byte[] inputBytes)
        {
            byte[] retVal = null;
            SevenZip.Compression.LZMA.Encoder encoder = new SevenZip.Compression.LZMA.Encoder();
            encoder.SetCoderProperties(propIDs, properties);

            using (System.IO.MemoryStream strmInStream = new System.IO.MemoryStream(inputBytes))
            {
                using (System.IO.MemoryStream strmOutStream = new System.IO.MemoryStream())
                {
                    encoder.WriteCoderProperties(strmOutStream);
                    long fileSize = strmInStream.Length;
                    for (int i = 0; i < 8; i++)
                        strmOutStream.WriteByte((byte)(fileSize >> (8 * i)));

                    encoder.Code(strmInStream, strmOutStream, -1, -1, null);
                    retVal = strmOutStream.ToArray();
                } // End Using outStream

            } // End Using inStream 

            return retVal;
        } // End Function Compress

However, if I compress the same input in both languages, I get different output bytes:

Python:

output = lzma_compression(b'\x83') # b'\x00<\x0e\x02p\x7f\xff\xff\xff\xf8\x00\x00\x00' (13 bytes)

C#:

byte[] input = new byte[1];
input[0] = 131;
byte[] output = Compress(input); // output = \x66\x00\x00\x02\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x41\x7f\xfc\x00\x00 (19 bytes)

I am using the 7-Zip LZMA SDK NuGet package for C#. I think the difference comes from some properties being set differently. Which properties of the LZMA encoder should I change in C# to get the same output as in Python?
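
One thing that may help when comparing the two outputs: the C# code writes a 13-byte header (the 5 bytes from WriteCoderProperties plus the 8 size bytes) before the compressed payload, whereas FORMAT_RAW in Python uses no container and so writes no header. A minimal sketch for decoding that header, assuming the conventional .lzma layout (one properties byte equal to (pb * 5 + lp) * 9 + lc, a 4-byte little-endian dictionary size, then the 8-byte little-endian uncompressed size). The helper name parse_lzma_header is made up for illustration:

```python
import struct

def parse_lzma_header(data: bytes) -> dict:
    """Decode a 13-byte .lzma-style header (hypothetical helper, for illustration)."""
    if len(data) < 13:
        raise ValueError("need at least 13 bytes of header")
    # '<' = little-endian without padding: 1-byte props, 4-byte dict size, 8-byte length
    props, dict_size, uncompressed_size = struct.unpack("<BIQ", data[:13])
    return {
        "lc": props % 9,          # literal context bits
        "lp": (props // 9) % 5,   # literal position bits
        "pb": props // 45,        # position state bits
        "dict_size": dict_size,
        "uncompressed_size": uncompressed_size,
    }

# The 19 bytes reported for the C# version in the question
csharp_output = bytes.fromhex(
    "66 00 00 02 00 01 00 00 00 00 00 00 00 00 41 7f fc 00 00"
)

header = parse_lzma_header(csharp_output)
payload = csharp_output[13:]  # what remains after the header (6 bytes here)
print(header)
print(len(payload))
```

For the bytes above this decodes to lc = 3, lp = 1, pb = 2, a dictionary size of 131072 and an uncompressed size of 1. The lp value matches the (System.Int32)(1) passed for CoderPropID.LitPosBits, while the Python filter uses 'lp': 0, so that may be one of the differences worth checking, alongside the header itself.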

popcorn
  • Why is getting the same output relevant? Does the data decompress correctly? – JonasH Jul 21 '21 at 19:35
  • Because I then work further with the compressed data and basically no longer care about the uncompressed data. So it is important to get the same result in the end, which won't happen if the compressed data differs. – popcorn Jul 21 '21 at 19:39
  • There is no guarantee that different implementations of an algorithm produce identical results. Or even different versions of the same implementation for that matter. The only requirement is that the produced data follows the compression format. If you do not intend to decompress the data you might want to do something like hashing or encryption instead. – JonasH Jul 21 '21 at 19:43
  • I mean, there should be some standard for the LZMA algorithm, so if I provide the same compression options and input, I should get the same result no matter the language. Or am I wrong? – popcorn Jul 21 '21 at 19:57
  • The thing is that I'm implementing my own PayBySquare generator, and LZMA compression is used there. – popcorn Jul 21 '21 at 20:17
  • This is somewhat of a simplification, but typically compression formats only define how the data is structured and how to decompress it. This allows implementations to improve with time while allowing backward and forward compatibility. It would be good if you could describe what it is you are actually trying to do, see the X/Y problem. – JonasH Jul 21 '21 at 20:19
  • I just did in the comment above ^. – popcorn Jul 21 '21 at 20:25
  • Saying "I want to use LZMA to generate PayBySquare" tells us nearly nothing. You need to explain what the overall process is, and specifically why you expect and need a consistent bitstream from a compression algorithm. – JonasH Jul 21 '21 at 20:31
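
The point made in the comments, that a compression format mainly fixes how data is decoded, can be checked with a round trip instead of a byte comparison. A minimal sketch in Python, assuming the filter options from the question. The two encoder configurations (MODE_FAST with MF_HC4 versus MODE_NORMAL with MF_BT4) are picked arbitrarily for illustration and may or may not produce different bytes for a given input:

```python
import lzma

# Format parameters from the question; the decoder needs exactly these
BASE_FILTER = {
    "id": lzma.FILTER_LZMA1,
    "lc": 3,
    "lp": 0,
    "pb": 2,
    "dict_size": 128 * 1024,
}

def compress_raw(data: bytes, **encoder_options) -> bytes:
    # encoder_options only tune how matches are searched, not the format itself
    filters = [dict(BASE_FILTER, **encoder_options)]
    return lzma.compress(data, format=lzma.FORMAT_RAW, filters=filters)

def decompress_raw(data: bytes) -> bytes:
    return lzma.decompress(data, format=lzma.FORMAT_RAW, filters=[BASE_FILTER])

sample = b"example payload " * 20  # arbitrary sample input

fast = compress_raw(sample, mode=lzma.MODE_FAST, mf=lzma.MF_HC4)
normal = compress_raw(sample, mode=lzma.MODE_NORMAL, mf=lzma.MF_BT4)

# Both streams must decode back to the same input, whatever their bytes are
assert decompress_raw(fast) == sample
assert decompress_raw(normal) == sample
print(len(fast), len(normal), fast == normal)
```

If both assertions hold, either stream is a valid encoding of the input, even when the printed comparison shows that the bytes differ.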

0 Answers