PBKDF2 Python keys vs .NET Rfc2898

Question

I am trying to write a Python module that will encrypt text that our existing .NET classes can decrypt. As far as I can tell, my code lines up, but it isn't decrypting (I get an 'Invalid padding length' error on the C# side). My PKCS7 code looks good, but research indicates that an invalid key could cause this same problem.

What's different between these two setups? Python:

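import base64
from Crypto.Cipher import AES
from Crypto.Protocol.KDF import PBKDF2
import pkcs7  # the gist linked below

derived_key = PBKDF2(crm_key, salt, 256 // 8, iterations)
iv = PBKDF2(crm_key, salt, 128 // 8, iterations)

encoder = pkcs7.PKCS7Encoder()

# decode: use a separate cipher object, because CBC ciphers keep state between calls
decoded = AES.new(derived_key, AES.MODE_CBC, iv).decrypt(encoded_secret)

# encode - just stepped so I could debug.
cipher = AES.new(derived_key, AES.MODE_CBC, iv)
padded_secret = encoder.encode(secret)              # 1
encodedtext = cipher.encrypt(padded_secret)         # 2
based_secret = base64.b64encode(encodedtext)        # 3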

I thought that based_secret could be passed up to C# and decrypted there, but it fails. The equivalent C# encryption code is:

var rfc = new Rfc2898DeriveBytes(key, saltBytes);

// create provider & encryptor
using (var cryptoProvider = new AesManaged())
{
    // Set cryptoProvider parameters
    cryptoProvider.BlockSize = cryptoProvider.LegalBlockSizes[0].MaxSize;
    cryptoProvider.KeySize = cryptoProvider.LegalKeySizes[0].MaxSize;

    cryptoProvider.Key = rfc.GetBytes(cryptoProvider.KeySize / 8);
    cryptoProvider.IV = rfc.GetBytes(cryptoProvider.BlockSize / 8);

    using (var encryptor = cryptoProvider.CreateEncryptor())
    {
        // Create a MemoryStream.
        using (var memoryStream = new MemoryStream())
        {
            // Create a CryptoStream using the MemoryStream and the encryptor.
            using (var cryptoStream = new CryptoStream(memoryStream, encryptor, CryptoStreamMode.Write))
            {
                // Convert the passed string to a byte array.
                var valueBytes = Encoding.UTF8.GetBytes(plainValue);

                // Write the byte array to the crypto stream and flush it.
                cryptoStream.Write(valueBytes, 0, valueBytes.Length);
                cryptoStream.FlushFinalBlock();

                // Get an array of bytes from the MemoryStream that holds the encrypted data.
                var encryptBytes = memoryStream.ToArray();

                // Close the streams.
                cryptoStream.Close();
                memoryStream.Close();

                // Return the encrypted buffer.
                return Convert.ToBase64String(encryptBytes);
            }
        }
    }
}

The Python pkcs7 implementation I'm using is: https://gist.github.com/chrix2/4171336
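For comparison with the gist, here is a minimal sketch of PKCS#7 padding as .NET's `AesManaged` applies it by default (`PaddingMode.PKCS7`, 16-byte AES block). The names `pkcs7_pad` and `pkcs7_unpad` are hypothetical, and the check in `pkcs7_unpad` is only roughly the kind of validation the .NET side performs when it reports a padding error. Decrypting with the wrong key (or, for a single-block message, the wrong IV) leaves garbage in the final block that usually fails such a check.

```python
BLOCK_SIZE = 16  # AES block size in bytes (AesManaged.BlockSize / 8)


def pkcs7_pad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Append N bytes of value N, where 1 <= N <= block_size."""
    n = block_size - (len(data) % block_size)
    return data + bytes([n]) * n


def pkcs7_unpad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Strip PKCS#7 padding, rejecting anything that is not well formed."""
    if not data or len(data) % block_size:
        raise ValueError("input is not a whole number of blocks")
    n = data[-1]
    # The last byte says how many padding bytes there are; all of them must equal it.
    if not 1 <= n <= block_size or data[-n:] != bytes([n]) * n:
        raise ValueError("invalid padding")
    return data[:-n]


padded = pkcs7_pad(b"secret")  # 6 bytes of data plus ten bytes of value 10
assert pkcs7_unpad(padded) == b"secret"
```

Note that input which is already a multiple of 16 bytes gets a whole extra block of padding; skipping that case is a common source of padding errors on the receiving side.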

The first thing I'd try is to check whether or not the generated key is the same in C# and Python. — NullUserException, Oct 02 '14 at 22:12
The key output from C# is a byte array and it's a byte string in Python. To compare these properly, can I use Convert.ToBase64String(csharpKey) and base64.b64encode(pythonkey)? Should that get me comparable items? — shelbydz, Oct 03 '14 at 11:34
After doing a ton of research I found a few things that helped: the first call to Rfc2898DeriveBytes.GetBytes and the call to PBKDF2 produce the same key. However, according to MSDN, GetBytes compounds successive calls instead of starting over. The second time I call GetBytes (to get the IV), the output is completely different. — shelbydz, Oct 03 '14 at 17:13
Ouch, can't believe I missed that one; I did not read further than the comment of NullUserException. Yes, if you perform `GetBytes` twice you will simply get more bytes of the *stream* that PBKDF2 generates. And if you do it on `PasswordDeriveBytes` you walk into a complete trap, but that's another issue altogether. Better post it as an answer to your own question, glad you got it solved. — Maarten Bodewes, Oct 05 '14 at 14:29

shelbydz · Accepted Answer · 2014-10-07T15:28:28.283

First off, I verified that Rfc2898DeriveBytes and PBKDF2 are the same thing: Rfc2898DeriveBytes implements PBKDF2 (defined in RFC 2898) using HMAC-SHA1. Then, as stated above, the problem appears to be a .NET-ism. I found on MSDN that the implementation of GetBytes inside Rfc2898DeriveBytes holds state, i.e. its output changes on each call (see the remarks about halfway down the page).
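One way to line the two sides up is to reproduce the .NET defaults in Python. The `Rfc2898DeriveBytes(string, byte[])` constructor used here encodes the password as UTF-8 and runs 1000 iterations of HMAC-SHA1; PyCrypto's `PBKDF2` also defaults to HMAC-SHA1, and the standard library's `hashlib.pbkdf2_hmac` produces the same bytes. The password and salt below are placeholders. The sketch also shows the property behind the bug: PBKDF2 output is built block by block, so a shorter result is always a prefix of a longer one, and a separate 16-byte derivation just returns the first 16 bytes of the key again.

```python
import hashlib

# Placeholder inputs; Rfc2898DeriveBytes(string, byte[]) encodes the password as UTF-8.
password = "correct horse".encode("utf-8")
salt = b"\x00\x01\x02\x03\x04\x05\x06\x07"
ITERATIONS = 1000  # Rfc2898DeriveBytes default iteration count


def pbkdf2_sha1(length: int) -> bytes:
    """PBKDF2 with HMAC-SHA1, matching Rfc2898DeriveBytes' default PRF."""
    return hashlib.pbkdf2_hmac("sha1", password, salt, ITERATIONS, dklen=length)


key = pbkdf2_sha1(32)          # same as the first GetBytes(32)
separate_iv = pbkdf2_sha1(16)  # what the question's Python code uses as the IV
stream_48 = pbkdf2_sha1(48)    # key and IV from a single derivation

assert stream_48[:32] == key          # shorter output is a prefix of longer output
assert separate_iv == key[:16]        # a fresh 16-byte call repeats the key's first bytes
assert stream_48[32:] != separate_iv  # .NET's IV is bytes 32..47 of the stream instead
```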

Example in Python (pseudo output):

import base64
from Crypto.Protocol.KDF import PBKDF2  # PyCrypto; defaults to HMAC-SHA1

derived_key = PBKDF2(key, salt, 32, 1000)
iv = PBKDF2(key, salt, 16, 1000)
print(base64.b64encode(derived_key))
print(base64.b64encode(iv))
$123456789101112134==
$12345678==

Same(ish) code in .NET (again, pseudo output):

var rfc = new Rfc2898DeriveBytes(key, saltBytes);
using (var cryptoProvider = new AesManaged())
{
    // Set cryptoProvider parameters
    cryptoProvider.BlockSize = cryptoProvider.LegalBlockSizes[0].MaxSize;
    cryptoProvider.KeySize = cryptoProvider.LegalKeySizes[0].MaxSize;

    cryptoProvider.Key = rfc.GetBytes(cryptoProvider.KeySize / 8);   // first 32 bytes of the PBKDF2 stream
    cryptoProvider.IV = rfc.GetBytes(cryptoProvider.BlockSize / 8);  // the *next* 16 bytes

    Console.WriteLine(Convert.ToBase64String(cryptoProvider.Key));
    Console.WriteLine(Convert.ToBase64String(cryptoProvider.IV));
}

$123456789101112134==
$600200300==

Subsequent calls to rfc.GetBytes always produce different results. MSDN says the calls compound: calling GetBytes(20) twice returns the same 40 bytes as a single GetBytes(40) call on a fresh instance, split in two. So the second call neither repeats the first key nor changes it; it returns the next bytes of the PBKDF2 output, which is why the .NET IV differs from a fresh 16-byte derivation in Python.
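To make the compounding concrete, the sketch below mimics it in Python. `StatefulDeriveBytes` is a hypothetical illustration, not .NET's implementation: it tracks how many bytes of the PBKDF2-HMAC-SHA1 output it has already handed out and, for simplicity, recomputes the derivation on every call instead of buffering leftover blocks the way a real implementation would.

```python
import hashlib


class StatefulDeriveBytes:
    """Hypothetical Python stand-in for Rfc2898DeriveBytes.GetBytes (HMAC-SHA1)."""

    def __init__(self, password: bytes, salt: bytes, iterations: int = 1000):
        self._password = password
        self._salt = salt
        self._iterations = iterations
        self._position = 0  # how many bytes of the stream have been returned so far

    def get_bytes(self, count: int) -> bytes:
        end = self._position + count
        # Recompute the stream up to `end` and return only the part not yet read.
        stream = hashlib.pbkdf2_hmac(
            "sha1", self._password, self._salt, self._iterations, dklen=end
        )
        chunk = stream[self._position:end]
        self._position = end
        return chunk


rfc = StatefulDeriveBytes(b"password", b"saltsalt")
first = rfc.get_bytes(20)
second = rfc.get_bytes(20)

fresh = StatefulDeriveBytes(b"password", b"saltsalt")
assert first + second == fresh.get_bytes(40)  # two GetBytes(20) == one GetBytes(40)
assert first != second                         # the second call continues the stream
```

Under this model, the `GetBytes(32)` followed by `GetBytes(16)` in the C# code yields bytes 0..31 and 32..47 of a single 48-byte derivation.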

There are some ways around this issue: generate a longer key on the first call and slice it into both the derived key AND the IV, or randomly generate an IV, attach it to the encrypted message, and peel it off before decrypting.
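Both workarounds can be sketched with the standard library. `derive_key_and_iv`, `pack` and `unpack` are hypothetical helper names; the AES step itself is left out (it would use whichever AES library you already have), and the `IV || ciphertext` layout is an assumption that the .NET side would have to mirror by reading the first 16 bytes as the IV.

```python
import hashlib
import os

KEY_SIZE = 32  # AES-256 key, AesManaged.KeySize / 8
IV_SIZE = 16   # AES block size, AesManaged.BlockSize / 8


def derive_key_and_iv(password: bytes, salt: bytes, iterations: int = 1000):
    """Option 1: one PBKDF2-HMAC-SHA1 call, sliced the way two GetBytes calls slice it."""
    stream = hashlib.pbkdf2_hmac(
        "sha1", password, salt, iterations, dklen=KEY_SIZE + IV_SIZE
    )
    return stream[:KEY_SIZE], stream[KEY_SIZE:]


def pack(iv: bytes, ciphertext: bytes) -> bytes:
    """Option 2: send a random IV in front of the ciphertext."""
    if len(iv) != IV_SIZE:
        raise ValueError("IV must be 16 bytes")
    return iv + ciphertext


def unpack(message: bytes):
    """Peel the IV back off before decrypting."""
    return message[:IV_SIZE], message[IV_SIZE:]


key, iv = derive_key_and_iv(b"password", b"saltsalt")
random_iv = os.urandom(IV_SIZE)  # a fresh IV for every message in option 2
message = pack(random_iv, b"<ciphertext bytes>")
assert unpack(message) == (random_iv, b"<ciphertext bytes>")
```

With option 2 only the key comes from PBKDF2, so the IV no longer stays the same for every message encrypted under the same password and salt.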

Slicing the Python output produces the same results as .NET. It looks like this:

derived_key = PBKDF2(key, salt, 32, 1000)
key_and_iv = PBKDF2(key, salt, 32 + 16, 1000)  # we need 16 IV bytes, but .NET's second GetBytes call starts after the first 32
iv = key_and_iv[32:]

print(base64.b64encode(derived_key))
print(base64.b64encode(key_and_iv))
print(base64.b64encode(iv))

$ 123456789101112134==   # matches the .NET key
$ 12345678== # doesn't match (this is all 48 bytes)
$ 600200300== # matches the .NET IV: the base64 of the trailing 16 bytes

Enjoy,
