I'm implementing a simple AES-128 encryption program using Java's Crypto library.

Unfortunately, this doesn't always work. Sometimes the double quotes (") in the plaintext String show up as � in the decrypted String after a round trip through encryption and decryption.

From what I found out, this character is U+FFFD (decimal 65533), the replacement character that the UTF-8 decoder emits when it encounters an error.

Thus, my question is: why does this error occur, why only sometimes, and how do I resolve it?

Here's my code block for encryption and decryption:

public static String encrypt(String value)
    {
        try {
            IvParameterSpec iv = new IvParameterSpec(initVector.getBytes("UTF-8"));
            SecretKeySpec skeySpec = new SecretKeySpec(key.getBytes("UTF-8"), "AES");

            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5PADDING");
            cipher.init(Cipher.ENCRYPT_MODE, skeySpec, iv);

            byte[] encrypted = cipher.doFinal(value.getBytes());
            return Base64.getEncoder().encodeToString(encrypted);
        } catch (Exception ex) {
            ex.printStackTrace();
        }
        return null;
    }

    public static String decrypt(String encrypted)
    {
        try {
            IvParameterSpec iv = new IvParameterSpec(initVector.getBytes("UTF-8"));
            SecretKeySpec skeySpec = new SecretKeySpec(key.getBytes("UTF-8"), "AES");

            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5PADDING");
            cipher.init(Cipher.DECRYPT_MODE, skeySpec, iv);
            byte[] original = cipher.doFinal(Base64.getDecoder().decode(encrypted));

            return new String(original);
        } catch (Exception ex) {
            ex.printStackTrace();
        }

        return null;
    }

Here are some example cases:

Case 1: Error caused

Enter Plaintext: had no pictures or conversations in it, “and what is the use of a book,” thought Alice “without pictures or conversations?”
Encrypted String: UgapiW5aYIKkoKkyeHvvFlgf8mCIq1AopmCtYGiJ23eNILNn1OXtM4enEvB5Kt1imNmynyHCCjLbbjB7TV0sq2F3Iz+YUehOw50gje6IMj8fpaEracq1NvZQXSH5T8fyBtAUH3weU5FE5vr3dYmwTSGUxNR2gaRV6MV7vmcEuIz7A5MLnKjsb2+1Sya+l/k2
Enter Ciphertext: UgapiW5aYIKkoKkyeHvvFlgf8mCIq1AopmCtYGiJ23eNILNn1OXtM4enEvB5Kt1imNmynyHCCjLbbjB7TV0sq2F3Iz+YUehOw50gje6IMj8fpaEracq1NvZQXSH5T8fyBtAUH3weU5FE5vr3dYmwTSGUxNR2gaRV6MV7vmcEuIz7A5MLnKjsb2+1Sya+l/k2
After decryption: had no pictures or conversations in it, �and what is the use of a book,� thought Alice �without pictures or conversations?�

Case 2: No issue

Enter Plaintext: Hello there, "Camera-man". He's sitting now. 
Encrypted String: jb2QJ5nLQCjGKw6l2q9GnX6jgTJVGWn6LiVRfE5oRT7WT7vYNejKPHIhgorbfaob
Enter Ciphertext: jb2QJ5nLQCjGKw6l2q9GnX6jgTJVGWn6LiVRfE5oRT7WT7vYNejKPHIhgorbfaob
After decryption: Hello there, "Camera-man". He's sitting now. 
    I'm confident that the crypto stuff isn't relevant here; rather, the problem must be in the round trip of `value.getBytes()` and `new String(original)`. And the difference between your "Case 1: Error caused" and your "Case 2: No issue" is that Case 1 involves non-ASCII curly quotes. As a first step toward understanding the issue better, I recommend looping over `value.toCharArray()` and printing the characters as integers (preferably in hex), and then doing the same for `value.getBytes()`. – ruakh Oct 08 '20 at 05:01
    You're absolutely right! The two double quotes are different. The one in Case 1 has a Unicode value of 8220, while the one in Case 2 is ASCII 34. I had copied the curly quotes from a different source, and that's what caused the issue. So my question now is: what can I do to account for this? Is there any way to automatically change non-ASCII double quotes to ASCII double quotes, and do the same for other such characters? – Maurice Kasomwung Oct 08 '20 at 05:13

2 Answers

I'm confident that the crypto stuff isn't relevant here; rather, the problem must be in the round trip of String.getBytes() and new String(byte[]). Your "Case 1: Error caused" involves non-ASCII curly quotes (whereas your "Case 2: No issue" uses regular ASCII quotes), so apparently String.getBytes() and new String(byte[]) on your system don't handle that character very well. (Those methods are documented as using "the platform's default charset", and apparently your platform's default charset is one that doesn't support that character.)

To fix this, I think all you need to do is switch from String.getBytes() to String.getBytes(Charset) and from new String(byte[]) to new String(byte[], Charset), using StandardCharsets.UTF_8 as the charset in both cases. (Or any other appropriate charset, but UTF-8 is the most common choice nowadays.)

So:

            byte[] encrypted = cipher.doFinal(
                value.getBytes(StandardCharsets.UTF_8));

and

            return new String(original, StandardCharsets.UTF_8);
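
As a rough illustration of the point above, here is a minimal sketch. The class and helper names (`CharsetRoundTrip`, `roundTrip`, `toHex`) are made up for this example, and US-ASCII is only a stand-in for a charset that lacks curly quotes; your platform's actual default charset may well be something else. It prints the characters and bytes in hex, as suggested in the comments, and compares a round trip through UTF-8 with one through US-ASCII.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {

    // Hypothetical helper: encode to bytes and decode back with the same explicit charset.
    public static String roundTrip(String value, Charset charset) {
        byte[] bytes = value.getBytes(charset);
        return new String(bytes, charset);
    }

    // Hypothetical helper: render bytes as space-separated lowercase hex.
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b & 0xff));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // U+201C and U+201D are the curly quotes from Case 1.
        String value = "\u201Cquoted\u201D";

        System.out.println("Default charset: " + Charset.defaultCharset());

        // Print each char as a hex code unit.
        StringBuilder chars = new StringBuilder();
        for (char c : value.toCharArray()) {
            chars.append(String.format("%04x ", (int) c));
        }
        System.out.println("chars:       " + chars.toString().trim());

        // Compare the bytes produced by different charsets.
        System.out.println("UTF-8 bytes: " + toHex(value.getBytes(StandardCharsets.UTF_8)));
        System.out.println("ASCII bytes: " + toHex(value.getBytes(StandardCharsets.US_ASCII)));

        // Check whether each round trip preserves the original string.
        System.out.println("UTF-8 round trip intact: "
                + roundTrip(value, StandardCharsets.UTF_8).equals(value));
        System.out.println("ASCII round trip intact: "
                + roundTrip(value, StandardCharsets.US_ASCII).equals(value));
    }
}
```

With UTF-8 the string survives the round trip. With US-ASCII each curly quote comes back as `?`, because `String.getBytes(Charset)` substitutes the charset's replacement bytes for characters it cannot map.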
ruakh

The quote in your paste is not the standard ASCII ", but something else.

You have string-based keys and IVs and use UTF-8 to turn those into actual byte arrays. This is probably a mistake: you're reducing randomness a bit there, though not enough to be too worried about.

However, for the actual payload, you don't do this - and that's where you should have.

It's not value.getBytes(), it's value.getBytes(StandardCharsets.UTF_8), and not new String(original), but new String(original, StandardCharsets.UTF_8).
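
Putting both points together, a sketch of the round trip might look like the following. This is not your original code: the class name `AesUtf8Sketch` and the helpers `newKey` and `newIv` are invented for illustration, and generating the key with `KeyGenerator` and the IV with `SecureRandom` is just one assumed way to avoid string-derived key material. The part that matters for your question is the explicit `StandardCharsets.UTF_8` on the payload in both directions.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

public class AesUtf8Sketch {

    // Hypothetical helper: generate a random 128-bit AES key instead of deriving it from a String.
    public static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical helper: generate a random 16-byte IV (the AES block size).
    public static IvParameterSpec newIv() {
        byte[] iv = new byte[16];
        new SecureRandom().nextBytes(iv);
        return new IvParameterSpec(iv);
    }

    public static String encrypt(String value, SecretKey key, IvParameterSpec iv) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, key, iv);
            // Explicit charset: do not rely on the platform default.
            byte[] encrypted = cipher.doFinal(value.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(encrypted);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static String decrypt(String encrypted, SecretKey key, IvParameterSpec iv) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.DECRYPT_MODE, key, iv);
            byte[] original = cipher.doFinal(Base64.getDecoder().decode(encrypted));
            // Decode with the same charset that was used for encoding.
            return new String(original, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SecretKey key = newKey();
        IvParameterSpec iv = newIv();
        // Curly quotes (U+201C, U+201D), as in Case 1.
        String plaintext = "\u201Cand what is the use of a book,\u201D thought Alice";
        String decrypted = decrypt(encrypt(plaintext, key, iv), key, iv);
        System.out.println("Round trip intact: " + plaintext.equals(decrypted));
    }
}
```

In real use the same key and IV have to be available on the decrypting side, so you would need to store or transmit them alongside the ciphertext; the sketch keeps them in local variables only to stay short.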

rzwitserloot