1

I have some Base64-encoded encrypted data and noticed a fair amount of repetition. In a string of approximately 200 characters, a certain Base64 character is repeated up to 7 times in several separate runs.

Is this a red flag that there is a problem in the encryption? According to my understanding, encrypted data should never show significant repetition, even if the plaintext is entirely uniform (i.e. even if I encrypt 2 GB of nothing but the letter A, there should be no significant repetition in the encrypted version).

John
JoelFan
  • Depends entirely on the algorithm. Do you know what is being used? – leebriggs Feb 04 '11 at 19:07
  • @leeeb, I am submitting that it is a red flag that the algorithm is bad – JoelFan Feb 04 '11 at 19:11
  • What block mode is being used (or is it a stream cipher)? If you're using ECB and the input is repetitive then this is inevitable and the solution is to change mode to e.g. CBC. – Peter Taylor Feb 04 '11 at 19:15
  • @Peter, let's say, for the purpose of the question, that I have no information other than the encrypted data. Can I infer from the repetition alone that the encryption algorithm is faulty? – JoelFan Feb 04 '11 at 19:26
  • @SplashHit, that depends on what you mean by "the algorithm". The best block cipher in the world can be used in ways which leak information. See http://en.wikipedia.org/wiki/Block_cipher_modes_of_operation#Electronic_codebook_.28ECB.29 for a more detailed explanation. – Peter Taylor Feb 04 '11 at 19:38
  • @Peter, By algorithm, I am including the concept of "mode" – JoelFan Feb 04 '11 at 20:27
  • I'd say: probably a block cipher in ECB mode. The clue would be: do the repetitions occur at multiples of 8 or 16 bytes apart (bytes, not Base64 characters)? If so, that is a strong hint that this is the case. – Henno Brandsma Feb 06 '11 at 09:52
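
The check suggested in the comments (repetitions at multiples of the block size) could be sketched roughly as follows. This is only an illustration: the function name `repeated_blocks` is hypothetical, a 16-byte block size is assumed (use 8 for 64-bit block ciphers), and it assumes the decoded bytes are raw cipher text that starts on a block boundary, with no envelope or header in front.

```python
import base64
from collections import defaultdict

def repeated_blocks(b64_text, block_size=16):
    """Return {block: [byte offsets]} for every aligned block that occurs more than once."""
    raw = base64.b64decode(b64_text)  # work on bytes, not Base64 characters
    offsets = defaultdict(list)
    # Walk the data in aligned, non-overlapping blocks (assumed alignment at offset 0).
    for start in range(0, len(raw) - block_size + 1, block_size):
        offsets[raw[start:start + block_size]].append(start)
    return {block: pos for block, pos in offsets.items() if len(pos) > 1}

# Hypothetical sample: two identical 16-byte blocks separated by a different one.
sample = base64.b64encode(b"A" * 16 + b"B" * 16 + b"A" * 16)
print(repeated_blocks(sample))
```

Identical aligned blocks are what ECB produces for identical plaintext blocks, so a non-empty result is a hint rather than proof; an empty result does not rule out other weaknesses.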

3 Answers

6

According to the binomial distribution, there is about a 2.5% chance that a given character from a set of 64 would appear exactly seven times in a series of 200 random characters. That's a small chance, but not negligible. With more information, you might raise your confidence from 97.5% to something very close to 100% … or find that the cipher text really is uniformly distributed.
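
As a rough check of that figure, the probability can be computed directly from the binomial formula. This is a minimal sketch, assuming each of the 200 characters is drawn independently and uniformly from the 64-symbol alphabet; the function names are illustrative only.

```python
from math import comb

def prob_exactly(k, n=200, p=1/64):
    """P(X = k) for X ~ Binomial(n, p): one given symbol appears exactly k times."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_at_least(k, n=200, p=1/64):
    """P(X >= k), computed as the complement of P(X < k)."""
    return 1.0 - sum(prob_exactly(i, n, p) for i in range(k))

print(f"exactly 7 times:  {prob_exactly(7):.4f}")
print(f"at least 7 times: {prob_at_least(7):.4f}")
```

Both numbers refer to one particular character chosen in advance. The chance that some character among the 64 reaches seven occurrences is higher, which is another reason a single 200-character sample is weak evidence.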

You say that the "character is repeated up to 7 times" in several separate repeated runs. That's not enough information to say whether the cipher text has a bias. Instead, tell us the total number of times the character appeared, and the total number of cipher text characters. For example, "it appeared a total of 3125 times in 1000 runs of 200 characters each."
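
One way to gather those totals is to count every symbol across all samples and compare against the uniform expectation. The following is a sketch under stated assumptions: the helper names are hypothetical, the input is assumed to contain only characters from the standard 64-symbol Base64 alphabet plus padding and line breaks, and Pearson's chi-square statistic is used here as one possible measure of bias, not something prescribed by the answer.

```python
from collections import Counter

def symbol_counts(b64_text):
    """Count Base64 symbols, ignoring '=' padding and line breaks."""
    return Counter(ch for ch in b64_text if ch not in "=\r\n")

def chi_square(b64_text, alphabet_size=64):
    """Pearson chi-square statistic against a uniform distribution over the alphabet."""
    counts = symbol_counts(b64_text)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no Base64 symbols to analyse")
    expected = n / alphabet_size
    # Contribution of the symbols that were observed at least once.
    stat = sum((c - expected) ** 2 / expected for c in counts.values())
    # Symbols that never appeared each contribute (0 - expected)^2 / expected = expected.
    stat += (alphabet_size - len(counts)) * expected
    return stat
```

For uniformly distributed symbols the statistic averages about 63 (the number of degrees of freedom), so values far above that suggest bias. The approximation is only reliable when the expected count per symbol is reasonably large, which means pooling many 200-character samples rather than judging one.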

Also, you need to be sure that you are talking about the raw output of a cipher. Cipher text is often encapsulated in an "envelope" like that defined by the Cryptographic Message Syntax. Of course, this enclosing structure will have predictable patterns.

erickson
0

Well, I guess it depends. Repetition in general is a bad thing if it represents the same data.

Considering that you are encoding it, have you looked at the data to see whether something in it repeats with those counts?

To understand this better, you need to know what kind of encryption is being used. It could be just a coincidence that they are repeating.

But if the repetition comes from the same data, then it can be a red flag, because frequency counts can then be used to decode it.

What kind of encryption are you using? Home made or some industry standard?

grobartn
0

It depends on how you are encrypting your data.

Base64 encoding a string may count as light obfuscation, but it is NOT encryption. The purpose of Base64 encoding is to allow any sort of binary data to be encoded as a safe ASCII string.
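
A short illustration of that point, using an arbitrary sample string: Base64 is reversed with no key at all, so anyone who sees the encoded text can recover the original bytes.

```python
import base64

# Encoding needs no secret, and neither does decoding.
encoded = base64.b64encode(b"attack at dawn")
decoded = base64.b64decode(encoded)

print(encoded)  # ASCII-safe representation of the input bytes
print(decoded)  # the original bytes, recovered without any key
```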

Adam Batkin
  • The data is Base64-encoded, encrypted data... meaning it was first encrypted, then the encrypted data was Base64-encoded – JoelFan Feb 04 '11 at 19:25