
I came across this method in PyCrypto, which is used to generate random bytes:

from Crypto import Random
Random.get_random_bytes(5)

I was wondering how this method is different from a simple generator like the following:

import random

def get_random_bytes(n):
    # Pick each of the n bytes uniformly from all 256 values (0-255)
    # using the Mersenne Twister-based `random` module
    return bytes(random.choice(range(256)) for _ in range(n))

Note: my intuition is that the PyCrypto method is more cryptographically "sound". According to its documentation, random is based on a generator with a period of 2**19937-1, while the documentation for Random.get_random_bytes states that it can generate cryptographically strong bytes. What does that mean?

Of course, I wish to use the library implementation, instead of my own. I just want to understand the cryptography concepts behind it.

verybadalloc

1 Answer


With a cryptographically secure random number generator, observing any sequence of its output gives you no practical way to predict the next output better than by guessing.

`random` is based on the Mersenne Twister, which has an internal state of 624 32-bit numbers. Its output step (called "tempering") is invertible, so 624 consecutive 32-bit outputs are enough to reconstruct the entire state at that point. From there you can determine all future outputs with 100% accuracy.
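A minimal sketch of that attack, assuming CPython 3 and outputs taken from `getrandbits(32)` (in CPython each such call consumes exactly one 32-bit state word; that is an implementation detail, not a documented guarantee). The helper names `undo_right_shift_xor`, `undo_left_shift_xor`, `untemper` and `clone_from_outputs` are hypothetical. It inverts the tempering of 624 consecutive outputs, loads the recovered words into a fresh `random.Random` through `setstate` (using the `(3, state, None)` tuple format that CPython's `getstate` returns), and checks that the clone predicts the original generator's next outputs.

```python
import random

def undo_right_shift_xor(y, shift):
    # Inverts y = x ^ (x >> shift); each pass fixes `shift` more top bits
    x = y
    for _ in range(32):
        x = y ^ (x >> shift)
    return x

def undo_left_shift_xor(y, shift, mask):
    # Inverts y = x ^ ((x << shift) & mask); each pass fixes more low bits
    x = y
    for _ in range(32):
        x = y ^ ((x << shift) & mask)
    return x

def untemper(y):
    # Reverse the four Mersenne Twister tempering steps, in reverse order
    y = undo_right_shift_xor(y, 18)
    y = undo_left_shift_xor(y, 15, 0xEFC60000)
    y = undo_left_shift_xor(y, 7, 0x9D2C5680)
    y = undo_right_shift_xor(y, 11)
    return y

def clone_from_outputs(outputs):
    # 624 recovered state words plus index 624, so the next call regenerates
    state = tuple(untemper(v) for v in outputs) + (624,)
    clone = random.Random()
    clone.setstate((3, state, None))
    return clone

victim = random.Random(12345)
observed = [victim.getrandbits(32) for _ in range(624)]

clone = clone_from_outputs(observed)
predicted = [clone.getrandbits(32) for _ in range(10)]
actual = [victim.getrandbits(32) for _ in range(10)]
print(predicted == actual)  # True
```

Note that the seed (12345) is never used by the clone: the observed outputs alone are enough, which is exactly why a long period does not make `random` suitable for keys or tokens.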

user515430
  • Exactly. To put it another way, while `random` may have a period of `2**19937-1`, that only guarantees it won't repeat the same sequence within that period. It is entirely possible to work out where in that period you currently are from a small number of samples (small relative to the period) and then track it from there. – aruisdante Mar 14 '14 at 03:36
  • I see. I guess this would be a different question, but how does PyCrypto's Random manage to overcome `random`'s problems? Isn't it also a pseudorandom algorithm that will eventually have the same limitation? – verybadalloc Mar 14 '14 at 06:52
  • A cryptographically secure random number generator typically maintains several pools that are filled with entropy from sources such as network activity, keystroke timings, mouse-click timings, and hard-drive timings. These pools are mixed and hashed into the internal state at different intervals, and the output is generated from the internal state through a further hash. Because that hash cannot practically be inverted, you cannot reconstruct the internal state from the output. Furthermore, the changes to the internal state are not regular (there is no pattern); they depend on external sources. – user515430 Mar 14 '14 at 07:27
  • user515430, your link does not state what you say it does. It does not discuss entropy. – NDEthos Dec 04 '14 at 22:27
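For comparison with the entropy-pool design described in the comments, here is a minimal sketch using only the Python 3 standard library (an assumption: this is not PyCrypto's own API). `os.urandom` and `secrets.token_bytes` both read from the operating system's CSPRNG (for example `getrandom()` or `/dev/urandom` on Linux), and, unlike `random.Random`, neither exposes a seed or a state that could be set or reconstructed from its output.

```python
import os
import secrets

# Both draw from the OS CSPRNG; there is no seed() or setstate() to
# reproduce or clone their output, unlike random.Random.
key_a = os.urandom(5)
key_b = secrets.token_bytes(5)  # available since Python 3.6

print(len(key_a), len(key_b))  # 5 5
print(type(key_b))  # <class 'bytes'>
```

In new Python code, `secrets` is the module intended for tokens, keys and passwords; `random` remains appropriate for simulations and games where reproducibility matters more than unpredictability.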