
I'm seeing some issues with random number generation inside containers running in a kubernetes cluster (repeated values). It might be the lack of entropy inside the container, or it could be something else, on a higher level, but I'd like to investigate the entropy angle and I have a few questions I'm having trouble finding the answers to.

  • The value of `/proc/sys/kernel/random/entropy_avail` is between 950 and 1050 across containers and nodes; is that good enough? `rngtest -c 10000 </dev/urandom` returns pretty good results (FIPS 140-2 successes: 9987, FIPS 140-2 failures: 13), but run against `/dev/random` it just hangs forever.

  • The entropy_avail values in containers seem to follow the values on the nodes. If I execute `cat /dev/random >/dev/null` on a node, entropy_avail also drops inside the containers running on that node, even though `docker inspect` doesn't indicate that the /dev/*random devices are bind-mounted from the node (see the sketch after this list for how I'm comparing them). So how do they relate? Can one container consume the entropy available to other containers on that node?

  • If entropy_avail around 1000 is something to be concerned about, what's the best way of increasing that value? It seems deploying a haveged daemonset would be one way (https://github.com/kubernetes/kubernetes/issues/60751). Is that the best/simplest way to go about it?
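
For reference, this is roughly how I've been comparing the values; the node name and the field-selector approach are just what I happen to use, adapt as needed:

```bash
# on the node itself
cat /proc/sys/kernel/random/entropy_avail

# the same estimate inside every pod scheduled on that node
# (node-1 is a placeholder for one of my nodes)
for pod in $(kubectl get pods -o name --field-selector spec.nodeName=node-1); do
  echo -n "$pod: "
  kubectl exec "$pod" -- cat /proc/sys/kernel/random/entropy_avail
done

# draining the blocking pool on the node makes the values above drop too
cat /dev/random >/dev/null   # runs until interrupted (Ctrl-C)
```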

I'm having trouble finding the answers on google, stackoverflow, and in kubernetes github issues. I also got no response in the kubernetes-users slack channel, so I'm hoping someone here can shed some light on this.

Proper pseudo-random number generation underpins all cryptographic operations, so any kubernetes user should be interested in the answers.

Thanks in advance.

brwk
    In general, using `/dev/urandom` (which uses a cryptographic PRNG) is preferred over `/dev/random` (which should be used, at most, only to generate a seed for a cryptographic PRNG); in general, concerns over whether `/dev/urandom` is as secure as `/dev/random` are misguided. See also "[Myths about /dev/urandom](https://www.2uo.de/myths-about-urandom)". – Peter O. May 10 '19 at 13:23
  • The number is supposed to be an estimate of the number of bits of entropy. Anything 256 or greater should be adequate for all currently known cryptographic schemes. If you are just concerned about avoiding duplicates, you would be unlikely to see a duplicate for values 128 or greater, but you would not be too surprised to see one for values 64 or less. In short, if entropy_avail is around 1000 then entropy will never be your problem unless that estimation is way, way off. – President James K. Polk May 10 '19 at 13:44
  • However, if different node instances are somehow largely just copies of each other *including the /dev/random state* then that is a problem. – President James K. Polk May 10 '19 at 13:49

1 Answer


There are two types of "randomness" in computers:

  1. pseudo-random numbers, which are generated by an algorithm and can be reproduced exactly if you know the algorithm and the seed, and
  2. true random numbers, which come from a physical source, follow no algorithm, and can never be reproduced.

Both have their merits. Often you want to be able to reproduce random numbers, e.g. in data science and in science in general. For cryptography, however, reproducibility is never a good thing...
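
A minimal shell illustration of the difference (nothing cryptographic here, just the two behaviours):

```bash
# 1. Pseudo-random: seeding bash's RANDOM with a fixed value makes the
#    sequence reproducible - the exact numbers depend on your bash version,
#    but they are identical on every run.
RANDOM=42
echo "$RANDOM $RANDOM $RANDOM"

# 2. Kernel randomness: bytes drawn from the kernel's pool differ on every
#    run and cannot be reproduced from a known seed.
od -An -tx1 -N8 /dev/urandom
```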

For this reason the kernel provides /dev/random and /dev/urandom, interfaces that derive randomness from real hardware events: interrupt timings, device/driver noise, user input timings and so on. /dev/random is a blocking interface, meaning it really waits until enough entropy has been collected before returning (this can be veeeeery slow, basically unusable for most applications!), while /dev/urandom is non-blocking and always returns immediately, trading a theoretical bit of quality for speed (but in a smart way, read on).
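
You can see this difference directly; the timeout is only there so the /dev/random read cannot hang your shell indefinitely:

```bash
# /dev/urandom: non-blocking, returns 1 MB essentially instantly
time dd if=/dev/urandom of=/dev/null bs=1M count=1

# /dev/random: blocking (on kernels before 5.6); asking for even a few
# hundred bytes can stall until enough entropy has been collected
time timeout 10 dd if=/dev/random of=/dev/null bs=512 count=1
```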

This answers part of your question: yes, the entropy of all containers running on one node is shared. Containers use the host's kernel, so /dev/random and /dev/urandom inside a container are the same kernel devices as on the node (they are plain device nodes, no bind mount needed), and they all draw from the same kernel entropy pool. One container (or the host) consuming entropy therefore lowers entropy_avail for everything on that node. This is not a problem in practice, because the actual random numbers handed out will still be completely uncorrelated as long as the pool has been sufficiently seeded.

This brings us to another aspect of your question: on server machines with many processes running and handling requests all the time, entropy flows into the pool at a decent rate. An entropy_avail of around 1000 (out of a pool size of 4096 bits on these kernels) is not particularly low; it is enough if the devices are used properly, see below (although I am not aware of a well-defined level at which one should become concerned). And remember: the entropy pool is refilled continuously.
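
A quick way to watch this on a node, assuming an older kernel (as in your case) where reads from /dev/random still drain the pool estimate:

```bash
# current estimate and the total pool size (4096 bits on these kernels)
cat /proc/sys/kernel/random/entropy_avail
cat /proc/sys/kernel/random/poolsize

# drain the pool for a few seconds, then watch the estimate climb back up
timeout 5 cat /dev/random >/dev/null
watch -n1 cat /proc/sys/kernel/random/entropy_avail
```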

Another detail on /dev/urandom: in modern Linux this is actually a cryptographically secure pseudo-random number generator (CSPRNG). It generates numbers via an algorithm, very fast, and is kept unpredictable by being (re)seeded from the kernel's entropy pool, the same pool that feeds /dev/random. Once it has been properly seeded, it is suited for all cryptographic needs.

Thus, you should not draw bulk randomness from /dev/random. The quality is not better in practice, and the cost of waiting for fresh entropy is far too high. Use /dev/urandom and you will always get good performance with cryptographically strong output. If you want to be careful, keep an eye on entropy_avail and make sure it does not stay near zero (I am not sure what a reasonable threshold is, or whether a firm one is known in general).
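
If you want an automated check, a trivial sketch could look like the following; the 500-bit threshold is an arbitrary assumption on my part, not an established limit:

```bash
#!/bin/bash
# Warn when the kernel's entropy estimate stays low; the threshold is a
# guess, not a hard rule - see the discussion above.
THRESHOLD=500
while sleep 60; do
  avail=$(cat /proc/sys/kernel/random/entropy_avail)
  if [ "$avail" -lt "$THRESHOLD" ]; then
    echo "$(date): entropy_avail is low: $avail" >&2
  fi
done
```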

Ralf Ulrich