Vernon Murdoch, Chief Architect at Haventec, explains the power of random numbers for security and fun
Random numbers generated by software shouldn’t make sense. Whether they’re used in cryptography or to add extra fun to a game, they’re meant to be unsystematic, unpredictable.
Programmers traditionally rely on external sources to generate seeds and salting to make it more difficult to predict the next number in the random sequence.
These unpredictable sources are known as ‘entropy inputs’. As more random data is required these entropy sources become depleted. This then means programmers have to wait until the entropy sources are replenished so they can get the seeds they need.
System entropy is actually a scarce resource on most computers. So to ensure the numbers are less predictable, we use cryptography to generate ‘secure’ random numbers.
If the seed is known, we have problems, because the supposedly random numbers become easier to guess if someone captures the output and analyses it.
On a non-compromised system the secure random stream is much more difficult to guess by just capturing the output.
There are a few major problems with random numbers being generated by computers:
(1) Reliance on system entropy to seed and salt the number generation. Checking for randomness is difficult and there is no way to know if the seed has been compromised (the numbers have become guessable).
(2) System entropy is always going to be a problem if it is just software based. Using hardware number generators can help. Cryptographers have created a secure number generator – Fortuna – which minimises the reliance on system entropy using seed pools and cryptography. Fortuna also has recoverable algorithms built into the randomness generation that stops compromised seeds from polluting the stream and making all future numbers guessable.
(3) Testing a number generator’s output stream for randomness isn’t easy. Ideally you want to check that the random stream is still random and hasn’t become highjacked or entered a state where it is repeating sequences. The industry standard tool is “Dieharder” and is supported in the package management system of all Unix operating systems. This tool gives an indication of how random the data looks but provides no guarantees. The other problem with this tool is it takes a lot of CPU power to run all the tests needed to confirm that a sample of the stream is random. “Dieharder” is used to test random number generators when systems and software is being built but is never used during production random number generation, for obvious reasons.
In comparison a very quick and simple way to look for repeated sequences would be to use a compression tool, like gzip, to check the stream’s compression ratio.
A high compression ratio would indicate the stream is not random. A low compression ratio does not guarantee a random stream but it does give an indication the stream is still random. This requires much less compute time than running a full sequence of “Dieharder” tests.
Using all this new knowledge it becomes obvious that we are no longer blocked on waiting for system entropy and are now CPU bound, making pseudo random number generation a solvable issue through scaling.