(Pseudo) Random Number Generators

June 21st, 2006

If you are dealing with lots of computing power, at some point you will decide to try and leverage randomness in solving problems. The most well-known analytical technique leveraging randomness is called the Monte-Carlo method, and is called that for obvious reasons. Another technique that leverages randomness is Hashing (which will be covered in another post).

One of the pre-requisites for any techniques that leverage randomness is a good random number generator. A bad random number generator can give misleading results, and diagnosing the resulting problems is like wandering around in a blizzard looking for snowflakes that look too alike. You therefore want to start with a good RNG, and ideally you want as much proof of its randomness as possible before you start using it.

On computers, its is more correct to call RNGs Pseudo-Random Number Generators (PRNG), because they aren’t, in fact, random at all. They are instead deterministic, in that starting from the same state, they always give the same resulting sequence of numbers. This can be very useful, for example, when testing algorithms. The basic form of an RNG is a function: (rand,state) = f(state).


Whilst there are sources of real random numbers, for example Lavarnd or HotBits, which generate randomness by observing physical phenomena such as radioactive decay, in general these sources of randomness are too low-bandwidth and/or complicated for high-performance use. They can be good for initializing RNGs (for example, you can grab 2048 bytes of pure randomness from hotbits with a HTTP request). The page on How HotBits Works is pretty interesting reading.

Unfortunately, there is no absolute measure of how random a set of numbers is. Instead, there are various statistical tests that are designed to highlight any variances from ‘true’ randomness. The most famous suite of tests is called the Diehard suite, and was created by George Marsaglia.

One of the simplest possible tests of randomness is to generate a histogram from a set of random numbers, and then compute what the probability of generating that histogram would be if the random numbers were in fact truly random. The basic histogram approach can be extended into multiple dimensions, tracking the probability of sequences of various lengths occurring. Another test is to generate a sorted list of random numbers, and make a histogram of the spacings between the numbers in the list. This is called the Birthday Spacings test and the resulting histogram should follow the Poisson distribution. Other tests are more involved, for example, performing random walks using the random number generator to decide whether you turn left or right, and then measuring various statistics on the resulting paths. In general, however, statistical tests can prove that a random number generator is bad, but not that it is good.

For any given pseudo-random number generator, it is always possible to create a test that it will fail. For example, it will always fail the test that measures the probability of the generated sequence not being the same as that generated by the generator itself.

The simplest PRNG is known as the Linear Congruential Generator. Many of you will already have seen one of these: X[t+1] = (A*X[t] + B) % M, with prime M (usually chosen to be a bit smaller than the largest integer representable in a single word). If you iterate an LCG enough, it will repeat the generated sequence of random numbers. The maximum possible period of an LCG is equal to M-1, or around (2^32 -1) on a 32-bit computer.

Another, less well known random number generator is a generalisation of the LCG, called the Multiple Recursive Generator. The form of this generator is X[t+1] = (a[0]*x[t] + a[1]*x[t-1] + … + a[k]*x[t-k]) % M, with prime M. The maximum possible period for this generator is (M^k – 1), or (2^96 -1) for k=3 on a 32-bit computer.

One of highest quality generators to date is called the Mersenne Twister, which has a period of (2^19937 – 1), a C# implementation of which can be found here.

A long period is important because, over the lifetime of an application leveraging randomness, you would like to ensure than no two runs of the application will use overlapping sequences (depending on the application, this may not be a problem if it happened, but if it was, it would be very difficult to diagnose). Some of you may have heard of the Birthday Paradox, in which there is a 50% chance that two people in any group of 23 will have the same birthday. The same principle applies to random number generators, which generate long sequences of numbers in a single run of an application that may be run a large amount of times. You basically want to ensure that there is a huge amount of possible starting points (birthdays) relative to the number of random starts in the sequence you make (people), thus ensuring that the probability of two sequences overlapping is extremely small.

An excellent introductory (though mathematical) treatment on this subject is Random Number Generation by Paul L’Ecuyer.

Comments are closed.