Knuth Volume 2: Seminumerical Algorithms

Chapter 3: Random Numbers

Introduction

Random numbers are essential for many computer applications: simulation, sampling, numerical analysis, computer programming (as a source of arbitrary data for testing), decision-making, and cryptography. Knuth observes that random numbers should not be generated "by a random method." Instead, we require a deterministic procedure that produces numbers satisfying useful statistical tests of randomness — hence the term pseudo-random numbers.

The chapter treats four main topics:

Generating uniform random numbers (the fundamental building block)
Statistical tests for verifying randomness
Techniques for transforming uniform variates into other distributions
The philosophical question: what is a random sequence?

Generating Uniform Random Numbers

A uniform random number generator produces a sequence of numbers that behave as if each were drawn independently from the uniform distribution on the interval [0, 1). The standard approach is to generate integers $$ X_n $$ in the range $0 \leq X_n < m$ and then output the scaled value $$ U_n = X_n / m $$ .

The Linear Congruential Method

The linear congruential generator (LCG) is the most widely studied and applied method. It is defined by the recurrence:

$X_{n+1} = (a X_n + c) \bmod m, \qquad n \geq 0$

where

$\begin{align} m &> 0 \quad \text{— the modulus} \\ 0 &< a < m \quad \text{— the multiplier} \\ 0 &\leq c < m \quad \text{— the increment} \\ 0 &\leq X_0 < m \quad \text{— the seed (starting value)} \end{align}$

The sequence is entirely determined by the four parameters $$ (m, a, c, X_0) $$ . The scaled output is:

$U_n = \dfrac{X_n}{m} \in [0, 1)$

When $$ c = 0 $$ , the generator is called a multiplicative congruential generator (or Lehmer generator). When $c \neq 0$ , it is called a mixed congruential generator.

The sequence repeats after a finite number of steps — the period of the generator. A generator with period $$ m $$ (when $c \neq 0$ ) or $$ m-1 $$ (when $$ c = 0 $$ and $$ m $$ is prime) is said to have full period.

Choice of Modulus

The modulus $$ m $$ should be as large as possible, since the period cannot exceed $$ m $$ . Practical considerations:

Power of two: Let $$ m = 2^e $$ , where $$ e $$ is the word size of the computer (e.g., $2^{31}$ or $2^{64}$ ). The modulo operation reduces to bit truncation, which is extremely fast. Disadvantage: lower-order bits have short periods.

Prime modulus: Let $$ m $$ be a large prime. Gives better randomness properties in the low-order bits. Disadvantage: the modulo operation requires an explicit division step, though Schrage's method can avoid double-width products.

Mersenne primes: Using $$ m = 2^p - 1 $$ (Mersenne prime) combines some benefits of both approaches; e.g., $m = 2^{31} - 1$ .

Choice of Multiplier

The choice of multiplier $$ a $$ is critical to the quality of the generator.

Hull–Dobell Theorem (full period for $c \neq 0$ ): A linear congruential generator has period $$ m $$ if and only if:

$\gcd(c, m) = 1$
$a \equiv 1 \pmod{p}$ for every prime factor $$ p $$ of $$ m $$
$a \equiv 1 \pmod{4}$ if $4 \mid m$

Full period for $$ c = 0 $$ , $$ m $$ prime: The period is $$ m-1 $$ if and only if $$ a $$ is a primitive element modulo $$ m $$ (i.e., $a^k \not\equiv 1 \pmod{m}$ for $1 \leq k < m-1$ ).

Full period for $$ c = 0 $$ , $$ m = 2^e $$ : The maximum period is $m/4 = 2^{e-2}$ , achieved when $a \equiv \pm 3 \pmod{8}$ and the seed $$ X_0 $$ is odd.

Additional desiderata: Beyond achieving maximum period, the multiplier should:

Produce good results on the spectral test (see below)
Avoid values too close to $$ 0 $$ or $$ m $$
Satisfy the condition that $$ a-1 $$ is not excessively divisible by prime factors of $$ m $$

Knuth recommends multipliers that pass the spectral test in dimensions 2 through 6. Some well-known good multipliers:

$$ a = 16807 $$ for $m = 2^{31}-1$ (MINSTD / Park–Miller)
$$ a = 48271 $$ for $m = 2^{31}-1$ (improved MINSTD)
$$ a = 6364136223846793005 $$ for $m = 2^{64}$ (MMIX)

Potency

The potency of a linear congruential sequence is defined as the smallest integer $$ s $$ such that:

$(a-1)^s \equiv 0 \pmod{m}$

The concept captures how much mixing the multiplier provides. A higher potency indicates that successive values are less correlated. For a modulus $$ m = 2^e $$ , the potency is $s = \lceil e / \nu_2(a-1) \rceil$ , where $\nu_2(n)$ is the exponent of 2 dividing $$ n $$ .

For good randomness, the potency should be at least 5 or 6. When $a \equiv 5 \pmod{8}$ (so $$ a-1 $$ is divisible by 4 but not by 8), the potency is maximized at $$ s = e/2 $$ .

If the potency is too low, the generator exhibits strong serial correlation. For example, $$ a = 1 $$ gives potency $$ s=1 $$ and produces a simple counter (not random at all).

Other Methods

Knuth also surveys alternatives to the linear congruential method:

Additive number generators: Defined by $X_n = (X_{n-\ell} + X_{n-k}) \bmod m$ . These generalize the Fibonacci recurrence to a larger lag. Attractive for very long periods.

Lagged Fibonacci generators: Use operations other than addition, such as multiplication or XOR.

Linear feedback shift registers (LFSRs): Based on arithmetic in $\mathrm{GF}(2)[x]$ . All bits are full-period, avoiding the low-bit weakness of power-of-two LCGs. The Mersenne Twister is a modern descendant.

Combined generators: Summing or shuffling outputs from multiple independent generators can increase period and improve statistical properties. Example: the Wichmann–Hill generator combines three LCGs.

Statistical Tests

A generator that produces numbers deterministically cannot be "truly random." Instead, we subject it to statistical tests and declare it "pseudo-random" if it passes a battery of tests designed to detect common non-random behaviors.

General Test Procedures

Every statistical test follows the same pattern:

State a null hypothesis $$ H_0 $$ : "the sequence is random."
Select a test statistic $$ T $$ that measures some departure from randomness.
Compute the theoretical distribution of $$ T $$ under $$ H_0 $$ .
Observe the value $$ t $$ of $$ T $$ from the actual generator output.
Reject $$ H_0 $$ if $P(T \geq t \mid H_0)$ is too small (typically < 0.01 or > 0.99).

Knuth recommends applying many different tests, since a generator may pass one test while failing another. A generator that passes test A, test B, and test C is more trustworthy than one that passes only test A.

Empirical Tests

Empirical tests apply statistical procedures directly to the numbers produced by the generator. No knowledge of the generator's internal structure is used.

Equidistribution (frequency) test: Does each possible value appear equally often? Use the chi-squared test or Kolmogorov–Smirnov test.

$\chi^2 = \sum_{s=1}^{k} \dfrac{ (Y_s - \frac{n}{k})^2 }{ \frac{n}{k} }$

where $$ Y_s $$ is the observed count in category $$ s $$ and $$ n $$ is the total number of observations. Under $$ H_0 $$ , this follows a chi-squared distribution with $$ k-1 $$ degrees of freedom.

Serial test: Do pairs of consecutive numbers distribute uniformly in the unit square? Generalizes to triples, quadruples, etc.

Gap test: Examine the lengths of gaps between occurrences of numbers in a specified range. The gap lengths should follow a geometric distribution.

Poker test (partition test): Group $$ n $$ random numbers into hands of $$ k $$ cards each and count the frequency of each poker-hand pattern. Classical version uses $$ k = 5 $$ .

Coupon collector's test: Observe how many random numbers must be generated to obtain a complete set of values from 1 to $$ d $$ . The distribution of the waiting time has a known form.

Permutation test: Divide the sequence into $$ n $$ groups of $$ t $$ elements; check whether the $$ t! $$ possible relative orderings occur with equal frequency.

Run test: Count the number of monotone runs (ascending or descending subsequences). Their lengths should follow a known distribution.

Collision test: How many numbers must be generated before a duplicate appears? Related to the birthday paradox.

Maximum-of- $$ t $$ test: The distribution of the maximum of $$ t $$ consecutive uniform variates is $\Pr(\max \leq x) = x^t$ . Apply the KS test to observed maxima.

Theoretical Tests

Theoretical tests use knowledge of the generator's recurrence to predict its global behavior, without actually running it.

Serial correlation test: Compute the covariance between $$ X_n $$ and $X_{n+1}$ . For LCGs, the serial correlation coefficient $\rho_1$ can be bounded analytically:

$\rho_1 \approx \frac{1}{a} - \frac{6c}{am} \left(1 - \frac{c}{m}\right) + \mathcal{O}\left(\frac{a}{m}\right)$

Low serial correlation is desirable.

Lattice structure: The set of all $$ t $$ -tuples $(X_n, X_{n+1}, \dots, X_{n+t-1})$ from an LCG lies on a lattice of equidistant parallel hyperplanes (Marsaglia's theorem). This inherent structure is the LCG's principal theoretical weakness.

The Spectral Test

The spectral test is the most powerful theoretical test for LCGs. For a given dimension $$ t $$ , it measures the maximum distance $1/\nu_t$ between adjacent parallel hyperplanes that cover all $$ t $$ -tuples.

Formally, for an LCG with period $$ m $$ , define:

$\nu_t = \min \left\{ \sqrt{x_1^2 + x_2^2 + \dots + x_t^2} \;\middle|\; x_1 + a x_2 + \dots + a^{t-1} x_t \equiv 0 \pmod{m}, \; (x_1, \dots, x_t) \neq (0, \dots, 0) \right\}$

The figure of merit is $\nu_t$ (larger is better). A good generator should have $\nu_t \geq 0.1 m^{1/t}$ for $t = 2, 3, \dots, 6$ . Knuth provides extensive tables of spectral test results for various multipliers.

The spectral test reveals that RANDU ( $$ a = 65539 $$ , $m = 2^{31}$ ) is catastrophically bad: in 3 dimensions, all points fall on just 15 planes.

Other Types of Random Quantities

Once a source of uniform random numbers is established, we can transform them into other distributions.

Numerical Distributions

Knuth covers methods for generating random variates from non-uniform distributions:

Inverse transformation method: If $$ F(x) $$ is the CDF, and $$ U $$ is uniform on [0,1), then $X = F^{-1}(U)$ has distribution $$ F $$ . Example: for the exponential distribution, $X = -\lambda \ln U$ .

Rejection method (von Neumann): To generate from density $$ f(x) $$ on $$ [a,b] $$ with $0 \leq f(x) \leq M$ : generate $X \sim \text{Uniform}(a,b)$ and $Y \sim \text{Uniform}(0,M)$ ; accept $$ X $$ if $Y \leq f(X)$ , otherwise try again.

Normal distribution: Several methods exist. The Box–Muller method generates two independent standard normal variates from two uniform variates:

$\begin{align} Z_1 &= \sqrt{-2 \ln U_1} \cos(2 \pi U_2) \\ Z_2 &= \sqrt{-2 \ln U_1} \sin(2 \pi U_2) \end{align}$

The polar method (Marsaglia) avoids trigonometric evaluation and is generally faster.

Discrete distributions: An efficient algorithm for generating from a discrete distribution with probabilities $p_0, p_1, \dots, p_{n-1}$ uses the "alias method" (Walker), which requires $\mathcal{O}(n)$ preprocessing and $\mathcal{O}(1)$ time per sample.

Random Sampling and Shuffling

Random sampling without replacement: To select $$ k $$ items uniformly from a set of $$ n $$ :

Algorithm S (selection sampling): Process items one by one; each item is selected with probability equal to (remaining needed) / (remaining items). Runs in $\mathcal{O}(n)$ time.

Reservoir sampling: When $$ n $$ is unknown in advance (streaming data), maintain a reservoir of $$ k $$ candidates. Each new item replaces a random reservoir element with probability $$ k/t $$ , where $$ t $$ is the number of items seen so far.

Fisher–Yates shuffle (Algorithm P): To randomly permute an array $$ A[1..n] $$ :

$\begin{align} &\text{For } i = n \text{ down to } 2: \\ &\qquad j \leftarrow \text{random integer in } [1, i] \\ &\qquad \text{swap } A[i] \leftrightarrow A[j] \end{align}$

This produces each of the $$ n! $$ permutations with equal probability. It runs in $\mathcal{O}(n)$ time. Knuth attributes this method to Fisher and Yates (1938) and credits Durstenfeld (1964) with the computer implementation.

A common pitfall is using $j \leftarrow$ random in $$ [1, n] $$ (instead of $$ [1, i] $$ ), which yields a non-uniform distribution with $$ n^n $$ equally likely outcomes — not a uniform permutation.

What Is a Random Sequence?

Knuth explores this deep philosophical question (Section 3.5, marked with an asterisk as optional). Several definitions have been proposed:

Von Mises–Wald–Church definition: A sequence is random if it satisfies all "place selection" rules (computable subsequence selection). Church formulated this in terms of computable functions, making it more rigorous.

Martin-Löf definition (1966): A sequence is random if it does not belong to any effectively null set — a set of measure zero that can be described by a computable sequence of statistical tests. This is the foundation of algorithmic randomness.

Kolmogorov complexity: A finite sequence is random if its shortest description (program that generates it) is as long as the sequence itself. An infinite sequence is random if all its prefixes are incompressible.

In practice, for computer applications, Knuth settles on a pragmatic definition: a sequence is "sufficiently random" if it passes a battery of statistical tests relevant to the intended application, and if knowledge of $$ n $$ elements does not allow practical prediction of the $$ (n+1) $$ st element.

Summary

Knuth summarizes the main recommendations:

Use a linear congruential generator with a large modulus (at least $2^{30}$ ).
Choose the multiplier $$ a $$ to satisfy the spectral test in dimensions 2 through 6.
For $c \neq 0$ , the increment need only be relatively prime to $$ m $$ ; $$ c = 1 $$ is convenient.
Use the most significant bits of $$ X_n $$ , not the least significant ones, when reducing to a smaller range.
Subject the generator to multiple empirical tests before relying on it.
For serious Monte Carlo work, consider combined generators or more modern methods with even longer periods.

Example Problems

Spectral Test Calculation

Problem: Compute the two-dimensional spectral test value $\nu_2$ for the generator $X_{n+1} = (3 X_n + 0) \bmod 7$ .

Solution

The generator has $$ m = 7 $$ , $$ a = 3 $$ , $$ c = 0 $$ . In two dimensions, we seek:

$\nu_2 = \min \left\{ \sqrt{x_1^2 + x_2^2} \;\middle|\; x_1 + 3 x_2 \equiv 0 \pmod{7}, \; (x_1, x_2) \neq (0,0) \right\}$

The condition $x_1 \equiv -3 x_2 \pmod{7}$ . For $$ x_2 = 1 $$ : $x_1 \equiv 4$ , and $\sqrt{4^2 + 1^2} = \sqrt{17} \approx 4.12$ . For $$ x_2 = 2 $$ : $x_1 \equiv 1$ , giving $\sqrt{1^2 + 2^2} = \sqrt{5} \approx 2.24$ . Exhaustive checking confirms the minimum is $\sqrt{5}$ , so $\nu_2 = \sqrt{5}$ .

The figure of merit threshold at $$ t=2 $$ is $0.1 \cdot 7^{1/2} \approx 0.265$ . Since $\nu_2 = \sqrt{5} \approx 2.24 \gg 0.265$ , this generator passes the 2D spectral test (as expected for a full-period generator with prime modulus).

RANDU Failure

Problem: Show why RANDU ( $X_{n+1} = 65539 X_n \bmod 2^{31}$ ) fails the 3D spectral test.

Solution

For RANDU, $m = 2^{31}$ , $a = 65539 = 2^{16} + 3$ . In 3 dimensions, all triples $(X_n, X_{n+1}, X_{n+2})$ satisfy:

$X_{n+2} = 6 X_{n+1} - 9 X_n \pmod{2^{31}}$

This follows from the identity $a^2 = 6a - 9 \pmod{m}$ (since $(2^{16}+3)^2 = 2^{32} + 6 \cdot 2^{16} + 9 \equiv 6a - 9 \pmod{2^{31}}$ ).

Thus all points lie on the plane $x_1 - 6 x_2 + 9 x_3 \equiv 0$ , and by symmetry on just 15 widely spaced planes in 3-space — a catastrophic failure of randomness.

Flags

AOCP/Random Numbers

From charlesreid1

Contents

Knuth Volume 2: Seminumerical Algorithms

Chapter 3: Random Numbers

Introduction

Generating Uniform Random Numbers

The Linear Congruential Method

Choice of Modulus

Choice of Multiplier

Potency

Other Methods

Statistical Tests

General Test Procedures

Empirical Tests

Theoretical Tests

The Spectral Test

Other Types of Random Quantities

Numerical Distributions

Random Sampling and Shuffling

What Is a Random Sequence?

Summary

Example Problems

Spectral Test Calculation

Solution

RANDU Failure

Solution

Related

Flags