AOCP/Random Numbers
From charlesreid1
Knuth Volume 2: Seminumerical Algorithms
Chapter 3: Random Numbers
Introduction
Random numbers are essential for many computer applications: simulation, sampling, numerical analysis, computer programming (as a source of arbitrary data for testing), decision-making, and cryptography. Knuth observes that random numbers should not be generated "by a random method." Instead, we require a deterministic procedure that produces numbers satisfying useful statistical tests of randomness — hence the term pseudo-random numbers.
The chapter treats four main topics:
- Generating uniform random numbers (the fundamental building block)
- Statistical tests for verifying randomness
- Techniques for transforming uniform variates into other distributions
- The philosophical question: what is a random sequence?
Generating Uniform Random Numbers
A uniform random number generator produces a sequence of numbers that behave as if each were drawn independently from the uniform distribution on the interval [0, 1). The standard approach is to generate integers $ X_n $ in the range $ 0 \leq X_n < m $ and then output the scaled value $ U_n = X_n / m $.
The Linear Congruential Method
The linear congruential generator (LCG) is the most widely studied and applied method. It is defined by the recurrence:
$ X_{n+1} = (a X_n + c) \bmod m, \qquad n \geq 0 $
where
$ \begin{align} m &> 0 \quad \text{— the modulus} \\ 0 &< a < m \quad \text{— the multiplier} \\ 0 &\leq c < m \quad \text{— the increment} \\ 0 &\leq X_0 < m \quad \text{— the seed (starting value)} \end{align} $
The sequence is entirely determined by the four parameters $ (m, a, c, X_0) $. The scaled output is:
$ U_n = \dfrac{X_n}{m} \in [0, 1) $
When $ c = 0 $, the generator is called a multiplicative congruential generator (or Lehmer generator). When $ c \neq 0 $, it is called a mixed congruential generator.
The sequence repeats after a finite number of steps — the period of the generator. A generator with period $ m $ (when $ c \neq 0 $) or $ m-1 $ (when $ c = 0 $ and $ m $ is prime) is said to have full period.
Choice of Modulus
The modulus $ m $ should be as large as possible, since the period cannot exceed $ m $. Practical considerations:
- Power of two: Let $ m = 2^e $, where $ e $ is the word size of the computer (e.g., $ 2^{31} $ or $ 2^{64} $). The modulo operation reduces to bit truncation, which is extremely fast. Disadvantage: lower-order bits have short periods.
- Prime modulus: Let $ m $ be a large prime. Gives better randomness properties in the low-order bits. Disadvantage: the modulo operation requires an explicit division step, though Schrage's method can avoid double-width products.
- Mersenne primes: Using $ m = 2^p - 1 $ (Mersenne prime) combines some benefits of both approaches; e.g., $ m = 2^{31} - 1 $.
Choice of Multiplier
The choice of multiplier $ a $ is critical to the quality of the generator.
Hull–Dobell Theorem (full period for $ c \neq 0 $): A linear congruential generator has period $ m $ if and only if:
- $ \gcd(c, m) = 1 $
- $ a \equiv 1 \pmod{p} $ for every prime factor $ p $ of $ m $
- $ a \equiv 1 \pmod{4} $ if $ 4 \mid m $
Full period for $ c = 0 $, $ m $ prime: The period is $ m-1 $ if and only if $ a $ is a primitive element modulo $ m $ (i.e., $ a^k \not\equiv 1 \pmod{m} $ for $ 1 \leq k < m-1 $).
Full period for $ c = 0 $, $ m = 2^e $: The maximum period is $ m/4 = 2^{e-2} $, achieved when $ a \equiv \pm 3 \pmod{8} $ and the seed $ X_0 $ is odd.
Additional desiderata: Beyond achieving maximum period, the multiplier should:
- Produce good results on the spectral test (see below)
- Avoid values too close to $ 0 $ or $ m $
- Satisfy the condition that $ a-1 $ is not excessively divisible by prime factors of $ m $
Knuth recommends multipliers that pass the spectral test in dimensions 2 through 6. Some well-known good multipliers:
- $ a = 16807 $ for $ m = 2^{31}-1 $ (MINSTD / Park–Miller)
- $ a = 48271 $ for $ m = 2^{31}-1 $ (improved MINSTD)
- $ a = 6364136223846793005 $ for $ m = 2^{64} $ (MMIX)
Potency
The potency of a linear congruential sequence is defined as the smallest integer $ s $ such that:
$ (a-1)^s \equiv 0 \pmod{m} $
The concept captures how much mixing the multiplier provides. A higher potency indicates that successive values are less correlated. For a modulus $ m = 2^e $, the potency is $ s = \lceil e / \nu_2(a-1) \rceil $, where $ \nu_2(n) $ is the exponent of 2 dividing $ n $.
For good randomness, the potency should be at least 5 or 6. When $ a \equiv 5 \pmod{8} $ (so $ a-1 $ is divisible by 4 but not by 8), the potency is maximized at $ s = e/2 $.
If the potency is too low, the generator exhibits strong serial correlation. For example, $ a = 1 $ gives potency $ s=1 $ and produces a simple counter (not random at all).
Other Methods
Knuth also surveys alternatives to the linear congruential method:
- Additive number generators: Defined by $ X_n = (X_{n-\ell} + X_{n-k}) \bmod m $. These generalize the Fibonacci recurrence to a larger lag. Attractive for very long periods.
- Lagged Fibonacci generators: Use operations other than addition, such as multiplication or XOR.
- Linear feedback shift registers (LFSRs): Based on arithmetic in $ \mathrm{GF}(2)[x] $. All bits are full-period, avoiding the low-bit weakness of power-of-two LCGs. The Mersenne Twister is a modern descendant.
- Combined generators: Summing or shuffling outputs from multiple independent generators can increase period and improve statistical properties. Example: the Wichmann–Hill generator combines three LCGs.
Statistical Tests
A generator that produces numbers deterministically cannot be "truly random." Instead, we subject it to statistical tests and declare it "pseudo-random" if it passes a battery of tests designed to detect common non-random behaviors.
General Test Procedures
Every statistical test follows the same pattern:
- State a null hypothesis $ H_0 $: "the sequence is random."
- Select a test statistic $ T $ that measures some departure from randomness.
- Compute the theoretical distribution of $ T $ under $ H_0 $.
- Observe the value $ t $ of $ T $ from the actual generator output.
- Reject $ H_0 $ if $ P(T \geq t \mid H_0) $ is too small (typically < 0.01 or > 0.99).
Knuth recommends applying many different tests, since a generator may pass one test while failing another. A generator that passes test A, test B, and test C is more trustworthy than one that passes only test A.
Empirical Tests
Empirical tests apply statistical procedures directly to the numbers produced by the generator. No knowledge of the generator's internal structure is used.
- Equidistribution (frequency) test: Does each possible value appear equally often? Use the chi-squared test or Kolmogorov–Smirnov test.
$ \chi^2 = \sum_{s=1}^{k} \dfrac{ (Y_s - \frac{n}{k})^2 }{ \frac{n}{k} } $
where $ Y_s $ is the observed count in category $ s $ and $ n $ is the total number of observations. Under $ H_0 $, this follows a chi-squared distribution with $ k-1 $ degrees of freedom.
- Serial test: Do pairs of consecutive numbers distribute uniformly in the unit square? Generalizes to triples, quadruples, etc.
- Gap test: Examine the lengths of gaps between occurrences of numbers in a specified range. The gap lengths should follow a geometric distribution.
- Poker test (partition test): Group $ n $ random numbers into hands of $ k $ cards each and count the frequency of each poker-hand pattern. Classical version uses $ k = 5 $.
- Coupon collector's test: Observe how many random numbers must be generated to obtain a complete set of values from 1 to $ d $. The distribution of the waiting time has a known form.
- Permutation test: Divide the sequence into $ n $ groups of $ t $ elements; check whether the $ t! $ possible relative orderings occur with equal frequency.
- Run test: Count the number of monotone runs (ascending or descending subsequences). Their lengths should follow a known distribution.
- Collision test: How many numbers must be generated before a duplicate appears? Related to the birthday paradox.
- Maximum-of-$ t $ test: The distribution of the maximum of $ t $ consecutive uniform variates is $ \Pr(\max \leq x) = x^t $. Apply the KS test to observed maxima.
Theoretical Tests
Theoretical tests use knowledge of the generator's recurrence to predict its global behavior, without actually running it.
- Serial correlation test: Compute the covariance between $ X_n $ and $ X_{n+1} $. For LCGs, the serial correlation coefficient $ \rho_1 $ can be bounded analytically:
$ \rho_1 \approx \frac{1}{a} - \frac{6c}{am} \left(1 - \frac{c}{m}\right) + \mathcal{O}\left(\frac{a}{m}\right) $
Low serial correlation is desirable.
- Lattice structure: The set of all $ t $-tuples $ (X_n, X_{n+1}, \dots, X_{n+t-1}) $ from an LCG lies on a lattice of equidistant parallel hyperplanes (Marsaglia's theorem). This inherent structure is the LCG's principal theoretical weakness.
The Spectral Test
The spectral test is the most powerful theoretical test for LCGs. For a given dimension $ t $, it measures the maximum distance $ 1/\nu_t $ between adjacent parallel hyperplanes that cover all $ t $-tuples.
Formally, for an LCG with period $ m $, define:
$ \nu_t = \min \left\{ \sqrt{x_1^2 + x_2^2 + \dots + x_t^2} \;\middle|\; x_1 + a x_2 + \dots + a^{t-1} x_t \equiv 0 \pmod{m}, \; (x_1, \dots, x_t) \neq (0, \dots, 0) \right\} $
The figure of merit is $ \nu_t $ (larger is better). A good generator should have $ \nu_t \geq 0.1 m^{1/t} $ for $ t = 2, 3, \dots, 6 $. Knuth provides extensive tables of spectral test results for various multipliers.
The spectral test reveals that RANDU ($ a = 65539 $, $ m = 2^{31} $) is catastrophically bad: in 3 dimensions, all points fall on just 15 planes.
Other Types of Random Quantities
Once a source of uniform random numbers is established, we can transform them into other distributions.
Numerical Distributions
Knuth covers methods for generating random variates from non-uniform distributions:
- Inverse transformation method: If $ F(x) $ is the CDF, and $ U $ is uniform on [0,1), then $ X = F^{-1}(U) $ has distribution $ F $. Example: for the exponential distribution, $ X = -\lambda \ln U $.
- Rejection method (von Neumann): To generate from density $ f(x) $ on $ [a,b] $ with $ 0 \leq f(x) \leq M $: generate $ X \sim \text{Uniform}(a,b) $ and $ Y \sim \text{Uniform}(0,M) $; accept $ X $ if $ Y \leq f(X) $, otherwise try again.
- Normal distribution: Several methods exist. The Box–Muller method generates two independent standard normal variates from two uniform variates:
$ \begin{align} Z_1 &= \sqrt{-2 \ln U_1} \cos(2 \pi U_2) \\ Z_2 &= \sqrt{-2 \ln U_1} \sin(2 \pi U_2) \end{align} $
The polar method (Marsaglia) avoids trigonometric evaluation and is generally faster.
- Discrete distributions: An efficient algorithm for generating from a discrete distribution with probabilities $ p_0, p_1, \dots, p_{n-1} $ uses the "alias method" (Walker), which requires $ \mathcal{O}(n) $ preprocessing and $ \mathcal{O}(1) $ time per sample.
Random Sampling and Shuffling
Random sampling without replacement: To select $ k $ items uniformly from a set of $ n $:
- Algorithm S (selection sampling): Process items one by one; each item is selected with probability equal to (remaining needed) / (remaining items). Runs in $ \mathcal{O}(n) $ time.
- Reservoir sampling: When $ n $ is unknown in advance (streaming data), maintain a reservoir of $ k $ candidates. Each new item replaces a random reservoir element with probability $ k/t $, where $ t $ is the number of items seen so far.
Fisher–Yates shuffle (Algorithm P): To randomly permute an array $ A[1..n] $:
$ \begin{align} &\text{For } i = n \text{ down to } 2: \\ &\qquad j \leftarrow \text{random integer in } [1, i] \\ &\qquad \text{swap } A[i] \leftrightarrow A[j] \end{align} $
This produces each of the $ n! $ permutations with equal probability. It runs in $ \mathcal{O}(n) $ time. Knuth attributes this method to Fisher and Yates (1938) and credits Durstenfeld (1964) with the computer implementation.
A common pitfall is using $ j \leftarrow $ random in $ [1, n] $ (instead of $ [1, i] $), which yields a non-uniform distribution with $ n^n $ equally likely outcomes — not a uniform permutation.
What Is a Random Sequence?
Knuth explores this deep philosophical question (Section 3.5, marked with an asterisk as optional). Several definitions have been proposed:
- Von Mises–Wald–Church definition: A sequence is random if it satisfies all "place selection" rules (computable subsequence selection). Church formulated this in terms of computable functions, making it more rigorous.
- Martin-Löf definition (1966): A sequence is random if it does not belong to any effectively null set — a set of measure zero that can be described by a computable sequence of statistical tests. This is the foundation of algorithmic randomness.
- Kolmogorov complexity: A finite sequence is random if its shortest description (program that generates it) is as long as the sequence itself. An infinite sequence is random if all its prefixes are incompressible.
In practice, for computer applications, Knuth settles on a pragmatic definition: a sequence is "sufficiently random" if it passes a battery of statistical tests relevant to the intended application, and if knowledge of $ n $ elements does not allow practical prediction of the $ (n+1) $st element.
Summary
Knuth summarizes the main recommendations:
- Use a linear congruential generator with a large modulus (at least $ 2^{30} $).
- Choose the multiplier $ a $ to satisfy the spectral test in dimensions 2 through 6.
- For $ c \neq 0 $, the increment need only be relatively prime to $ m $; $ c = 1 $ is convenient.
- Use the most significant bits of $ X_n $, not the least significant ones, when reducing to a smaller range.
- Subject the generator to multiple empirical tests before relying on it.
- For serious Monte Carlo work, consider combined generators or more modern methods with even longer periods.
Example Problems
Spectral Test Calculation
Problem: Compute the two-dimensional spectral test value $ \nu_2 $ for the generator $ X_{n+1} = (3 X_n + 0) \bmod 7 $.
Solution
The generator has $ m = 7 $, $ a = 3 $, $ c = 0 $. In two dimensions, we seek:
$ \nu_2 = \min \left\{ \sqrt{x_1^2 + x_2^2} \;\middle|\; x_1 + 3 x_2 \equiv 0 \pmod{7}, \; (x_1, x_2) \neq (0,0) \right\} $
The condition $ x_1 \equiv -3 x_2 \pmod{7} $. For $ x_2 = 1 $: $ x_1 \equiv 4 $, and $ \sqrt{4^2 + 1^2} = \sqrt{17} \approx 4.12 $. For $ x_2 = 2 $: $ x_1 \equiv 1 $, giving $ \sqrt{1^2 + 2^2} = \sqrt{5} \approx 2.24 $. Exhaustive checking confirms the minimum is $ \sqrt{5} $, so $ \nu_2 = \sqrt{5} $.
The figure of merit threshold at $ t=2 $ is $ 0.1 \cdot 7^{1/2} \approx 0.265 $. Since $ \nu_2 = \sqrt{5} \approx 2.24 \gg 0.265 $, this generator passes the 2D spectral test (as expected for a full-period generator with prime modulus).
RANDU Failure
Problem: Show why RANDU ($ X_{n+1} = 65539 X_n \bmod 2^{31} $) fails the 3D spectral test.
Solution
For RANDU, $ m = 2^{31} $, $ a = 65539 = 2^{16} + 3 $. In 3 dimensions, all triples $ (X_n, X_{n+1}, X_{n+2}) $ satisfy:
$ X_{n+2} = 6 X_{n+1} - 9 X_n \pmod{2^{31}} $
This follows from the identity $ a^2 = 6a - 9 \pmod{m} $ (since $ (2^{16}+3)^2 = 2^{32} + 6 \cdot 2^{16} + 9 \equiv 6a - 9 \pmod{2^{31}} $).
Thus all points lie on the plane $ x_1 - 6 x_2 + 9 x_3 \equiv 0 $, and by symmetry on just 15 widely spaced planes in 3-space — a catastrophic failure of randomness.
Related
AOCP/Positional Number Systems — the next section in Volume 2, covering the foundations of computer arithmetic.
AOCP/Floating Point Arithmetic — the section on floating-point representation and rounding errors.
AOCP/Generating Permutations and Tuples — covers Algorithm P (Fisher–Yates shuffle) and other combinatorial generation techniques from Volume 4.
Flags
| The Art of Computer Programming notes from reading Donald Knuth's Art of Computer Programming
Volume 1: Fundamental Algorithms Mathematical Foundations: AOCP/Infinite Series · AOCP/Binomial Coefficients · AOCP/Multinomial Coefficients AOCP/Harmonic Numbers · AOCP/Fibonacci Numbers Puzzles/Exercises:
Volume 2: Seminumerical Algorithms AOCP/Random Numbers · AOCP/Positional Number Systems AOCP/Floating Point Arithmetic · AOCP/Euclids Algorithm AOCP/Factoring into Primes · AOCP/Polynomial Arithmetic AOCP/Power Series Manipulation
Volume 3: Sorting and Searching AOCP/Internal Sorting · AOCP/Optimal Sorting · AOCP/External Sorting AOCP/Binary Tree Searching · AOCP/Hashing AOCP/Combinatorics · AOCP/Multisets · Rubiks Cube/Permutations
AOCP/Combinatorial Algorithms · AOCP/Boolean Functions AOCP/Five Letter Words · Rubiks Cube/Tuples AOCP/Generating Permutations and Tuples
|
| Algorithms Part of Computer Science Notes
Series on Algorithms
Algorithms/Sort · Algorithmic Analysis of Sort Functions · Divide and Conquer · Divide and Conquer/Master Theorem Three solid O(n log n) search algorithms: Merge Sort · Heap Sort · Quick Sort Algorithm Analysis/Merge Sort · Algorithm Analysis/Randomized Quick Sort
Algorithms/Search · Binary Search · Binary Search Modifications
Algorithms/Combinatorics · Algorithms/Combinatorics and Heuristics · Algorithms/Optimization · Divide and Conquer
Algorithms/Strings · Algorithm Analysis/Substring Pattern Matching
Algorithm complexity · Theta vs Big O Amortization · Amortization/Aggregate Method · Amortization/Accounting Method Algorithm Analysis/Matrix Multiplication
Estimation Estimation · Estimation/BitsAndBytes
Algorithm Practice and Writeups Project Euler · Five Letter Words · Letter Coverage
|