Experimental Design Lecture
From charlesreid1
Overview of Experimental Design and Surrogate Models
The Problem Statement
Purpose: create a cheap representation of an expensive computer model
We're picking some input parameters, and some output variables
Normally there is a map from one to the other: the real function $ f $,
$ \boldsymbol{y} = f(\boldsymbol{x}) $
And we're creating a surrogate model $ g $,
$ \boldsymbol{y} \approx g(\boldsymbol{x}) $
This is sometimes called a "metamodel", because it's a model of a model
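As a rough illustration of the idea, the sketch below fits a polynomial surrogate to a handful of evaluations of a model. The function f here is a hypothetical stand-in for an expensive model, and the design points, polynomial degree, and variable names are assumptions chosen only for illustration.

```matlab
% Hypothetical stand-in for an expensive model (an assumption for illustration)
f = @(x) exp(-x) .* sin(3*x);

% Evaluate the "expensive" model at a small number of design points
x_design = linspace(0, 2, 8);
y_design = f(x_design);

% Fit a degree-5 polynomial surrogate g by least squares
degree = 5;
coeffs = polyfit(x_design, y_design, degree);
g = @(x) polyval(coeffs, x);

% Compare f and g on a finer grid that was not used for fitting
x_check = linspace(0, 2, 201);
max_abs_error = max(abs(f(x_check) - g(x_check)));
fprintf('Maximum absolute error on [0, 2]: %g\n', max_abs_error);
```

Once the coefficients are known, evaluating g costs only a polynomial evaluation, which is the point of building a surrogate.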
Classes of Surrogate Models
There are several classes of sampling designs and functional forms used to construct $ g $
- Latin hypercube
- Space-filling
- Uniform
- Neural networks
- Gaussian
- Polynomials (response surface methodology)
I won't cover all of these; I will cover only Latin hypercube, space-filling, and response surface methodologies
Surrogate Modeling
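When constructing surrogate models, it is important to distinguish between computer surrogate modeling (metamodeling) and experimental surrogate modeling
Big difference: experiments have random errors, whereas a deterministic computer model returns the same output every time it is run with the same input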
Basic Concepts for Experiments
Analysis of Variance tables
Basic Concepts for Metamodeling
Metamodeling: regression on data without random errors
Trying to predict true value $ f(\boldsymbol{x}) $ using surrogate model $ g(\boldsymbol{x}) $
Mean square error:
$ MSE(g) = \int_R \left( f(\boldsymbol{x}) - g(\boldsymbol{x}) \right)^2 d\boldsymbol{x} $
where R is the region in parameter space where the metamodel applies
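A minimal sketch of how this integral might be approximated numerically in one dimension, using the trapezoidal rule on a uniform grid. The pair f and g below (a sine and its third-order Taylor polynomial) and the region R = [0, 1] are hypothetical choices made only to keep the example small.

```matlab
% Hypothetical pair chosen for illustration: f is the "true" model,
% g is its third-order Taylor polynomial about x = 0
f = @(x) sin(x);
g = @(x) x - x.^3 / 6;

% Region R = [0, 1], discretized on a uniform grid
x = linspace(0, 1, 1001);

% Approximate the integral of (f - g)^2 over R with the trapezoidal rule
squared_error = (f(x) - g(x)).^2;
mse = trapz(x, squared_error);
fprintf('Approximate MSE over R = [0, 1]: %g\n', mse);
```

Using trapz with the grid x as its first argument accounts for the grid spacing, so the result approximates the integral rather than a plain sum of squared errors.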
Example
Example function:
Real function f:
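function y = real_function()
% REAL_FUNCTION  Evaluate f(x) = 2 x cos(4 pi x) on a fixed grid.
% Define the domain of the real function
x = 0:(pi/32):2*pi;
% The output is named y to avoid shadowing MATLAB's built-in real()
y = 2*x.*cos(4*pi*x);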
Surrogate function g:
function surrogate = surrogate_function()
% Define the region in which the function is valid
x = 0:(pi/32):2*pi;
% Ninth-degree polynomial in (x - 0.5); it tracks f closely only near x = 0.5
surrogate = 0.9931 + 1.96*(x-0.5) - 76.8838*(x-0.5).^2 - 152.0006*(x-0.5).^3 ...
+ 943.8565*(x-0.5).^4 + 1857.1427*(x-0.5).^5 - 3983.9332*(x-0.5).^6 ...
- 7780.7937*(x-0.5).^7 + 5756.3561*(x-0.5).^8 + 11147.1698*(x-0.5).^9;
Comparing the two functions:
And comparing their error:
Discrete error measure (sum of squared errors at the grid points, standing in for the MSE integral):
r=real_function;
s=surrogate_function;
MSE = sum( (r-s).^2 );
MSE = 1.6354
Monte Carlo Sampling
Monte Carlo sampling is essentially a brute-force technique in which random samples are drawn until there is satisfactory confidence that the entire space has been covered.
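The following sketch shows plain Monte Carlo sampling of the unit hypercube and counts how many equal-width bins along each dimension receive no sample. The sample size, number of variables, and variable names are assumptions for illustration; with a small number of random samples, some bins are typically left empty, which is why many samples are needed.

```matlab
n = 10;   % number of samples (assumed for illustration)
s = 2;    % number of input variables (assumed for illustration)

% Plain Monte Carlo: n independent uniform samples in the unit hypercube
X = rand(n, s);

% Bin number (1..n) of each coordinate, using n equal-width bins per dimension
bin_index = min(floor(X * n) + 1, n);

% Count the samples per bin in each dimension and report the empty bins
for j = 1:s
    counts = accumarray(bin_index(:, j), 1, [n, 1]);
    fprintf('Dimension %d: %d of %d bins are empty\n', j, sum(counts == 0), n);
end
```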
Latin Hypercube
Latin hypercube sampling is a way of sampling a space randomly, but in such a way that the full range of each dimension is sampled: each dimension is split into bins, and every bin contains exactly one sample.
For example, in the following figure, one sample falls into each bin of each of the x and y dimensions:
For a domain divided into $ n $ bins, each bin has an equal marginal probability of $ 1/n $
Algorithm
Purpose: create an experimental design with $ n $ runs (number of samples to be taken), and $ s $ input variables
The result should be a Latin hypercube design that is an $ n \times s $ matrix denoting the variable combinations at which to sample
Step 1: take $ s $ independent random permutations of the integers $ 1, \dots, n $, written $ \pi_{j}(1), \dots, \pi_{j}(n) $
(note that $ j $ indexes the dimension of the Latin hypercube, $ j=1 \dots s $, and $ n $ is the number of runs or experiments)
Step 2: Take $ ns $ random numbers $ U_{k}^{j} $, drawn independently from the uniform distribution on $ [0, 1] $, and compute the locations of the Latin hypercube samples as:
$ x_{k}^{j} = \frac{ \pi_{j}(k) - U_{k}^{j} }{ n } $
where $ k = 1 \dots n $ and $ j = 1 \dots s $
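The two steps above can be sketched in a few lines of MATLAB. This is a minimal sketch on the unit hypercube, not a replacement for a library routine; the values of n and s and the variable names P, U, X, and bins are assumptions for illustration.

```matlab
n = 5;   % number of runs (assumed for illustration)
s = 2;   % number of input variables (assumed for illustration)

% Step 1: s independent random permutations of the integers 1..n, one per column
P = zeros(n, s);
for j = 1:s
    P(:, j) = randperm(n)';
end

% Step 2: n*s uniform random numbers, then x = (pi - U) / n for every entry
U = rand(n, s);
X = (P - U) / n;   % n-by-s Latin hypercube design in the unit hypercube

% Check: each column should have exactly one sample in each of the n bins
bins = ceil(X * n);
disp(sort(bins));  % each column, once sorted, should list 1, 2, ..., n
```

Each row of X is one run, and each column holds the values of one input variable, matching the $ n \times s $ design matrix described above.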
Variation
One variation is centered Latin hypercube sampling
Each sample location is given by:
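$ x_{k}^{j} = \frac{ \pi_{j}(k) - 0.5 }{ n } $
where $ k = 1 \dots n $ indexes which experiment (or run) and $ j = 1 \dots s $ indexes the input variable
(this technique does not require the random numbers $ U_{k}^{j} $; each sample sits at the center of its bin)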
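A minimal sketch of the centered variant follows. The permutations are still random, but each point is placed at the midpoint of its bin; the values of n and s and the variable names are assumptions for illustration.

```matlab
n = 5;   % number of runs (assumed for illustration)
s = 2;   % number of input variables (assumed for illustration)

% One random permutation of 1..n per input variable
P = zeros(n, s);
for j = 1:s
    P(:, j) = randperm(n)';
end

% Place each sample at the center of its bin: x = (pi - 0.5) / n
X = (P - 0.5) / n;

% Every column, once sorted, holds the bin midpoints 0.5/n, 1.5/n, ...
disp(sort(X));
```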
LHS in Matlab
If you have the Statistics Toolbox (which CHPC @ University of Utah does), Matlab has an LHS function available to you: lhsdesign
Documentation is available here: http://www.mathworks.com/help/toolbox/stats/lhsdesign.html
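A short usage sketch, assuming the Statistics Toolbox is installed. lhsdesign returns points in the unit hypercube, so they usually need to be rescaled to the physical parameter ranges; the bounds lb and ub below are hypothetical values chosen for illustration.

```matlab
n = 10;   % number of runs (assumed for illustration)
p = 3;    % number of input variables (assumed for illustration)

% Latin hypercube design with points placed randomly within each bin
X = lhsdesign(n, p);

% With 'smooth' set to 'off', points sit at the bin midpoints instead
Xc = lhsdesign(n, p, 'smooth', 'off');

% Rescale the unit-hypercube design to hypothetical parameter ranges
lb = [0, 10, 100];    % assumed lower bounds, one per variable
ub = [1, 20, 300];    % assumed upper bounds, one per variable
X_scaled = repmat(lb, n, 1) + X .* repmat(ub - lb, n, 1);
disp(X_scaled);
```

The rescaling uses repmat so that it also runs on older MATLAB releases that lack implicit expansion.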
Space-Filling
Response surface
More detail on this