Overview of Experimental Design and Surrogate Models

The Problem Statement

Purpose: create a cheap representation of an expensive computer model

We pick a set of input parameters and a set of output variables

Normally there is a map from the inputs to the outputs: the real function $$ f $$,

$\boldsymbol{y} = f(\boldsymbol{x})$

And we're creating a surrogate model $$ g $$ that approximates that map,

$\boldsymbol{y} \approx g(\boldsymbol{x})$

This is sometimes called a "metamodel", because it's a model of a model

Classes of Surrogate Models

There are several classes of sampling designs (the first three below) and functional forms for $$ g $$ (the last three):

Latin hypercube
Space-filling
Uniform
Neural networks
Gaussian
Polynomials (response surface methodology)

I won't cover all of these; I will only cover Latin hypercube, space-filling, and response surface methodologies

Surrogate Modeling

When constructing surrogate models, it is important to distinguish between computer surrogate modeling (metamodeling) and experimental surrogate modeling

Big difference: experiments have random errors

Basic Concepts for Experiments

Analysis of Variance tables

Basic Concepts for Metamodeling

Metamodeling: regression on data without random errors

Trying to predict true value $f(\boldsymbol{x})$ using surrogate model $g(\boldsymbol{x})$

Mean square error:

$MSE(g) = \int_R \left( f(\boldsymbol{x}) - g(\boldsymbol{x}) \right)^2 d\boldsymbol{x}$

where R is the region in parameter space where the metamodel applies
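As a rough sketch of how this integral might be evaluated numerically, the snippet below applies the trapezoidal rule on a fine grid. It assumes that R is the unit interval and that the surrogate is a degree-9 polynomial fit built with polyfit. Both choices, and all variable names, are illustrative assumptions rather than part of the lecture.

```matlab
% Sketch: approximate MSE(g) on R = [0, 1] with the trapezoidal rule.
% The region, the degree-9 fit, and all variable names are assumptions.
f = @(x) 2*x.*cos(4*pi*x);          % stand-in for the expensive real function

% Build a polynomial surrogate g from 50 evenly spaced runs of f
x_train = linspace(0, 1, 50);
coeffs  = polyfit(x_train - 0.5, f(x_train), 9);   % polynomial in (x - 0.5)
g = @(x) polyval(coeffs, x - 0.5);

% Evaluate both functions on a fine grid and integrate the squared difference
x_fine = linspace(0, 1, 2001);
sq_err = (f(x_fine) - g(x_fine)).^2;
mse    = trapz(x_fine, sq_err);

fprintf('Approximate MSE on [0, 1]: %.3e\n', mse);
```

Because trapz weights each squared difference by the grid spacing, the result approximates the integral itself rather than a plain sum over grid points.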

Example

Example function:

Real function f:

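function y = real_function()
% Evaluate the real function f(x) = 2*x*cos(4*pi*x) on a fixed grid.
% (Output is named y so that it does not shadow MATLAB's built-in real().)

% Define the domain of the real function
x = 0:(pi/32):2*pi;

y = 2*x.*cos(4*pi*x);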

Surrogate function g:

function surrogate = surrogate_function()

% Define the region in which the function is valid
x = 0:(pi/32):2*pi;

surrogate = 0.9931 + 1.96*(x-0.5) - 76.8838*(x-0.5).^2 - 152.0006*(x-0.5).^3 ...
        + 943.8565*(x-0.5).^4 + 1857.1427*(x-0.5).^5 - 3983.9332*(x-0.5).^6 ...
        - 7780.7937*(x-0.5).^7 + 5756.3561*(x-0.5).^8 + 11147.1698*(x-0.5).^9;

Comparing the two functions:

And comparing their error:

Mean square error:

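% Evaluate both functions on the same grid
r = real_function;
s = surrogate_function;

% Sum of squared differences over the grid points
% (a discrete stand-in for the integral, without the grid-spacing weight)
MSE = sum( (r-s).^2 );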

MSE = 1.6354

Monte Carlo Sampling

Monte Carlo sampling is essentially a brute-force technique: random samples are drawn until there is satisfactory confidence that the entire space has been covered.
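A minimal sketch of plain Monte Carlo sampling on the unit hypercube follows. The sample count, the dimension, and the bin-counting check are assumptions added for illustration. The check counts how many of $n$ equal bins along each dimension receive at least one sample.

```matlab
% Sketch: plain Monte Carlo sampling of the unit hypercube [0, 1]^s.
% n, s, and the bin-counting check are illustrative assumptions.
n = 10;                 % number of samples (runs)
s = 2;                  % number of input variables
X = rand(n, s);         % each row is one random sample point

% Split each dimension into n equal bins and count how many were hit
bin_index = min(floor(X * n) + 1, n);    % bin number 1..n for each coordinate
bins_hit  = zeros(1, s);
for j = 1:s
    bins_hit(j) = numel(unique(bin_index(:, j)));
end

disp(bins_hit);         % number of distinct bins sampled in each dimension
```

With only $n$ samples, some bins are usually hit more than once and others not at all, so covering every bin by brute force tends to need many more than $n$ samples.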

Latin Hypercube

Latin hypercube sampling is a way of sampling a space randomly, but in such a way that every bin along each dimension of the space is sampled exactly once.

For example, in the following figure, one sample falls into each bin of each of the x and y dimensions:

With each dimension divided into $$ n $$ bins, every bin has an equal marginal probability of $$ 1/n $$

Algorithm

Purpose: create an experimental design with $$ n $$ runs (number of samples to be taken), and $$ s $$ input variables

The result should be a Latin hypercube design that is an $n \times s$ matrix denoting the variable combinations at which to sample

Step 1: take $$ s $$ independent random permutations of the integers $1 \dots n$, denoted $\pi_{j}(1) \dots \pi_{j}(n)$

(note that $$ j $$ indexes the dimension of the Latin hypercube, $j=1 \dots s$ , and $$ n $$ is the number of runs or experiments)

Step 2: take $$ ns $$ independent random numbers $U_{k}^{j}$, uniformly distributed on $[0,1]$, and compute the locations of the Latin hypercube samples as:

$x_{k}^{j} = \frac{ \pi_{j}(k) - U_{k}^{j} }{ n }$

where $k = 1 \dots n$ and $j = 1 \dots s$
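The two steps can be sketched in a few lines of MATLAB using only built-in functions. The values of n and s, the variable names, and the final check are assumptions for illustration.

```matlab
% Sketch of the two-step Latin hypercube algorithm.
% n, s, and all variable names are illustrative assumptions.
n = 10;                         % number of runs
s = 2;                          % number of input variables

% Step 1: s independent random permutations of 1..n, one per column
P = zeros(n, s);
for j = 1:s
    P(:, j) = randperm(n)';
end

% Step 2: n*s uniform random numbers, then x = (pi - U) / n
U = rand(n, s);
X = (P - U) / n;                % n-by-s design on the unit hypercube

% Check: every column has exactly one sample in each of the n bins
bin_index = ceil(X * n);        % bin number 1..n of each coordinate
is_lhs = isequal(sort(bin_index), repmat((1:n)', 1, s));
disp(is_lhs);
```

Each row of X is one run and each column is one input variable. To sample a physical range, scale each column from $[0,1]$ to that variable's bounds.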

Variation

One variation is centered Latin hypercube sampling

Each sample location is given by:

$x_{k}^{j} = \frac{ \pi_{j}(k) - 0.5 }{ n }$

where $k = 1 \dots n$ indexes which experiment (or run) and $j = 1 \dots s$ indexes the dimension

(this variant does not require the random numbers $U_{k}^{j}$, only the random permutations)
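A sketch of the centered variant follows. As before, n, s, and the variable names are assumptions. The only change from the randomized version is that the uniform random offset is replaced by the constant 0.5.

```matlab
% Sketch of centered Latin hypercube sampling.
% n, s, and all variable names are illustrative assumptions.
n = 5;                          % number of runs
s = 2;                          % number of input variables

% One random permutation of 1..n per dimension
P = zeros(n, s);
for j = 1:s
    P(:, j) = randperm(n)';
end

% Place each sample at the midpoint of its bin
X = (P - 0.5) / n;

% Check: each column is a reordering of the n bin midpoints
centers = ((1:n)' - 0.5) / n;
is_centered = isequal(sort(X), repmat(centers, 1, s));
disp(is_centered);
```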

LHS in Matlab

If you have the Statistics Toolbox (which CHPC at the University of Utah does), MATLAB has an LHS function available to you: lhsdesign

Documentation is available here: http://www.mathworks.com/help/toolbox/stats/lhsdesign.html
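A brief usage sketch, assuming the toolbox is installed: lhsdesign(n, p) returns an n-by-p design on the unit hypercube, and the 'smooth', 'off' option places points at the bin midpoints. The bounds used for scaling are hypothetical values chosen only for illustration.

```matlab
% Sketch: Latin hypercube designs with lhsdesign (requires the Statistics Toolbox).
% n, s, and the lower/upper bounds are illustrative assumptions.
n = 10;                                         % number of runs
s = 2;                                          % number of input variables

X          = lhsdesign(n, s);                   % random location within each bin
X_centered = lhsdesign(n, s, 'smooth', 'off');  % samples at bin midpoints

% Scale the unit-hypercube design to hypothetical physical bounds
lower = [0,    300];
upper = [2*pi, 500];
X_scaled = bsxfun(@plus, lower, bsxfun(@times, X, upper - lower));
```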

Space-Filling

Response surface

Response Surface Methodology

Factorial Design

Fractional Factorial Design

Full Factorial Design

Composite Design

Other Alternatives

Box-Behnken

Whatever Others
