Revision as of 01:40, 3 July 2011

Overview

Composite experimental design refers to sampling the parameter space successively, in such a way that a first- or second-order polynomial function (a response surface) can be constructed from the samples.

Explanation

Setting Up the Whole Design

1. Select 5 (or 3) levels for each variable. Code each level with a numerical value, typically in the range $[-1, +1]$ (but it can also be, e.g., $[-2, +2]$; see Box and Draper 1987).

2. Create variable transforms to translate between the coded levels and the actual input parameter values (see below).

3. Create the full composite design matrix.

4. Parse the full factorial matrix out of the composite matrix.

5. Parse the fractional factorial matrix out of the composite matrix.

6. Parse the one-factor-at-a-time matrix out of the composite matrix.

7. Sample the function in the following order:

One factor at a time
Fractional factorial
Full factorial
Full composite
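Steps 3 to 6 can be sketched in Python as follows. This is not the author's Matlab code: it assumes a single center point, axial points at $\pm\alpha$ in coded units ($\alpha = 2$ for a 5-level design, $\alpha = 1$ for a face-centered 3-level design), and a half fraction chosen with the defining relation $I = x_1 x_2 \cdots x_k$. The function names `composite_design` and `split_design` are hypothetical.

```python
from itertools import product

import numpy as np


def composite_design(k, alpha=2.0):
    """Full composite design in coded units (hypothetical helper).

    Rows are design points, columns are the k coded variables.
    alpha=2.0 gives a 5-level design (-2, -1, 0, 1, 2);
    alpha=1.0 gives a face-centered 3-level design (-1, 0, 1).
    """
    center = np.zeros((1, k))                      # one center point (assumed)
    axial = np.zeros((2 * k, k))                   # one factor at +/-alpha, others at 0
    for i in range(k):
        axial[2 * i, i] = -alpha
        axial[2 * i + 1, i] = alpha
    corners = np.array(list(product([-1.0, 1.0], repeat=k)))  # 2^k factorial points
    return np.vstack([center, axial, corners])


def split_design(design):
    """Parse the composite matrix into its sub-designs (hypothetical helper)."""
    k = design.shape[1]
    n_ofat = 1 + 2 * k
    ofat = design[:n_ofat]                         # center + axial points
    full_factorial = design[n_ofat:]               # all 2^k corner points
    # Half fraction with defining relation I = x1*x2*...*xk (an assumption):
    # keep the corners whose coded values multiply to +1.
    fractional = full_factorial[np.prod(full_factorial, axis=1) > 0]
    return ofat, fractional, full_factorial


design = composite_design(6)
ofat, fractional, full_factorial = split_design(design)
print(design.shape, ofat.shape, fractional.shape, full_factorial.shape)
```

Following step 7, one would evaluate the `ofat` rows first, then `fractional`, then the remaining corners of `full_factorial`. For $k = 6$ the composite matrix has 77 rows, while the one-factor-at-a-time points (13) plus the half fraction (32) give 45 points, which is consistent with the 28 terms plus 17 residual degrees of freedom reported for the quadratic surface below.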

How Many Levels?

The question of whether to choose 3 or 5 levels depends entirely on the case.

Typically, 3-level designs are chosen for physical experiments, where each additional level makes the experimental setup more difficult. In this case, the minimum number of levels is desirable.

In simulations, however, 5-level designs are preferable, because running a larger number of levels requires no significant extra effort from the user.

Variable Transforms

For a variable $x_i$ with range $\alpha_i \leq x_i \leq \beta_i$,

the transformed variable $\hat{x}_i$ has the range $-1 \leq \hat{x}_i \leq +1$ for a factorial design;

the transformed variable $\hat{x}_i$ has the range $-2 \leq \hat{x}_i \leq +2$ for a composite design.

Linear Variables

To transform a linear variable $x_i$ to the variable $\hat{x}_i \in [-1, +1]$:

$\hat{x}_i = \frac{ x_i - \left( \frac{\beta_i - \alpha_i}{2} + \alpha_i \right) }{ \frac{\beta_i - \alpha_i}{2} }$

To transform a linear variable $x_i$ to the variable $\hat{x}_i \in [-2, +2]$:

$\hat{x}_i = \frac{ x_i - \left( \frac{\beta_i - \alpha_i}{2} + \alpha_i \right) }{ \frac{\beta_i - \alpha_i}{4} }$
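Both linear formulas are the same affine map: subtract the midpoint of the range and divide by a scale of $(\beta_i - \alpha_i)/2$ or $(\beta_i - \alpha_i)/4$. A minimal Python sketch (hypothetical helper names), including the inverse map needed to turn coded levels back into actual input values:

```python
def to_coded_linear(x, lo, hi, half_width=1.0):
    """Map x in [lo, hi] to a coded value in [-half_width, +half_width].

    half_width=1 reproduces the [-1, +1] formula, half_width=2 the [-2, +2] formula.
    """
    center = (hi - lo) / 2.0 + lo            # midpoint of the range
    scale = (hi - lo) / (2.0 * half_width)   # (hi - lo)/2 or (hi - lo)/4
    return (x - center) / scale


def from_coded_linear(xc, lo, hi, half_width=1.0):
    """Inverse map: coded value back to the actual input value."""
    center = (hi - lo) / 2.0 + lo
    scale = (hi - lo) / (2.0 * half_width)
    return center + xc * scale


# Hypothetical example: a variable ranging from 10 to 20
print(to_coded_linear(10.0, 10.0, 20.0))                    # -1.0
print(to_coded_linear(20.0, 10.0, 20.0, half_width=2.0))    # 2.0
print(from_coded_linear(1.0, 10.0, 20.0, half_width=2.0))   # 17.5
```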

Log Variables

To transform a log variable $x_i$ to the variable $\hat{x}_i \in [-1, +1]$:

$\hat{x}_i = \frac{ \log{(x_i)} - \left( \frac{ \log{(\beta_i)} - \log{(\alpha_i)}}{2} + \log{(\alpha_i)} \right) }{ \frac{ \log{(\beta_i)} - \log{(\alpha_i)} }{2} }$

To transform a log variable $x_i$ to the variable $\hat{x}_i \in [-2, +2]$:

$\hat{x}_i = \frac{ \log{(x_i)} - \left( \frac{ \log{(\beta_i)} - \log{(\alpha_i)}}{2} + \log{(\alpha_i)} \right) }{ \frac{ \log{(\beta_i)} - \log{(\alpha_i)} }{4} }$
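The log transforms are the linear transforms applied to $\log(x_i)$, so the sketch below (hypothetical helper names, natural logarithm assumed) reuses the same midpoint and scale on the log scale. The base of the logarithm cancels in the ratio, so any base gives the same coded value; the coded midpoint $\hat{x}_i = 0$ corresponds to the geometric mean $\sqrt{\alpha_i \beta_i}$.

```python
import math


def to_coded_log(x, lo, hi, half_width=1.0):
    """Coded value of a log-scaled variable x in [lo, hi] (requires lo > 0)."""
    log_lo, log_hi = math.log(lo), math.log(hi)
    center = (log_hi - log_lo) / 2.0 + log_lo
    scale = (log_hi - log_lo) / (2.0 * half_width)
    return (math.log(x) - center) / scale


def from_coded_log(xc, lo, hi, half_width=1.0):
    """Inverse map: coded value back to the actual (positive) input value."""
    log_lo, log_hi = math.log(lo), math.log(hi)
    center = (log_hi - log_lo) / 2.0 + log_lo
    scale = (log_hi - log_lo) / (2.0 * half_width)
    return math.exp(center + xc * scale)


# Hypothetical example: a rate constant spanning 1e-3 to 1e1
print(to_coded_log(0.1, 1e-3, 1e1))                       # geometric midpoint, about 0
print(from_coded_log(-2.0, 1e-3, 1e1, half_width=2.0))    # about 1e-3
```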

Full Composite Design Matrix

Full Factorial

Fractional Factorial

One Parameter At A Time

Example

Problem Information

For details about the problem, including the input uncertainty map, see Example Problem for Experimental Design.

Code

Main article: Composite Experimental Design Matlab Code

Computing Response Surface

See Response Surface Methodology for general information on response surface methodology.

See Composite Experimental Design Matlab Code for the actual Matlab code used to generate the results below.

A Note on Visualization

Response surfaces with more than 2 input dimensions are difficult to visualize. To see why, imagine reducing a 1-D function (e.g. $y = \log(x)$) by one dimension, to a single point: almost everything about the shape of the function is lost.

Reducing by more than one dimension is even worse: for example, reducing a plane described by a 2-D polynomial to a 0-D point.

For this reason, it is important to use more reliable metrics than visual inspection, such as the residual and goodness-of-fit statistics reported below, to judge how well a response surface represents the actual response.

A Note on Coefficient and Variable Order

The coefficient vector for each response surface is given below. The order of variables in the polynomials is:

$\dot{m}$ = mass flowrate
$k(T)$ = reaction rate
$L_{mix}$ = mixing length for mixing model
$z_1$ = measurement location 1
$z_2$ = measurement location 2
$z_3$ = measurement location 3

Polynomial terms for an n-dimensional polynomial are ordered as:

$x_1$	First order non-interaction terms
$x_2$
$\dots$
$x_n$
$x_1 x_2$	Second order interaction terms
$x_1 x_3$
$\dots$
$x_1 x_n$
$x_2 x_3$
$\dots$
$x_{n-1} x_n$
$x_1^2$	Second order non-interaction terms
$x_2^2$
$\dots$
$x_n^2$
$x_1 x_2 x_2$	Third order interaction terms
$x_1 x_2 x_3$
$\dots$
$x_{n-1} x_{n} x_{n}$
$x_1^3$	Third order non-interaction terms
$x_2^3$
$\dots$
$x_n^3$

(Unless, of course, the coefficients are specified to be otherwise).

Quadratic Surface, 6 Dimensions

A quadratic response surface for $y_{p,exit}$ as a function of the 6 input parameters, of the form:

$\hat{y}(\boldsymbol{x}) = b_0 + \sum_{i=1}^{6} b_i x_i + \sum_{i=1}^{5} \sum_{j=i+1}^{6} b_{ij} x_i x_j + \sum_{i=1}^{6} b_{ii} x_i^2$

was computed using Matlab's regstats command [1].
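The sketch below is a rough NumPy stand-in for that regression (it is not `regstats`, and not the code used for the results on this page): it builds the quadratic regressor columns in constant, linear, interaction, squared order and solves the ordinary least-squares problem with `np.linalg.lstsq`. The function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def quadratic_terms(X):
    """Regressor columns: constant, linear, pairwise interactions, squares."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    cols = [np.ones(n)]
    cols += [X[:, i] for i in range(k)]
    cols += [X[:, i] * X[:, j] for i, j in combinations(range(k), 2)]
    cols += [X[:, i] ** 2 for i in range(k)]
    return np.column_stack(cols)


def fit_quadratic(X, y):
    """Ordinary least-squares coefficients b (a stand-in for regstats)."""
    A = quadratic_terms(X)
    b, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return b


def predict_quadratic(b, X):
    return quadratic_terms(X) @ b


# Check on synthetic data: a known 2-D quadratic sampled on a 3x3 grid
X = np.array([[a, c] for a in (-1.0, 0.0, 1.0) for c in (-1.0, 0.0, 1.0)])
b_true = np.array([1.0, 2.0, -3.0, 0.5, 4.0, -1.0])   # b0, b1, b2, b12, b11, b22
y = quadratic_terms(X) @ b_true
b_fit = fit_quadratic(X, y)
print(np.round(b_fit, 6))
```

For 6 variables, `quadratic_terms` produces 28 columns, matching the length of the coefficient vector below.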

Because the response surface is six-dimensional, graphical representation is difficult (see the note on visualization above). The surface was therefore visualized by holding each of the 4 non-visualized dimensions at its mean value. The two dimensions visualized were $L_{mix}$ and $k(T)$.
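A minimal sketch of that visualization step, assuming a `predict` callable that maps an (m, k) array of coded points to m responses, and taking "mean value" to mean the mean of each column over the design points (`slice_grid` is a hypothetical helper):

```python
import numpy as np


def slice_grid(predict, X, i, j, n=41):
    """Evaluate `predict` on an n-by-n grid over columns i and j of design X.

    Every other column is held at its mean over the design points (assumed
    meaning of "mean value"). `predict` maps an (m, k) array to m responses.
    """
    X = np.asarray(X, dtype=float)
    means = X.mean(axis=0)
    gi = np.linspace(X[:, i].min(), X[:, i].max(), n)
    gj = np.linspace(X[:, j].min(), X[:, j].max(), n)
    GI, GJ = np.meshgrid(gi, gj, indexing="ij")
    pts = np.tile(means, (GI.size, 1))        # start every grid point at the means
    pts[:, i] = GI.ravel()                    # then overwrite the two plotted columns
    pts[:, j] = GJ.ravel()
    return GI, GJ, predict(pts).reshape(GI.shape)


# Toy check with a linear "surface" y = x0 + x1 + x2 (hypothetical)
X = np.array([[0.0, 0.0, 1.0], [2.0, 4.0, 3.0]])
GI, GJ, Z = slice_grid(lambda P: P.sum(axis=1), X, 0, 1, n=3)
print(Z)
```

In this setting `predict` would be the fitted response surface, and `GI`, `GJ`, `Z` could be passed to any surface-plotting routine.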

The resulting polynomial coefficient vector $\mathbf{b}$ is:

b(01) = 4.0870e+03 
b(02) = -2.0956e+03 
b(03) = -1.2574e+03 
b(04) = -4.1912e+02 
b(05) = -2.6527e-01 
b(06) = 8.2956e-02 
b(07) = -8.3864e+02 
b(08) = 4.1912e+02 
b(09) = 4.0102e-09 
b(10) = 4.1912e+02 
b(11) = 1.2271e-08 
b(12) = 1.0050e-08 
b(13) = 4.1912e+02 
b(14) = 1.2039e-10 
b(15) = 1.1920e-10 
b(16) = 1.1952e-10 
b(17) = 7.9500e-02 
b(18) = 1.2627e-11 
b(19) = 1.2676e-11 
b(20) = 1.2491e-11 
b(21) = 6.4480e-03 
b(22) = -9.1954e-04 
b(23) = 9.1895e-09 
b(24) = 7.8094e-09 
b(25) = 8.7553e-09 
b(26) = 1.4867e-02 
b(27) = 1.1544e-02 
b(28) = 4.1922e+02

The polynomial term corresponding to each coefficient (i.e. the order of the polynomial terms) matches the order described in Matlab's x2fx function documentation [2]. That is:

1. Constant term

2. Linear terms $x_1, x_2, \dots, x_n$

3. Interaction terms $x_1 x_2, x_1 x_3, \dots, x_1 x_n, x_2 x_3, \dots, x_{n-1} x_n$

4. Squared terms, in order $x_1^2, x_2^2, \dots, x_n^2$
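To map each coefficient index to its term, a short sketch (hypothetical helper) can generate labels in the same order; Python's `itertools.combinations` yields pairs in exactly the $x_1 x_2, x_1 x_3, \dots, x_{n-1} x_n$ order listed above.

```python
from itertools import combinations


def quadratic_term_labels(k):
    """Labels for b(1), b(2), ... in constant/linear/interaction/squared order."""
    idx = range(1, k + 1)
    labels = ["1"]                                                # constant
    labels += [f"x{i}" for i in idx]                              # linear
    labels += [f"x{i}*x{j}" for i, j in combinations(idx, 2)]     # interactions
    labels += [f"x{i}^2" for i in idx]                            # squares
    return labels


labels = quadratic_term_labels(6)
for n, term in enumerate(labels, start=1):
    print(f"b({n:02d}) -> {term}")
```

For 6 variables this prints 28 lines; for example, b(08) multiplies $x_1 x_2$ and b(28) multiplies $x_6^2$.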

The resulting response surface, holding all other parameters constant at their mean values, looks like:

File:CompositeResponseSurface Dim6 Deg2.png

Some key statistics for the response surface are given here:

---------------------------------------------------
Response surface summary of information:
Number of variables in response surface is 6. 
Number of terms in polynomial is 28. 
Degree of response surface is 2.
MSE =			 0.03845480 
MSE DoF = 			 17 

L-inf norm resid = 	 0.34272386 

R^2 =			 0.86371957 
adjusted R^2 =		 0.64727417 
---------------------------------------------------
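The statistics in this summary can be recomputed from the sampled responses and the surface predictions. The sketch below (hypothetical helper name) uses the usual definitions, with residual degrees of freedom $n - p$ for $n$ sample points and $p$ polynomial terms; it assumes $n > p$.

```python
import numpy as np


def surface_stats(y, y_hat, n_terms):
    """Fit statistics for a response surface with n_terms coefficients (needs n > n_terms)."""
    y = np.asarray(y, dtype=float)
    resid = y - np.asarray(y_hat, dtype=float)
    n = y.size
    dof = n - n_terms                            # residual degrees of freedom
    sse = float(resid @ resid)                   # sum of squared residuals
    sst = float(((y - y.mean()) ** 2).sum())     # total sum of squares
    r2 = 1.0 - sse / sst
    return {
        "MSE": sse / dof,
        "MSE DoF": dof,
        "L-inf norm resid": float(np.abs(resid).max()),
        "R^2": r2,
        "adjusted R^2": 1.0 - (1.0 - r2) * (n - 1) / dof,
    }


stats = surface_stats([1.0, 2.0, 3.0, 4.0], [1.5, 1.5, 3.5, 3.5], n_terms=2)
print(stats)
```

As a consistency check, taking $p = 28$ terms and $n = 28 + 17 = 45$ sample points, $1 - (1 - 0.86371957)(44/17) \approx 0.647274$, which matches the adjusted $R^2$ above; the same formula with $p = 6$ and 39 residual degrees of freedom reproduces the 2-dimensional value below.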

Quadratic Surface, 2 Dimensions

Regressing on only the two visualized dimensions (a surface of the same form, but lower in dimension) gives the polynomial coefficient vector:

b(01) = 0.2019 
b(02) = -0.1065 
b(03) = 0.1115 
b(04) = 0.0269 
b(05) = -0.0145 
b(06) = -0.0009

It also results in the following response surface:

File:CompositeResponseSurface Dim2 Deg2.png

This surface has the following statistics:

---------------------------------------------------
Response surface summary of information:
Number of variables in response surface is 2. 
Number of terms in polynomial is 6. 
Degree of response surface is 2.
MSE =			 0.00690353 
MSE DoF = 			 39 

L-inf norm resid = 	 0.13735696 

R^2 =			 0.93490530 
adjusted R^2 =		 0.92655983 
---------------------------------------------------

Removing the 4 non-visualized dimensions changes the response surface statistics substantially: the MSE falls from 0.0385 to 0.0069, the maximum absolute residual falls from 0.343 to 0.137, and the adjusted $R^2$ rises from 0.647 to 0.927.

Note also that the 2-dimensional surface predicts a response greater than 1, which is physically impossible for the response of interest (a mass fraction). This physical constraint is not incorporated into the regression procedure.

As the polynomial degree increases, this tendency of response surfaces to predict impossible or non-physical responses becomes more pronounced.

Cubic Surface, 6 Dimensions: Trouble in Paradise

A 6-dimensional cubic response surface has 84 coefficients, much more than the number of sample points obtained with a composite design. However, if most third-order interaction terms are eliminated and only a few are kept, the number of coefficients drops significantly.
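These counts follow from a standard combinatorial identity: a full polynomial of degree $d$ in $k$ variables has $\binom{k+d}{d}$ coefficients. A composite design with one center point (an assumption) has $2^{k-q} + 2k + 1$ points, where $q$ is the number of generators of the fractional factorial core ($q = 0$ for a full factorial). A short sketch with hypothetical helper names:

```python
from math import comb


def n_poly_terms(k, degree):
    """Coefficients in a full polynomial of the given degree in k variables."""
    return comb(k + degree, degree)


def n_composite_points(k, q=0):
    """Points in a composite design with a 2^(k-q) factorial core,
    2k axial points and one center point (q = 0: full factorial core)."""
    return 2 ** (k - q) + 2 * k + 1


print(n_poly_terms(6, 2), n_poly_terms(6, 3))            # 28 84
print(n_composite_points(6), n_composite_points(6, 1))   # 77 45
```

For 6 variables the full cubic (84 terms) exceeds even the 77 points of a composite design with a full factorial core, while the reduced cubic model described next has 28 + 9 = 37 terms.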

A cubic model was used that is the same as the quadratic models described above, but with 9 third-order terms added; these are listed in the term table below.

The resulting coefficient vector is:

b( 1) = 351.7623 
b( 2) = -0.0000 
b( 3) = -0.0000 
b( 4) = -0.0000 
b( 5) = -6.4897 
b( 6) = -0.1842 
b( 7) = -1048.2746 
b( 8) = 0.0000 
b( 9) = -0.0000 
b(10) = 0.0000 
b(11) = -0.0000 
b(12) = -0.0000 
b(13) = 0.0000 
b(14) = -0.0000 
b(15) = -0.0000 
b(16) = -0.0000 
b(17) = 6.3774 
b(18) = -0.0000 
b(19) = -0.0000 
b(20) = -0.0000 
b(21) = -1.5293 
b(22) = 0.3625 
b(23) = -0.0000 
b(24) = -0.0000 
b(25) = -0.0000 
b(26) = 0.0140 
b(27) = 0.0094 
b(28) = 1048.0479 
b(29) = -349.3145 
b(30) = 0.0017 
b(31) = -2.1220 
b(32) = -0.0000 
b(33) = -0.0000 
b(34) = -0.0000 
b(35) = -0.0034 
b(36) = 2.6298 
b(37) = -0.5728

where the order of the first 28 terms is the same as for the quadratic models above, and the remaining 9 terms are in the order given in the following table.

Coefficient	Term
b(29)	$x_1^3$
b(30)	$x_2^3$
b(31)	$x_3^3$
b(32)	$x_4^3$
b(33)	$x_5^3$
b(34)	$x_6^3$
b(35)	$x_1 x_2 x_3$
b(36)	$x_2 x_3^2$
b(37)	$x_2^2 x_3$

This model presents an interesting problem. The 6-dimensional response surface that results, plotted in 2 dimensions (again using mean values for non-visualized dimensions), looks like this:

File:CompositeResponseSurface Dim6 Deg3.png

Note the range of the response: this is clearly a fishy response surface, since the maximum predicted $y_p$ is on the order of 1000, although the response is a mass fraction. Yet the statistics show that the polynomial fits the sample points perfectly:

---------------------------------------------------
Response surface summary of information:
Number of variables in response surface is 6. 
Number of terms in polynomial is 37. 
Degree of response surface is varied, deg is a matrix. Max degree = 3.
MSE =			 0.00000000 
MSE DoF = 			 8 

L-inf norm resid = 	 0.00000000 

R^2 =			 1.00000000 
adjusted R^2 =		 1.00000000 
---------------------------------------------------

At every experimental design sample point, the polynomial response prediction $\hat{y}$ exactly matches the actual response $y$, resulting in 0 error.

The knee-jerk reaction is that something must be wrong: the response surface is wrong, there was some mistake, or the software should have come up with a more "reasonable" response surface to fit the sample points. However, regression (and the whole idea of using response surfaces to represent complex functions) is a double-edged sword: it can make function evaluation much, much cheaper, but the price is a significant loss of information.
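A one-dimensional toy example (hypothetical data, not the problem on this page) shows the same effect in miniature. With as many coefficients as sample points, least squares reproduces every sample exactly; here the fitted quartic is $\hat{y} = \tfrac{4}{3}x^2 - \tfrac{1}{3}x^4$, which exceeds 1 between the samples and is far from the data outside the sampled range.

```python
import numpy as np

# Toy data (hypothetical): 5 coded levels, responses bounded in [0, 1]
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.array([0.0, 1.0, 0.0, 1.0, 0.0])

# Quartic model: 5 coefficients for 5 points, so least squares interpolates
A = np.vander(x, 5, increasing=True)          # columns 1, x, x^2, x^3, x^4
b, *_ = np.linalg.lstsq(A, y, rcond=None)

resid = y - A @ b                              # essentially zero at every sample
grid = np.linspace(-2.0, 2.0, 4001)
pred = np.vander(grid, 5, increasing=True) @ b
outside = np.vander(np.array([3.0]), 5, increasing=True) @ b

print("max |residual|:", np.abs(resid).max())
print("max prediction on [-2, 2]:", pred.max())   # about 4/3, above the limit of 1
print("prediction at x = 3:", outside[0])         # about -15
```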

One might suggest an alternative validation technique: create a low-dimensional response surface, which is easier to fit with a "reasonable" or "sensible" polynomial, and validate it. Then use the feasible (validated) values for each of those dimensions to create a second low-dimensional response surface, which is validated in turn. This yields a new feasible set, which can be combined with the old one, and so on, until all dimensions have been covered and valid ranges for all input parameter values have been determined.

However, this approach is not equivalent, nor is it an improvement. When creating the low-dimensional response surface, one must select values for the other, ignored dimensions; these values are uncertain and are merely guesses. Changing the values of non-regressed variables will likely result in significant changes in the regression results (i.e. the response surface).
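A toy illustration (hypothetical response, not the problem on this page): if the true response contains an interaction, $y = x_1 + x_1 x_2$, then a 1-D linear surface fitted in $x_1$ alone has slope $1 + x_2$, so its coefficients depend entirely on the value at which the ignored variable $x_2$ is held.

```python
import numpy as np


def true_response(x1, x2):
    return x1 + x1 * x2        # hypothetical response with an x1-x2 interaction


x1 = np.array([-1.0, 0.0, 1.0])
A = np.column_stack([np.ones_like(x1), x1])     # linear surface in x1 only

slopes = {}
for held in (-1.0, 0.0, 1.0):                    # value chosen for the ignored x2
    b, *_ = np.linalg.lstsq(A, true_response(x1, held), rcond=None)
    slopes[held] = b[1]

print(slopes)   # slope in x1 is 1 + x2: about 0, 1 and 2
```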

Box-Behnken Designs

The relationship between composite and Box-Behnken designs is as follows: for three factors, combining a face-centered (i.e. 3-level) composite design with a Box-Behnken design gives the full $3^{3}$ factorial design. More generally, because both use only the coded levels $-1$, $0$ and $+1$, face-centered composite designs and Box-Behnken designs are both fractional $3^{k}$ factorial designs.
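The union of the two designs can be checked with a short sketch. It assumes a face-centered composite design (corners, face centers, one center point) and the pairwise Box-Behnken construction (each pair of factors at $\pm 1$ with the others at 0, plus a center point), which is the standard form for 3 to 5 factors; the helper names are hypothetical. The union reproduces the $3^3$ factorial for $k = 3$ but not for $k = 4$, where the points with exactly three non-zero factors are missing.

```python
from itertools import combinations, product


def face_centered_ccd(k):
    pts = set(product((-1, 1), repeat=k))          # 2^k factorial corners
    for i in range(k):                             # 2k face-center (axial) points
        for s in (-1, 1):
            p = [0] * k
            p[i] = s
            pts.add(tuple(p))
    pts.add((0,) * k)                              # center point
    return pts


def box_behnken(k):
    pts = {(0,) * k}                               # center point
    for i, j in combinations(range(k), 2):         # each pair of factors at +/-1
        for si, sj in product((-1, 1), repeat=2):
            p = [0] * k
            p[i], p[j] = si, sj
            pts.add(tuple(p))
    return pts


def union_is_full_3k(k):
    full = set(product((-1, 0, 1), repeat=k))
    return (face_centered_ccd(k) | box_behnken(k)) == full


print(union_is_full_3k(3), union_is_full_3k(4))   # True False
```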


Composite Experimental Design: Difference between revisions

From charlesreid1