Empirical Model-Building and Response Surfaces: Difference between revisions
From charlesreid1
Revision as of 23:48, 2 May 2011
Chapter 1: Introduction to Response Surface Methodology
Questions when planning initial set of experiments:
1. Which input variables should be studied?
2. Should the input variables be examined in their original form, or should transformed input variables be employed?
3. How should response be measured?
4. At which levels of a given input variable should experiments be run?
5. How complex a model is necessary in a particular situation?
6. How shall we choose qualitative variables?
7. What experimental arrangement (experimental design) should be used?
Chapter 2: Use of Graduating Functions
Polynomial approximations:
- a polynomial of degree d can be thought of as a Taylor series expansion of the true underlying theoretical function y(x) truncated after terms of dth order
- the higher the degree d, the more closely the Taylor series can approximate the true function
- the smaller the region R over which y(x) is being approximated with the polynomial approximation, the better the approximation
Issues with application of polynomial approximations:
- least squares - how does it work? what are its assumptions?
- standard errors of coefficients - how to estimate the standard deviations of the linear coefficients?
- adequacy of fit - approximating an unknown theoretical function empirically; need to be able to check whether a given degree of approximation is adequate; how can analysis of variance (ANOVA) and examination of residuals (observed - fitted values) help to check adequacy of fit?
- designs - what designs are suitable for fitting polynomials of first and second degrees? (Ch. 4, 5, 15, 13)
- transformations - how can one find transformations (generally)?
Chapter 3: Least Squares for Response Surface Work
Method of Least Squares
Least squares helps you to understand a model of the form:
y = f(x,t) + e
where:
E(y) = eta = f(x,t)
is the mean level of the response y which is affected by k variables (x1, x2, ..., xk) = x
It also involves p parameters (t1, t2, ..., tp) = t
e is experimental error
To examine this model, experiments would be run at n different sets of conditions, x1, x2, ..., xn
One would then observe the corresponding values of the response, y1, y2, ..., yn
Two important questions:
1. does postulated model accurately represent the data?
2. if model does accurately represent data, what are best estimates of parameters t?
start with second question first
Given: function f(x,t) for each experimental run
n discrepancies:
$ y_1 - f(x_1,t), \; y_2 - f(x_2,t), \; ..., \; y_n - f(x_n,t) $
Method of least squares selects as the best value of t the one that makes the sum of squares smallest:
$ S(t) = \sum_{u=1}^{n} \left[ y_u - f \left( x_u, t \right) \right]^2 $
S(t) = sum of squares function
minimizing choice of t is denoted
$ \hat{t} $
are least-squares estimates of t good?
their goodness depends on the nature of the distribution of their errors
least-squares estimates are appropriate if you can assume that experimental errors:
$ \epsilon_u = y_u - \eta_u $
are statistically independent, have constant variance, and are normally distributed
these are "standard assumptions"
Linear models
this is a limiting case, where
$ \eta = f(x,t) = t_1 z_1 + t_2 z_2 + ... + t_p z_p $
adding experimental error $ \epsilon = y - \eta $:
$ y = t_1 z_1 + t_2 z_2 + ... + t_p z_p + \epsilon $
model of this form is linear in the parameters
Algorithm
Formulate a problem with n observed responses, p parameters...
this yields n equations of the form
y_1 = t_1 z_{11} + t_2 z_{21} + ...
y_2 = t_1 z_{12} + t_2 z_{22} + ...
etc...
This can be written in matrix form:
$ \mathbf{y} = \mathbf{Z t} + \boldsymbol{\epsilon} $
and the dimensions of each matrix are:
- y = n x 1
- Z = n x p
- t = p x 1
- epsilon = n x 1
the sum of squares function is given by:
$ S(\mathbf{t}) = \sum_{u=1}^{n} \left( y_u - t_1 z_{1u} - t_2 z_{2u} - ... - t_p z_{pu} \right)^2 $
or,
$ S(t) = ( y - Zt )^{\prime} ( y - Zt ) $
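setting the derivatives of S(t) with respect to each element of t to zero gives the normal equations, whose solution is the least-squares estimate:
$ \mathbf{ Z^{\prime} Z t = Z^{\prime} y } $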
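A minimal numerical sketch of the normal equations, assuming NumPy is available. The data (`x`, `y`) are hypothetical and not from the text; the model is a straight line with z1 = 1 and z2 = x.

```python
import numpy as np

# Hypothetical data (not from the notes): n = 4 runs, p = 2 parameters.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])

# Z is n x p: one row per run, one column per regressor (z1 = 1, z2 = x).
Z = np.column_stack([np.ones_like(x), x])

# Solve the normal equations (Z'Z) t = Z'y for the least-squares estimate.
t_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)

# Cross-check against NumPy's own least-squares routine.
t_lstsq, *_ = np.linalg.lstsq(Z, y, rcond=None)


def S(t):
    """Sum of squares function S(t) = (y - Zt)'(y - Zt)."""
    r = y - Z @ t
    return float(r @ r)


print("t_hat    =", t_hat)
print("S(t_hat) =", S(t_hat))
```

Solving the normal equations directly is fine for small, well-conditioned problems like this one; `np.linalg.lstsq` is usually the safer choice when Z is close to rank deficient.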
Rank of Z
If there are linear relationships between the different regressors (z's), then the matrix $ \mathbf{Z^{\prime} Z} $ becomes singular and the normal equations have no unique solution
e.g. if there is a relationship z2 = c z1, then you can only estimate the linear combination t1 + c t2
reason: when z2 = c z1, the model becomes eta = (t1 + c t2) z1, so changes in t1 can't be distinguished from changes in t2
Z (an n x p matrix) is said to be full rank p if there are no linear relationships (with the a's not all zero) of the form:
a_1 z_1 + a_2 z_2 + ... + a_p z_p = 0
if there are q > 0 independent linear relationships, then Z has rank p - q
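The rank deficiency described above can be checked numerically. This is a sketch with hypothetical values for `z1` and `c`, assuming NumPy.

```python
import numpy as np

# Hypothetical regressors (not from the notes) with the exact relation z2 = c*z1.
z1 = np.array([1.0, 2.0, 3.0, 4.0])
c = 2.0
z2 = c * z1
Z = np.column_stack([z1, z2])          # n x p with p = 2

# One linear relationship (q = 1), so the rank should be p - q = 1.
rank = np.linalg.matrix_rank(Z)

# Z'Z is then singular: its determinant is zero up to rounding.
det = np.linalg.det(Z.T @ Z)

# Two parameter vectors with the same t1 + c*t2 give the same mean response,
# so the data cannot tell them apart.
t_a = np.array([3.0, 0.0])             # t1 + c*t2 = 3
t_b = np.array([1.0, 1.0])             # t1 + c*t2 = 3
same_eta = np.allclose(Z @ t_a, Z @ t_b)

print("rank(Z) =", rank)
print("det(Z'Z) =", det)
print("same eta for t_a and t_b:", same_eta)
```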
Analysis of Variance: 1 regressor
Assume simple model $ y = \beta + \epsilon $
This states that y is varying about an unknown mean $ \beta $
Suppose we have 3 observations of y, $ \mathbf{y} = (4, 1, 1)' $
Then the model can be written as $ y = z_1 t + \epsilon $
and $ z_1 = (1, 1, 1) ' $
and $ t = \beta $
so that
$ \begin{bmatrix} 4 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} t + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \end{bmatrix} $
Suppose the linear model posited a value for the parameter t, e.g. $ t_0 = 0.5 $
Then you could check the null hypothesis, e.g. $ H_0 : t = t_0 = 0.5 $
If true, the mean observation vector is given by $ \eta_0 = z_1 t_0 $
or,
$ \begin{bmatrix} 0.5 \\ 0.5 \\ 0.5 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} 0.5 $
and the appropriate "observation breakdown" (the split of $ y - \eta_0 $ into a part explained by the fitted model and a residual part) is:
$ y - \eta_0 = ( \hat{y} - \eta_0 ) + ( y - \hat{y} ) $
Associated with this observation breakdown is an analysis of variance table:
| Source | Degrees of freedom (df) | Sum of squares (square of length), SS | Mean square, MS | Expected value of mean square, E(MS) |
|---|---|---|---|---|
| Model | 1 | $ \vert \hat{y} - \eta_0 \vert^2 = ( \hat{t} - t_0 )^2 \sum z_1^2 = 6.75 $ | 6.75 | $ \sigma^2 + ( t - t_0 )^2 \sum z_1^2 $ |
| Residual | 2 | $ \vert y - \hat{y} \vert^2 = \sum ( y - \hat{t} z_1 )^2 = 6.00 $ | 3.00 | $ \sigma^2 $ |
| Total | 3 | $ \vert y - \eta_0 \vert^2 = \sum ( y - \eta_0 )^2 = 12.75 $ | | |
sum of squares: squared lengths of vectors
degrees of freedom: number of dimensions in which vector can move (geometric interpretation)
the model $ y = z_1 t + \epsilon $ says whatever the data is, the systematic part $ \hat{y} - \eta_0 = ( \hat{t} - t_0) z_1 $ of $ y - \eta_0 $ must lie in the direction of $ z_1 $, which gives $ \hat{y} - \eta_0 $ only one degree of freedom.
Whatever the data, the residual vector must be perpendicular to $ z_1 $ (this is what the normal equation $ z_1^{\prime} ( y - \hat{y} ) = 0 $ says), and so it can move in 2 directions and has 2 degrees of freedom
Now, looking at the null hypothesis:
the component $ \vert \hat{y} - \eta_0 \vert^2 = ( \hat{t} - t_0 )^2 \sum z_1^2 $ is a measure of discrepancy between POSTULATED model $ \eta_0 = z_1 t_0 $ and ESTIMATED model $ \hat{y} = z_1 \hat{t} $
Making "standard assumptions" (earlier), expected value of sum of squares, assuming model is true, is $ ( t - t_0 )^2 \sum z_1^2 + \sigma^2 $
For the residual component it is $ 2 \sigma^2 $ (or, in general, $ \nu_2 \sigma^2 $, where $ \nu_2 $ is number of degrees of freedom of residuals)
Thus a measure of discrepancy from the null hypothesis $ t = t_0 $ is $ F = \frac{ \vert \hat{y} - \eta_0 \vert^2 / 1 }{ \vert y - \hat{y} \vert^2 / 2 } $
if the null hypothesis were true, then the top and bottom would both estimate the same $ \sigma^2 $
So if F is different from 1, that indicates departure from null hypothesis
The MORE F differs from 1, the more doubtful the null hypothesis becomes
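The one-regressor analysis of variance above can be reproduced in a few lines. This sketch uses only the values given in the text (y = (4, 1, 1)', z1 = (1, 1, 1)', t0 = 0.5); the variable names are illustrative.

```python
# Values taken from the example above.
y = [4.0, 1.0, 1.0]
z1 = [1.0, 1.0, 1.0]
t0 = 0.5


def sq_len(v):
    """Squared length of a vector (a sum of squares)."""
    return sum(a * a for a in v)


# Least-squares estimate for a single regressor: t_hat = sum(z1*y) / sum(z1^2).
t_hat = sum(z * yi for z, yi in zip(z1, y)) / sq_len(z1)   # 2.0

y_hat = [t_hat * z for z in z1]   # fitted vector
eta0 = [t0 * z for z in z1]       # mean vector under the null hypothesis

ss_model = sq_len([a - b for a, b in zip(y_hat, eta0)])   # 6.75, 1 df
ss_resid = sq_len([a - b for a, b in zip(y, y_hat)])      # 6.0,  2 df
ss_total = sq_len([a - b for a, b in zip(y, eta0)])       # 12.75, 3 df

# The residual vector is perpendicular to z1 (the normal equation).
resid_dot_z1 = sum((a - b) * z for a, b, z in zip(y, y_hat, z1))   # 0.0

F = (ss_model / 1) / (ss_resid / 2)   # 2.25

print(t_hat, ss_model, ss_resid, ss_total, F)
```

The sums of squares add up (6.75 + 6.0 = 12.75) because the model and residual components are at right angles.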
Least squares: 2 regressors
Previous model, $ y = \beta + \epsilon $, said y was represented with a mean $ \beta $ plus an error.
Instead, suppose that there are systematic deviations from the mean, associated with an external variable (e.g. humidity in the lab).
Now equation is for straight line: $ y = \beta_0 + \beta_1 x + \epsilon $
or, $ y = z_1 t_1 + z_2 t_2 + \epsilon $
So now the revised least-squares model is: $ \eta = z_1 t_1 + z_2 t_2 $
$ \eta = E(y) $ - i.e. $ \eta $ is in the plane defined by linear combinations of vectors $ z_1, z_2 $
because $ z_1^{\prime} z_2 = \sum z_1 z_2 \neq 0 $, these two vectors are NOT at right angles
the least-squares values $ \hat{t_1}, \hat{t_2} $ produce a vector $ \hat{\hat{y}} = z_1 \hat{t_1} + z_2 \hat{t_2} $
these least-squares values make the squared length $ \sum ( y - \hat{\hat{y}} )^2 = \vert y - \hat{\hat{y}} \vert^2 $ of the residual vector as small as possible
The normal equations express fact that residual vector must be perpendicular to both $ z_1 $ and $ z_2 $:
$ \begin{align} z_1^{\prime} ( y - \hat{\hat{y}} ) &= 0 \\ z_2^{\prime} ( y - \hat{\hat{y}} ) &= 0 \end{align} $
also written as:
$ \begin{align} \sum z_1 ( y - \hat{t_1} z_1 - \hat{t_2} z_2 ) &= 0 \\ \sum z_2 ( y - \hat{t_1} z_1 - \hat{t_2} z_2 ) &= 0 \end{align} $
also written (in matrix form) as:
$ \mathbf{Z^{\prime}} ( \mathbf{y - Z \hat{t} } ) = 0 $
Now suppose the null hypothesis was investigated for $ t_1 = t_{10} = 0.5 $ and $ t_2 = t_{20} = 1.0 $
Then the mean observation vector $ \eta_0 $ is represented as $ \eta_0 = t_{10} z_1 + t_{20} z_2 $
| Source | Degrees of freedom | SS | MS | F |
|---|---|---|---|---|
| Model $ z_1 $ and $ z_2 $ | 2 | $ \vert \hat{\hat{y}} - \eta_0 \vert^2 = \sum \left[ \left( \hat{t_1} - t_{10} \right) z_1 + \left( \hat{t_2} - t_{20} \right) z_2 \right]^2 = 6.69 $ | 3.345 | 2.23 |
| Residual | 1 | $ \vert y - \hat{\hat{y}} \vert^2 = \sum \left( y - \hat{t_1} z_1 - \hat{ t_2 } z_2 \right)^2 = 1.50 $ | 1.50 | |
| Total | 3 | $ \vert y - \eta_0 \vert^2 = \sum \left( y - \eta_0 \right)^2 = 8.19 $ | | |
$ y - \eta_0 = \left( \hat{\hat{y}} - \eta_0 \right) + \left( y - \hat{\hat{y}} \right) $
and so
$ F_0 = \frac{ \vert \hat{\hat{y}} - \eta_0 \vert^2 / 2 }{ \vert y - \hat{\hat{y}} \vert^2 / 1 } = 2.23 $
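The two-regressor table can be checked numerically, but the notes never list the values of $ z_2 $. The sketch below ASSUMES z2 = (0.8, 0.2, -0.4)', chosen because it reproduces the numbers quoted in these notes (coefficients, sums of squares, and F); it is not confirmed by the text. It also assumes NumPy.

```python
import numpy as np

y = np.array([4.0, 1.0, 1.0])          # observations from the text
z1 = np.ones(3)                        # z1 = (1, 1, 1)' from the text
z2 = np.array([0.8, 0.2, -0.4])        # ASSUMPTION: not given in the notes
Z = np.column_stack([z1, z2])

# Normal equations Z'(y - Z t_hat) = 0, i.e. (Z'Z) t_hat = Z'y.
t_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)
y_fit = Z @ t_hat                      # the fitted vector (double-hat y)

# Null hypothesis values from the text: t10 = 0.5, t20 = 1.0.
t_null = np.array([0.5, 1.0])
eta0 = Z @ t_null

ss_model = float(np.sum((y_fit - eta0) ** 2))   # 2 df
ss_resid = float(np.sum((y - y_fit) ** 2))      # 1 df
ss_total = float(np.sum((y - eta0) ** 2))       # 3 df
F0 = (ss_model / 2) / (ss_resid / 1)

print("t_hat =", t_hat)
print("SS model, residual, total =", ss_model, ss_resid, ss_total)
print("F0 =", F0)
```

Under the assumed $ z_2 $, the residual vector comes out perpendicular to both $ z_1 $ and $ z_2 $, which is exactly what the normal equations require.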
Orthogonalizing second regressor
In the above example, $ z_1 $ and $ z_2 $ are not orthogonal
One can find the vectors $ z_1 $ and $ z_{2 \cdot 1} $ that are orthogonal
To do this, use least squares property that residual vector is orthogonal to space in which the predictor variables lie
Regard $ z_2 $ as "response" vector and $ z_1 $ as predictor variable
You then obtain $ \hat{z_2} = 0.2 z_1 $, where the coefficient is the one-regressor least-squares estimate $ \sum z_1 z_2 / \sum z_1^2 = 0.2 $
so the residual vector is $ z_{2 \cdot 1} = z_2 - \hat{z_2} = z_2 - 0.2 z_1 $
now the model can be rewritten as $ \eta = \left( t_1 + 0.2 t_2 \right) z_1 + t_2 \left( z_2 - 0.2 z_1 \right) = t z_1 + t_2 z_{2 \cdot 1} $, where $ t = t_1 + 0.2 t_2 $
This gives three least-squares equations:
1. $ \hat{y} = 2 z_1 $
2. $ \hat{\hat{y}} = 1.5 z_1 + 2.5 z_2 $
3. $ \hat{\hat{y}} = 2.0 z_1 + 2.5 z_{2 \cdot 1} $
The analysis of variance becomes:
| Source | df | SS |
|---|---|---|
| Response function with $ z_1 $ only | 1 | $ \vert \hat{y} - \eta_0 \vert^2 = \left( \hat{t} - t_0 \right)^2 \sum z_1^2 = 12.0 $ |
| Extra due to $ z_2 $ (given $ z_1 $) | 1 | $ \vert \hat{\hat{y}} - \hat{y} \vert^2 = \hat{t}_2^2 \sum z_{2 \cdot 1}^2 = 4.5 $ |
| Residual | 1 | $ \vert y - \hat{\hat{y}} \vert^2 = \sum \left( y - \hat{\hat{y}} \right)^2 = 1.5 $ |
| Total | 3 | $ \vert y - \eta_0 \vert^2 = \sum \left( y - \eta_0 \right)^2 = 18.0 $ |
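A sketch of the orthogonalization and the resulting sequential analysis of variance, assuming NumPy. Two things here are ASSUMPTIONS not stated in the notes: z2 = (0.8, 0.2, -0.4)', chosen because it reproduces the quoted coefficient 0.2 and the fitted equations, and $ \eta_0 = 0 $, which reproduces the sums of squares 12.0 and 18.0 in the table.

```python
import numpy as np

y = np.array([4.0, 1.0, 1.0])          # observations from the text
z1 = np.ones(3)                        # z1 = (1, 1, 1)' from the text
z2 = np.array([0.8, 0.2, -0.4])        # ASSUMPTION: not given in the notes

# Regress z2 on z1: coefficient = sum(z1*z2) / sum(z1^2).
b = (z1 @ z2) / (z1 @ z1)
z2_1 = z2 - b * z1                     # residual of z2, orthogonal to z1

# With orthogonal regressors each coefficient is estimated separately.
t_star = (z1 @ y) / (z1 @ z1)          # coefficient of z1
t2_hat = (z2_1 @ y) / (z2_1 @ z2_1)    # coefficient of z2.1

y_hat = t_star * z1                    # fit with z1 only
y_hathat = y_hat + t2_hat * z2_1       # fit with z1 and z2.1

# Sequential sums of squares, ASSUMING eta_0 = 0.
ss_z1 = float(y_hat @ y_hat)
ss_extra = float(np.sum((y_hathat - y_hat) ** 2))
ss_resid = float(np.sum((y - y_hathat) ** 2))
ss_total = float(y @ y)

print("b =", b, " z1.z2_1 =", z1 @ z2_1)
print("t_star, t2_hat =", t_star, t2_hat)
print("SS:", ss_z1, ss_extra, ss_resid, ss_total)
```

Because $ z_1 $ and $ z_{2 \cdot 1} $ are orthogonal, the three components add up to the total sum of squares with no cross terms.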
Generalization to p regressors
With n observations and p parameters:
n relations implicit in response function can be written
$ \boldsymbol{\eta} = \mathbf{Z t} $
Assuming Z is full rank, and letting $ \hat{\mathbf{t}} $ be the vector of estimates given by normal equations
$ \left( \mathbf{ y - \hat{y} } \right)^{\prime} \mathbf{Z} = \left( y - Z \hat{t} \right)^{\prime} Z = 0 $
Sum of squares function is $ S(t) = (y - \eta)^{\prime} (y - \eta) = (y - \hat{y})^{\prime} (y - \hat{y}) + ( \hat{y} - \eta )^{\prime} (\hat{y} - \eta) $
because cross-product is zero from the normal equations
$ S(t) = S(\hat{t}) + (\hat{t} - t)^{\prime} \mathbf{Z^{\prime} Z} ( \hat{t} - t ) $
Furthermore, because $ \mathbf{Z^{\prime} Z} $ is positive definite, $ S(t) $ is minimized when $ t = \hat{t} $
So the solution to the normal equations producing the least squares estimate is the one where $ t = \hat{t} $:
$ \hat{t} = ( \mathbf{Z^{\prime} Z} )^{-1} \mathbf{Z^{\prime} y} $
| Source | df | SS |
|---|---|---|
| Response function | p | $ \vert \hat{y} - \eta \vert^2 = (\hat{t} - t)^{\prime} \mathbf{Z^{\prime} Z} ( \hat{t} - t ) $ |
| Residual | n-p | $ \vert y - \hat{y} \vert^2 = \sum ( y - \hat{y} )^2 $ |
| Total | n | $ \vert y - \eta \vert^2 = \sum ( y - \eta )^2 $ |
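The decomposition $ S(t) = S(\hat{t}) + (\hat{t} - t)^{\prime} \mathbf{Z^{\prime} Z} (\hat{t} - t) $ can be checked numerically for any full-rank Z. This is a sketch on randomly generated, purely hypothetical data, assuming NumPy.

```python
import numpy as np

# Hypothetical random data: n = 8 observations, p = 3 regressors.
rng = np.random.default_rng(0)
n, p = 8, 3
Z = rng.normal(size=(n, p))
y = rng.normal(size=n)

# Least-squares estimate from the normal equations.
t_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)


def S(t):
    """Sum of squares function S(t) = (y - Zt)'(y - Zt)."""
    r = y - Z @ t
    return float(r @ r)


# Any other trial parameter vector t.
t = rng.normal(size=p)
d = t_hat - t

lhs = S(t)                                   # total:    |y - eta|^2
resid_part = S(t_hat)                        # residual: |y - y_hat|^2
model_part = float(d @ (Z.T @ Z) @ d)        # response: |y_hat - eta|^2

print("S(t)               =", lhs)
print("S(t_hat) + quadratic =", resid_part + model_part)
```

Since the quadratic term is never negative when $ \mathbf{Z^{\prime} Z} $ is positive definite, no trial `t` can give a smaller sum of squares than `t_hat`.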
Bias in Least-Squares Estimators if Inadequate Model
Say data was being fit with a model $ y = Z_1 t_1 + \epsilon $,
but the true model that should have been used is $ y = Z_1 t_1 + Z_2 t_2 + \epsilon $
$ t_1 $ would be estimated by $ \hat{t_1} = (\mathbf{ Z_1^{\prime} Z_1 } )^{-1} \mathbf{ Z_1^{\prime} y } $
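but using true model, $ \begin{array}{rcl} E( \hat{t_1} ) &=& ( \mathbf{Z_1^{\prime} Z_1} )^{-1} \mathbf{Z_1^{\prime}} E(\mathbf{y}) \\ &=& ( \mathbf{ Z_1^{\prime} Z_1 } )^{-1} \mathbf{Z_1^{\prime}} (\mathbf{Z_1 t_1} + \mathbf{Z_2 t_2} ) \\ &=& \mathbf{t_1 + A t_2} \end{array} $
where $ \mathbf{A} = ( \mathbf{Z_1^{\prime} Z_1} )^{-1} \mathbf{Z_1^{\prime} Z_2} $, so $ \hat{t_1} $ is biased unless $ \mathbf{A t_2} = 0 $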
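A small numerical illustration of this bias, assuming NumPy. The setup is hypothetical: a straight line (Z1 with columns 1 and x) is fitted when the true mean response also contains a quadratic term (Z2 with column x^2). The matrix `A` below is computed as $ ( \mathbf{Z_1^{\prime} Z_1} )^{-1} \mathbf{Z_1^{\prime} Z_2} $, which is what the last step of the derivation implies.

```python
import numpy as np

# Hypothetical design (not from the notes).
x = np.array([-1.0, 0.0, 1.0, 2.0])
Z1 = np.column_stack([np.ones_like(x), x])   # fitted model: 1, x
Z2 = (x ** 2).reshape(-1, 1)                 # omitted term: x^2

# Hypothetical true parameter values.
t1 = np.array([1.0, 2.0])
t2 = np.array([0.5])

# A = (Z1'Z1)^{-1} Z1'Z2, computed with a linear solve instead of an inverse.
A = np.linalg.solve(Z1.T @ Z1, Z1.T @ Z2)

# Expected response under the true model (errors have mean zero).
Ey = Z1 @ t1 + Z2 @ t2

# Expected value of the estimator from the inadequate model.
E_t1_hat = np.linalg.solve(Z1.T @ Z1, Z1.T @ Ey)

print("A =", A.ravel())
print("E(t1_hat) =", E_t1_hat)
print("t1 + A t2 =", t1 + A @ t2)
```

In this design both the intercept and the slope estimates are shifted away from `t1`, because neither column of Z1 is orthogonal to the omitted x^2 column.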