Multivariate Statistical Modelling Based on Generalized Linear Models
From charlesreid1
Chapter 1: Book Outline and Notes
Chapter 2: review of univariate generalized linear models and extensions
Chapter 3: models for multicategorical responses (i.e. multiple, unordered responses)
Chapter 4: selecting variables for models, variable reduction procedures, checking models, goodness-of-fit, residual analysis (outliers or consistent trend?)
Chapter 5: Semi- and non-parametric approaches
Chapter 6: Fixed-parameter models for time series (extends Ch. 2 and Ch. 3)
Chapter 7: Random effects models for non-normal data
Chapter 8: State space models for analyzing non-normal time series; relate time series observations y_t to unobserved states, like trend and seasonal components
Chapter 9: Survival models; identifying the factors that determine survival/transition
Chapter 2: Univariate Generalized Linear Models
Cross-sectional regression analysis: univariate variable of primary interest (response variable)
Explained by a vector $x = (x_1, x_2, \dots, x_m)$
Data consist of observations on $(y, x)$:
$(y_i, x_i), \; i = 1, \dots, n$
Definition of Univariate Generalized Linear Models
The classical linear model for ungrouped normal responses and deterministic covariates is:
$y_i = z_i^{\prime} \beta + \epsilon_i$
where:
$z_i$ = design vector, function of covariate vector $x_i$
$\beta$ = vector of unknown parameters
$\epsilon_i$ = errors, normally distributed and independent, $\epsilon_i \sim N(0, \sigma^2)$
The observations $y_i$ are independent and normally distributed with mean $\mu_i = z_i^{\prime} \beta$,
$y_i \sim N(\mu_i, \sigma^2), \; i = 1, \dots, n$
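A minimal sketch of the classical linear model, assuming the simplest design vector $z_i = (1, x_i)$ with a scalar covariate. The function names `fit_simple_linear` and `simulate`, and the parameter values used, are illustrative assumptions and do not come from the book. Under normal errors the least-squares estimate computed here coincides with the maximum likelihood estimate of $\beta$.

```python
import random


def fit_simple_linear(xs, ys):
    """Least-squares estimate of beta = (beta0, beta1) for z_i = (1, x_i)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1


def simulate(beta0, beta1, sigma, xs, seed=0):
    """Draw y_i = beta0 + beta1 * x_i + eps_i with eps_i ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]


# Hypothetical true parameters, chosen only for illustration.
xs = [i / 10 for i in range(50)]
ys = simulate(1.0, 2.0, 0.5, xs)
beta0_hat, beta1_hat = fit_simple_linear(xs, ys)
print(beta0_hat, beta1_hat)
```

With noise-free data the estimates reproduce the generating parameters exactly; with simulated noise they should only be close to them.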
A specific generalized linear model is fully characterized by three components:
- type of exponential family
- response or link function
- design vector
Example:
Exponential family
- important members: normal, binomial, Poisson, gamma, inverse Gaussian distributions
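The three components (exponential family, response function, design vector) can be bundled into one specification. The sketch below is only an illustration of that idea: the class `GLMSpec`, the helper `with_intercept`, and the two example specifications are hypothetical names, not notation from the book. The response function $h$ maps the linear predictor $z_i^{\prime} \beta$ to the mean $\mu_i$.

```python
import math
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class GLMSpec:
    family: str                                   # type of exponential family
    response: Callable[[float], float]            # h: linear predictor -> mean
    design: Callable[[List[float]], List[float]]  # x_i -> z_i

    def mean(self, x, beta):
        """Return mu = h(z' beta) for covariate vector x."""
        z = self.design(x)
        eta = sum(zj * bj for zj, bj in zip(z, beta))
        return self.response(eta)


def with_intercept(x):
    """Design vector z = (1, x_1, ..., x_m)."""
    return [1.0] + list(x)


# Classical linear model: normal family, identity response function.
classical_linear = GLMSpec("normal", lambda eta: eta, with_intercept)
# Log-linear Poisson model: Poisson family, exponential response function.
log_linear_poisson = GLMSpec("poisson", math.exp, with_intercept)

print(classical_linear.mean([2.0], [1.0, 0.5]))    # 2.0
print(log_linear_poisson.mean([2.0], [0.0, 0.0]))  # 1.0
```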
Models for Continuous Responses
Normal distribution
Gamma distribution
Inverse Gaussian distribution
Models for Binary and Binomial Responses
Linear probability model
Probit model
Logit model
Complementary log-log model
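The four binary-response models above differ only in the response function that maps the linear predictor $\eta = z^{\prime} \beta$ to the success probability $\pi$. A small sketch using the standard definitions follows; the function names are assumptions made for this illustration.

```python
import math


def linear_probability(eta):
    """Linear probability model: pi = eta, valid only for eta in [0, 1]."""
    if not 0.0 <= eta <= 1.0:
        raise ValueError("linear predictor must lie in [0, 1]")
    return eta


def probit_response(eta):
    """Probit model: pi = Phi(eta), the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def logit_response(eta):
    """Logit model: pi = exp(eta) / (1 + exp(eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


def cloglog_response(eta):
    """Complementary log-log model: pi = 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))


for eta in (-1.0, 0.0, 1.0):
    print(eta, probit_response(eta), logit_response(eta), cloglog_response(eta))
```

Probit and logit are symmetric around $\eta = 0$, where both give $\pi = 0.5$; the complementary log-log response is asymmetric and gives $1 - e^{-1}$ at $\eta = 0$.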
Models for Counted Data
Log-linear Poisson model
Linear Poisson model
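The two count-data models share the Poisson distribution and differ in how the mean depends on the linear predictor: $\mu = \exp(z^{\prime} \beta)$ in the log-linear model and $\mu = z^{\prime} \beta$ in the linear model, which then requires $z^{\prime} \beta > 0$. The following sketch assumes these standard forms; all function names are illustrative.

```python
import math


def linear_predictor(z, beta):
    """Compute eta = z' beta."""
    return sum(zj * bj for zj, bj in zip(z, beta))


def log_linear_poisson_mean(z, beta):
    """Log-linear Poisson model: mu = exp(z' beta), always positive."""
    return math.exp(linear_predictor(z, beta))


def linear_poisson_mean(z, beta):
    """Linear Poisson model: mu = z' beta, which must be positive."""
    eta = linear_predictor(z, beta)
    if eta <= 0:
        raise ValueError("linear Poisson model needs a positive mean")
    return eta


def poisson_log_pmf(y, mu):
    """Log of the Poisson probability of count y given mean mu."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


z = [1.0, 2.0]  # hypothetical design vector (intercept, covariate)
print(log_linear_poisson_mean(z, [0.0, 0.0]))  # 1.0
print(linear_poisson_mean(z, [0.5, 1.0]))      # 2.5
```

The positivity check shows why the log-linear form is often more convenient: it places no restriction on $\beta$.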
Likelihood Inference
Regression analysis with generalized linear models is based on likelihoods
This section contains inferential tools for:
- parameter estimation
- hypothesis testing
- goodness-of-fit tests
- more detailed material on model choice/checking: see Chapter 4
Assumes that the model is completely and correctly specified
Maximum likelihood estimator (MLE): MLE of unknown parameter vector obtained by maximizing the likelihood
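As an illustration of maximizing the likelihood numerically, the sketch below applies Fisher scoring to a log-linear Poisson model with design vector $z_i = (1, x_i)$. The choice of model, the function name `fit_poisson_loglinear`, the starting values, and the data are assumptions made for this example; the notes do not prescribe a particular algorithm here. The score is $s(\beta) = \sum_i z_i (y_i - \mu_i)$ and the Fisher information is $F(\beta) = \sum_i \mu_i z_i z_i^{\prime}$.

```python
import math


def fit_poisson_loglinear(xs, ys, max_iter=50, tol=1e-10):
    """Fisher scoring for mu_i = exp(b0 + b1 * x_i) with Poisson responses."""
    b0 = math.log(sum(ys) / len(ys))  # start at the intercept-only fit
    b1 = 0.0
    for _ in range(max_iter):
        mus = [math.exp(b0 + b1 * x) for x in xs]
        # Score vector s = sum_i z_i (y_i - mu_i)
        s0 = sum(y - m for y, m in zip(ys, mus))
        s1 = sum(x * (y - m) for x, y, m in zip(xs, ys, mus))
        # Fisher information F = sum_i mu_i z_i z_i'
        f00 = sum(mus)
        f01 = sum(x * m for x, m in zip(xs, mus))
        f11 = sum(x * x * m for x, m in zip(xs, mus))
        det = f00 * f11 - f01 * f01
        # Scoring step: delta = F^{-1} s, using the explicit 2x2 inverse
        d0 = (f11 * s0 - f01 * s1) / det
        d1 = (f00 * s1 - f01 * s0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Hypothetical counts, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1, 3, 4, 9, 15]
b0_hat, b1_hat = fit_poisson_loglinear(xs, ys)
print(b0_hat, b1_hat)
```

At the returned estimate the score is numerically zero, which is the likelihood equation that defines the MLE for this model.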
Goodness-of-Fit Statistics
Two measures of the adequacy of a model (goodness of fit) are:
Pearson statistic $\chi^2 = \sum_{i=1}^{g} \frac{\left( y_i - \hat{\mu}_i \right)^2}{v(\hat{\mu}_i)}$
Deviance $D = -2 \phi \sum_{i=1}^{g} \left[ l_i(\hat{\mu}_i) - l_i(y_i) \right]$
where
$\hat{\mu}_i$ = estimated mean function
$v(\hat{\mu}_i)$ = estimated variance function
$l_i(y_i)$ = individual log-likelihood
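A sketch of both statistics for the Poisson case, chosen here as an assumption because it has $\phi = 1$ and variance function $v(\mu) = \mu$. Substituting the Poisson log-likelihood $l_i(\mu) = y_i \log \mu - \mu - \log y_i!$ into the deviance gives $D = 2 \sum_i \left[ y_i \log(y_i / \hat{\mu}_i) - (y_i - \hat{\mu}_i) \right]$, with the convention that $y_i \log y_i = 0$ when $y_i = 0$. The function names are illustrative.

```python
import math


def pearson_poisson(ys, mus):
    """Pearson statistic with Poisson variance function v(mu) = mu."""
    return sum((y - m) ** 2 / m for y, m in zip(ys, mus))


def deviance_poisson(ys, mus):
    """Poisson deviance: 2 * sum[y * log(y / mu) - (y - mu)], phi = 1."""
    total = 0.0
    for y, m in zip(ys, mus):
        # Convention: the term y * log(y / mu) is zero when y == 0.
        log_term = y * math.log(y / m) if y > 0 else 0.0
        total += log_term - (y - m)
    return 2.0 * total


# Hypothetical observed counts and fitted means, for illustration only.
ys = [0, 2, 5, 7]
mus = [0.5, 2.5, 4.0, 7.0]
print(pearson_poisson(ys, mus), deviance_poisson(ys, mus))
```

Both statistics are zero when the fitted means equal the observations and grow as the fit worsens.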
References
Generalized linear models:
- McCullagh and Nelder 1989 - standard source of information about generalized linear models
- Santner and Duffy 1989 - consider cross-classified data and univariate discrete data