
$$y_i = X_i \beta + Z_i u_i + \varepsilon_i \tag{1}$$
$X_i$ is an $n_i \times p$ design matrix for the $p$-vector of fixed effects $\beta$, and $Z_i$ is an $n_i \times q$ design matrix associated with the $q$-vector of random effects $u_i$, which represents the subject-specific regression coefficients. The errors $\varepsilon_i$ are assumed to be normally distributed with mean zero and covariance matrix $\sigma^2 I_{n_i}$, and to be independent of the vector of random effects $u_i$.
In a homogeneous mixed model [1], $u_i$ is normally distributed with mean $\mu$ and covariance matrix $D$, i.e.

$$u_i \sim N(\mu, D) \tag{2}$$
In the heterogeneous mixed model [2–4], $u_i$ is assumed to follow a mixture of $G$ multivariate Gaussians with different means $(\mu_g)_{g=1,\dots,G}$ and a common covariance matrix $D$, i.e.

$$u_i \sim \sum_{g=1}^{G} \pi_g \, N(\mu_g, D) \tag{3}$$
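Each component $g$ of the mixture has a probability $\pi_g$, and the $(\pi_g)_{g=1,\dots,G}$ satisfy the following conditions:

$$0 \le \pi_g \le 1 \quad \forall g \in \{1, \dots, G\} \quad \text{and} \quad \sum_{g=1}^{G} \pi_g = 1 \tag{4}$$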
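As an illustration of the heterogeneous model, the following minimal sketch simulates one subject in the simplest case of a single random intercept ($q = 1$) and a single time covariate. The function name `simulate_subject`, its arguments and the numerical values are hypothetical and chosen only for illustration; they are not part of the program described here.

```python
import random


def simulate_subject(times, beta, mus, pis, sd_u, sigma, rng):
    """Simulate one subject from y_ij = beta * t_ij + u_i + e_ij.

    Assumption: u_i is a scalar random intercept drawn from a mixture of
    Gaussians N(mu_g, sd_u**2) with component probabilities pi_g.
    """
    # The component probabilities must satisfy the constraints in (4).
    if abs(sum(pis) - 1.0) > 1e-9 or any(p < 0.0 or p > 1.0 for p in pis):
        raise ValueError("pis must lie in [0, 1] and sum to 1")
    # Draw the latent component g with probability pi_g.
    g = rng.choices(range(len(pis)), weights=pis, k=1)[0]
    # Draw the random intercept around the mean of component g.
    u = rng.gauss(mus[g], sd_u)
    # Add independent Gaussian measurement errors with standard deviation sigma.
    y = [beta * t + u + rng.gauss(0.0, sigma) for t in times]
    return g, u, y


# Illustrative values only: two components with well separated means.
rng = random.Random(42)
sample = [
    simulate_subject([0, 1, 2, 3], 0.5, [-2.0, 2.0], [0.3, 0.7], 1.0, 0.5, rng)
    for _ in range(5)
]
```

Setting all means equal, or using a single component, gives back the homogeneous model (2).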
In this work, we propose a slightly more general formulation of the model described in (1), in which the effect of some covariates may depend on the mixture component and some of the random effects may have a common mean regardless of the component. Thus, the design matrix $X_i$ is split into $X_{1i}$, associated with the vector $\beta$ of fixed effects common to all the components, and $X_{2i}$, associated with the vectors $\gamma_g$ of fixed effects specific to the components. The design matrix $Z_i$ is also split into $Z_{1i}$, associated with the vector $v_i$ of random effects following a single Gaussian distribution, and $Z_{2i}$, associated with the vector $u_i$ of random effects following a mixture of Gaussian distributions. The model is then written as:

$$y_i = X_{1i} \beta + X_{2i} \gamma_g + Z_{1i} v_i + Z_{2i} u_i + \varepsilon_i \tag{5}$$
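where $v_i \sim N(0, D_v)$ and $u_i \sim \sum_{g=1}^{G} \pi_g \, N(\mu_g, D_u)$; given the component $g$, the conditional distribution of the vector $(u_i', v_i')'$ is $N\big((\mu_g', 0')', D\big)$ with $D = \begin{pmatrix} D_u & D_{uv} \\ D_{uv}' & D_v \end{pmatrix}$.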
|
|
(6) |
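Given $w_{ig}$, $y_i$ follows a linear mixed model, and the density $f(y_i \mid w_{ig} = 1)$, denoted by $\phi_{ig}$, is the multivariate Gaussian density with mean $E_{ig}$ and covariance matrix $V_i$ given by:

$$E_{ig} = X_{1i} \beta + X_{2i} \gamma_g + Z_{2i} \mu_g \quad \text{and} \quad V_i = Z_i D Z_i' + \sigma^2 I_{n_i}, \quad \text{with } Z_i = (Z_{2i}, Z_{1i}) \tag{7}$$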
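Let now $\theta$ be the vector of the $m$ parameters of the model. $\theta$ contains $\psi$, with $\psi = (\beta', \gamma_1', \dots, \gamma_G', \mu_1', \dots, \mu_G', \mathrm{Vec}(D)', \sigma)'$, and $\pi$, the vector of the first $G - 1$ component probabilities $(\pi_g)_{g=1,\dots,G-1}$. Note that $\pi_G$ is entirely determined by $\pi$ as $\pi_G = 1 - \sum_{g=1}^{G-1} \pi_g$. $\mathrm{Vec}(D)$ represents the vector of the upper triangular elements of $D$. The estimates of $\theta$ are obtained as the vector $\hat{\theta}$ that maximizes the observed log-likelihood of the $N$ subjects:

$$L(\theta) = \sum_{i=1}^{N} \ln \left( \sum_{g=1}^{G} \pi_g \, \phi_{ig}(y_i; \psi) \right) \tag{8}$$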
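The observed log-likelihood is a sum over subjects of the logarithm of a weighted sum of component densities. A minimal sketch of its evaluation is given below, assuming that the log-densities $\ln \phi_{ig}(y_i; \psi)$ have already been computed and stored in a nested list; the name `observed_loglik` and the log-sum-exp stabilisation are illustrative choices, not a description of the program's internals.

```python
import math


def observed_loglik(log_phi, pis):
    """Observed log-likelihood of a mixture model.

    log_phi[i][g] is assumed to hold ln(phi_ig(y_i; psi)) for subject i and
    component g; pis[g] is the component probability pi_g.
    """
    total = 0.0
    for row in log_phi:
        # ln(pi_g * phi_ig), skipping components with zero probability.
        terms = [math.log(p) + lp for p, lp in zip(pis, row) if p > 0.0]
        # Log-sum-exp: factor out the largest term to avoid underflow.
        largest = max(terms)
        total += largest + math.log(sum(math.exp(t - largest) for t in terms))
    return total
```

Working on the log scale matters in practice because the density of a long response vector $y_i$ can be far smaller than the smallest representable floating-point number.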
|
|
(9) |
where, if necessary, α is modified to ensure that the log-likelihood is improved at each iteration.
To ensure that the covariance matrix $D$ is positive definite, we maximize the log-likelihood over the nonzero elements of $U$, the Cholesky factor of $D$ (i.e. $U'U = D$) [7]. Furthermore, to deal with the constraints (4) on $\pi$, we use the transformed parameters $(\gamma_g)_{g=1,\dots,G-1}$ with:
|
|
(10) |
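To illustrate how $G - 1$ unconstrained parameters can encode probabilities that satisfy (4), the sketch below uses the generalized logit, $\gamma_g = \ln(\pi_g / \pi_G)$. This particular transform is an assumption made for the example and may differ from the exact form of (10); the function names are hypothetical.

```python
import math


def gamma_to_pi(gammas):
    """Map G-1 unconstrained values to G probabilities (assumed generalized logit)."""
    # The reference component G has an implicit gamma of 0.
    shift = max([0.0] + list(gammas))
    exps = [math.exp(g - shift) for g in gammas]
    ref = math.exp(-shift)
    denom = ref + sum(exps)
    return [e / denom for e in exps] + [ref / denom]


def pi_to_gamma(pis):
    """Inverse map: gamma_g = ln(pi_g / pi_G), for strictly positive pis."""
    return [math.log(p / pis[-1]) for p in pis[:-1]]
```

Whatever real values the optimizer proposes for the transformed parameters, the resulting probabilities are positive and sum to one, so the maximization can proceed without explicit constraints.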
Standard errors of the elements of $D$ and of $(\pi_g)_{g=1,\dots,G-1}$ are computed by the Δ-method [11], while standard errors of the other parameters are computed directly from the inverse of the observed Hessian matrix.
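Convergence is reached when three convergence criteria, with respective thresholds $\varepsilon_a$, $\varepsilon_b$ and $\varepsilon_d$, are all satisfied; the last two are $|L^{(k)} - L^{(k-1)}| \le \varepsilon_b$ and $g(\theta^{(k)})' \, (H^{(k)})^{-1} \, g(\theta^{(k)}) \le \varepsilon_d$. The default values are $\varepsilon_a = 10^{-5}$, $\varepsilon_b = 10^{-5}$ and $\varepsilon_d = 10^{-8}$.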
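The stopping rule can be sketched as a single check applied after each iteration. In the example below, the criterion on the log-likelihood and the criterion on the derivatives follow the text; the form of the criterion associated with $\varepsilon_a$ (the sum of squared parameter changes) is an assumption, as are the function name and the choice to pass $H^{-1} g$ as a precomputed vector.

```python
def has_converged(theta_prev, theta_curr, loglik_prev, loglik_curr,
                  grad, hinv_grad, eps_a=1e-5, eps_b=1e-5, eps_d=1e-8):
    """Return True when all three convergence criteria are met.

    grad is the gradient g at the current iterate and hinv_grad is the
    vector H^{-1} g, assumed to be supplied by the optimizer.
    """
    # Assumed form of the first criterion: squared change in the parameters.
    param_change = sum((c - p) ** 2 for c, p in zip(theta_curr, theta_prev))
    # Second criterion: absolute change in the log-likelihood.
    loglik_change = abs(loglik_curr - loglik_prev)
    # Third criterion: the quadratic form g' H^{-1} g.
    derivative_crit = sum(g * h for g, h in zip(grad, hinv_grad))
    return (param_change <= eps_a
            and loglik_change <= eps_b
            and derivative_crit <= eps_d)
```

Requiring all three conditions at once guards against stopping on a flat stretch of the log-likelihood where the parameters are still moving.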
As the log-likelihood of a mixture model may have several maxima [8], we use a grid of initial values to find the global maximum. The multimodality of the log-likelihood in mixture models has often been discussed, and several strategies have been proposed for choosing the set of initial values [12]. However, none of them appears to be optimal in general. In our experience, the results were mainly sensitive to the initial values of $(\pi_g)_{g=1,\dots,G-1}$ and $(\mu_g)_{g=1,\dots,G}$, and less sensitive to those of the other parameters ($\mathrm{Vec}(U)$, $\beta$ and $\sigma$), for which the estimates from the homogeneous mixed model were good initial values.
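A grid search over initial values can be sketched as follows for a two-component model. The callable `fit`, which is assumed to run the maximization from a given starting point and return the final log-likelihood together with the estimates, is hypothetical, as are the grid layout and the dictionary keys.

```python
import itertools


def best_over_grid(fit, pi1_grid, mu_grid, other_init):
    """Run `fit` from every starting point of the grid and keep the best result.

    fit(start) is a hypothetical callable returning (loglik, estimate).
    other_init holds starting values for the remaining parameters, for
    instance the estimates of the homogeneous mixed model.
    """
    best = None
    # Vary only the component probability and the pair of component means.
    starts = itertools.product(pi1_grid, itertools.combinations(mu_grid, 2))
    for pi1, (mu1, mu2) in starts:
        start = dict(other_init, pi=[pi1, 1.0 - pi1], mu=[mu1, mu2])
        loglik, estimate = fit(start)
        # Keep the run that reaches the highest log-likelihood.
        if best is None or loglik > best[0]:
            best = (loglik, estimate, start)
    return best
```

Only the component probabilities and means are varied, in line with the observation that the results depend mostly on those two sets of initial values.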
A mixture model is estimated for a fixed number of components $G$, since otherwise the number of parameters in the model would be unknown. To choose the number of components, one has to estimate models with different values of $G$ and select the best model according to a test or a criterion. Some authors favor a bootstrap approach to approximate the asymptotic distribution of the likelihood ratio test between models with different numbers of components [13], but this approach is computationally very demanding, in particular for mixture models with random effects. Criteria such as Akaike's Information Criterion (AIC) [14] or the Bayesian Information Criterion (BIC) [15] are often preferred. We use these selection criteria to select the optimal number of components.
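The selection step can be sketched with the usual definitions $\mathrm{AIC} = -2L + 2m$ and $\mathrm{BIC} = -2L + m \ln N$, where $L$ is the maximized log-likelihood and $m$ the number of parameters. Taking $N$ as the number of subjects is an assumption of this example, and the function names and the log-likelihood values in the usage lines are purely illustrative.

```python
import math


def aic(loglik, n_params):
    """Akaike's Information Criterion (smaller is better)."""
    return -2.0 * loglik + 2.0 * n_params


def bic(loglik, n_params, n_subjects):
    """Bayesian Information Criterion, with the number of subjects as sample size."""
    return -2.0 * loglik + n_params * math.log(n_subjects)


def select_n_components(fits, n_subjects):
    """Return the G with the smallest BIC.

    fits maps each candidate G to (maximized log-likelihood, number of parameters).
    """
    return min(fits, key=lambda G: bic(fits[G][0], fits[G][1], n_subjects))


# Illustrative values only, not results from any data set.
fits = {1: (-1200.0, 6), 2: (-1150.0, 9), 3: (-1149.0, 12)}
chosen = select_n_components(fits, n_subjects=200)
```

In this made-up example the third component improves the log-likelihood by only one unit while adding three parameters, so the penalty outweighs the gain.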
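Given the estimates $\hat{\theta} = (\hat{\psi}', \hat{\pi}')'$, the posterior probabilities of belonging to each component are obtained by Bayes' theorem [2–4] as:

$$\pi_{ig} = P(w_{ig} = 1 \mid y_i, \hat{\theta}) = \frac{\hat{\pi}_g \, \phi_{ig}(y_i; \hat{\psi})}{\sum_{l=1}^{G} \hat{\pi}_l \, \phi_{il}(y_i; \hat{\psi})} \tag{11}$$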
We then assign each subject $i$ to the component to which he or she has the highest probability of belonging, among $(\pi_{ig})_{g=1,\dots,G}$.
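The classification rule can be sketched as follows, assuming that the log-densities $\ln \phi_{ig}$ of one subject have already been evaluated at the estimated parameters and that all estimated component probabilities are strictly positive. The function names are hypothetical.

```python
import math


def posterior_probs(log_phi_i, pis):
    """Posterior component probabilities of one subject, by Bayes' theorem.

    log_phi_i[g] is assumed to hold ln(phi_ig(y_i; psi_hat)); pis[g] is the
    estimated probability of component g and must be strictly positive.
    """
    terms = [math.log(p) + lp for p, lp in zip(pis, log_phi_i)]
    # Subtract the largest term before exponentiating to avoid underflow.
    largest = max(terms)
    weights = [math.exp(t - largest) for t in terms]
    total = sum(weights)
    return [w / total for w in weights]


def classify(log_phi_i, pis):
    """Index (starting at 0) of the component with the highest posterior probability."""
    probs = posterior_probs(log_phi_i, pis)
    return max(range(len(probs)), key=probs.__getitem__)
```

The posterior probabilities themselves are worth keeping alongside the assigned component, since values close to $1/G$ indicate subjects whose classification is ambiguous.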


