Paper 4 Estimation of random-effects model for longitudinal data with nonignorable missingness using Gibbs sampling

by Prajamitra Bhuyan1(2019)

4.1 Abstract:

Provide an alternative modelling of ‘correlated random-effect model’ using latent variable and algorithm based on Gibbs sampling. In order to solve the longitudinal study involve nonignorable missingness.

因为处理nonignorable missingness要用上‘correlated random-effect model’或者 ‘shared random-effect model’.

4.2 Introduction:

Proportion of missing data can be substantial. Consider the missingness mechanism is nonignorable. Jointly model the observed response and indicators of missingness. Classified into either pattern mixture models or selectin models. Little(1995) provided a detail overview.

  • Siddiqui and Ali(1998) random-effects pattern-mixture models. Averaging estimates from different subsets depending on missing data-pattern.
  • Daniels and Hogan(2000), reparameterization of pattern mixture model to consider wide range of missing mechanism.
  • Diggle and Kenward(1994), proposed reparameterization of the pattern mixture model with logistic regression.
  • Baker (1995) repeated binary data with nonignorable and non-monotone missingness.
  • Troxel et al.(1998) Extend Diggle for non-monotone and nonignoranble but with computationally challenging.
  • Rotnizky et al.(1998) inverse probability weighted estimating equations.
  • Minini and Chavance(2004) log-linear model and provided a sensitiviy analysis for the longitudinal binary data with nonignorable missings.

Survival analysis

  • Wu and Carrol(1998): Shared random-effects model, as alternative to the selection models.
  • De and tu(1994), Schluchter(1994) extensions of SRE models.
  • Have et.al (1998) and Pulkstenis et al.(1998) adopted the SRE model for binary longitudinal data with the informative dropouts.
  • Tsonaka et al.(2009) considered semi-parametric shared parameter model for the modelling of the response variable with non-monotone and nonignorable missingness.

SRE models, the selection model and the response model have exactly the same random component.Latent factors affecting the missingness could be different from those affecting the response, however it is correlated due to common risk factor.

  • Lin et.al (2010) interesting generalization of SRE model by correlated random effects model. Main difficulties of random effects approach is computational challenge. Expressed the likelihood as a ratio of two integrals and approximated the numerator and denominator using Laplace formula.

  • Pinheiro and Bates(1995) Gauss-Hermite quadrature for numerical integration in loglikelihood function.

  • Breslow and Clayton (1993) Laplace approximation.

This paper propose alternative modelling of observed response and indicators of missingness based on correlated latent variable. Then develop regression models. Gibbs sampler is used to implement the method.

  • Section2, proposed method
  • Section3, simulation studies
  • section4, outlines of possible future work and some conclusion.

4.3 Proposed model

i denote the response \[ Y_{i}(t)=\sum_{j=1}^{J} \beta_{j}(t) X_{j i}(t)+\sum_{j^{\prime}=1}^{J^{\prime}} \gamma_{j^{\prime}} Z_{j^{\prime} i}(t)+\sum_{l=1}^{L} u_{l i} \tilde{Z}_{l i}(t)+e_{i}(t) \] subject specific random effects \(\mathbf{u}_{i}=\left(u_{1 i}, \dots, u_{L i}\right)\sim N_{L}\left(\mathbf{0}, \Sigma_{u}\right)\),residual \(e_i(t)\sim N\left(0, \sigma_{e}^{2}\right)\). Special case for generalized varying coefficient model for longitudinal data. (Sentrk et al.2013) Ignorable missing data mechanism:Missing at random (MAR) (Little 1995)

If using MAR to nonignorable missing values, then the model will be biased.
Definition 4.1 Define \(U_i(t)\) as \[ U_{i}(t)=\left\{\begin{array}{ll}{1,} & {\text { if } Y_{i}(t) \text { is observed }} \\ {0,} & {\text { if } Y_{i}(t) \text { is missing }}\end{array}\right. \] And rewrite \(Y_i(t)\) as \[ Y_{i}(t)=\left\{\begin{array}{ll}{Y_{i}^{*}(t),} & {\text { if } U_{i}(t)=1} \\ {\text { missing, }} & {\text { if } U_{i}(t)=0}\end{array}\right. \]

in this case \(Y_i^*(t)\) is a latent variable. Under such situation, we can rewrite the model as \[ Y_{i}^{*}(t)=\sum_{j=1}^{J} \beta_{j}(t) X_{j i}(t)+\sum_{j^{\prime}=1}^{J^{\prime}} \gamma_{j^{\prime}} Z_{j^{\prime} i}(t)+\sum_{l=1}^{L} u_{l i} \tilde{Z}_{l i}(t)+e_{i}(t) \] In addition, consider the latent variable \(U_i^*(t)\) with \[ U_{i}(t)=\left\{\begin{array}{ll}{1,} & {\text { if } U_{i}^{*}(t)>0} \\ {0,} & {\text { if } U_{i}^{*}(t) \leq 0}\end{array}\right. \]

Then we consider modelling missing data mechanism with covariate as \[ U_{i}^{*}(t)=\sum_{k=1}^{K} \theta_{k}(t) W_{k i}(t)+\sum_{k^{\prime}=1}^{K^{\prime}} \delta_{k^{\prime}} S_{k^{\prime} i}(t)+\sum_{l=1}^{L} v_{l i} \tilde{Z}_{l i}(t)+\epsilon_{i}(t) \] \(v_{li}\) is the random effect, write as \(\mathbf{v}_{i}=\left(v_{1 i}, \ldots, v_{L i}\right)\sim N_{L}\left(\mathbf{0}, \Sigma_{v}\right)\).

In order to incorporate the possible correlation between the response variable \(Y_i(t)\) and the missing indicator \(U_i(t)\). We consider \(\boldsymbol u_i\) and \(\boldsymbol v_i\) are correlated random vectors with distribution \(N(\boldsymbol 0,\Sigma)\) with \(\Sigma=\left( \begin{array}{cc}{\Sigma_{u}} & {\Sigma_{u v}} \\ {\Sigma_{u v}} & {\Sigma_{v}}\end{array}\right)\).

Use the usual model for multivariate longitudinal data once we know the latent variable at each step. In this step, use Bayesian estimation method for simltaneous estimation of the parameters by Gibbs sampling.

4.3.1 Modelling time-varying coefficients

Since \(\beta_j(t)\) and \(\theta_k(t)\) is not known, consider semi-parametric approach using Legendre polynomials (LP) basis functions.

(Marie and Sen 1985; Meyer 2000; Cui and Zhu 2006;Bhuyan et al.2019)

\[ P_{r}(x)=\sum_{l=0}^{L}(-1)^{l} \frac{(2 r-2 l) !}{2^{r} l !(r-l) !(r-2 l) !} x^{r-2 l} \]

Why use this? not B-spline or nature spline

Defined over \([-1,1]\) , orthogonal to each over, which is \(\int_{-1}^{1} P_{r}(x) P_{s}(x) d x=0\). Example: \[ P_{0}(x)=1, \quad P_{1}(x)=x, \quad P_{2}(x)=\frac{1}{2}\left(3 x^{2}-1\right), \quad P_{3}(x)=\frac{1}{2}\left(5 x^{3}-3 x\right) \] .

Using the adjusted time points. \(t^{\prime}=-1+2\left(\frac{t-t_{\min }}{t_{\max }-t_{\min }}\right)\) > seems a triditional re-scale method.

\(P^{(r)}\left(t^{\prime}\right)=\left[P_{0}\left(t^{\prime}\right), P_{1}\left(t^{\prime}\right), \ldots, P_{r}\left(t^{\prime}\right)\right]^{T}\)

then for the time-varying parameter \(\beta_{j}\left(t^{\prime}\right)\) and \(\theta_k(t')\) are linear combination of the basis functions: \[ \beta_{j}(t)=\mathbf{a}_{j}^{T} P^{\left(r_{1}\right)}(t) ; \quad \theta_{k}(t)=\mathbf{b}_{k}^{T} P^{\left(r_{2}\right)}(t) \]

\(r_1\) and \(r_2\) 和jmcm里面的triple(p,q,d)类似

低次方的多项式一般而言就足够了,可以用AIC/BIC选。 Example: \[ J=2, J^{\prime}=0, L=1,r_{1}=r_{2}=1, X_{1 i}(t)=1, \tilde{Z}_{1 i}(t)=1,X_{2 i}(t) \equiv X_{2 i} \] Model: \[ \begin{array}{l}{Y_{i}^{*}(t)=\alpha_{10}+\alpha_{11} t+\alpha_{20} X_{2 i}+\alpha_{21} t X_{2 i}+u_{1 i}+e_{i}(t)} \\ {U_{i}^{*}(t)=\lambda_{10}+\lambda_{11} t+\lambda_{20} X_{2 i}+\lambda_{21} t X_{2 i}+v_{1 i}+\epsilon_{i}(t)}\end{array} \]

4.3.2 Bayesian estimation using Gibbs sampler

Notation: \[ Y^{*}=\left(Y_{11}^{*}(t), \ldots, Y_{n m}^{*}(t)\right), U^{*}=\left(U_{11}^{*}(t), \ldots, U_{n m}^{*}(t)\right),\\ Y=\left(Y_{11}(t), \ldots, Y_{n m}(t)\right) U=\left(U_{11}(t), \ldots, U_{n m}(t)\right) \] The joint posterior density for the latent variables

\[ \begin{array}{l}{\pi\left(\Theta, Y^{*}, U^{*} | Y, U\right) \propto \pi\left(\mathbf{a}, \mathbf{b}, \gamma, \delta, \sigma_{e}^{2}, \Sigma\right) \times \prod_{i=1}^{n} \int \prod_{t=1}^{m} f\left(Y_{i}^{*}(t), U_{i}^{*}(t) | u_{i}, v_{i}\right)} \\ {\quad \times\left\{1\left(U_{i}^{*}(t)>0\right) 1\left(U_{i}(t)=1\right)+1\left(U_{i}^{*}(t) \leq 0\right) 1\left(U_{i}(t)=0\right)\right\} g\left(u_{i}, v_{i}\right) d u_{i} d v_{i}}\end{array} \]

Prior:

  • Non-informative priors for \(\left(\mathbf{a}, \boldsymbol{\gamma}, \sigma_{e}^{2}\right)\) and \((\mathbf{b}, \boldsymbol{\delta})\).
  • Maximal data information prior for \(\Sigma\)

Then we have

\[ \pi\left(\mathbf{a}, \boldsymbol{\gamma}, \sigma_{e}^{2}\right) \propto \frac{1}{\sigma_{e}^{2}}\\ \pi(\mathbf{b}, \delta) \propto 1\\ \pi(\Sigma) \propto \frac{1}{|\Sigma|} \]

Maximal data information prior?

Then the full conditionals for \((\mathbf{a}, \mathbf{b}, \gamma, \boldsymbol{\delta}), \sigma_{e}^{2}\) and \(\Sigma\) are Normal, inverse gamma and inverse-Wishart, respectively.

Then the latent variable \(Y^*_i(t)\) and \(U^*_i(t)\) are sampled from the conditional densities:

\[ \left\{\begin{array}{ll}{Y_{i}(t) \text { with probability } 1,} & {\text { if } U_{i}(t)=1} \\ {N\left(\sum_{j=1}^{J} \beta_{j}(t) X_{j i}(t)+\sum_{j^{\prime}=1}^{J^{\prime}} \gamma_{j^{\prime}} Z_{j i}(t)+\sum_{l=1}^{L} u_{l i} \tilde{z}_{l i}(t), \sigma_{e}^{2}\right),} & {\text { if } U_{i}(t)=0}\end{array}\right. \]

\[ \left\{\begin{array}{l}{N\left(\sum_{k=1}^{K} \theta_{k}(t) W_{k i}(t)+\sum_{k^{\prime}=1}^{K^{\prime}} \delta_{k^{\prime}} S_{k^{\prime} i}(t)+\sum_{l=1}^{L} v_{l i} \tilde{Z}_{l i}(t), 1\right)} \text{left truncated at 0,} & \text{if} U_i(t)=1,\\ {N\left(\sum_{k=1}^{K} \theta_{k}(t) W_{k i}(t)+\sum_{k^{\prime}=1}^{K^{\prime}} \delta_{k^{\prime}} S_{k^{\prime} i}(t)+\sum_{l=1}^{L} v_{l i} \tilde{Z}_{l i}(t), 1\right)}\text{right truncated at 0,}& \text{if }U_i(t)=0\end{array}\right. \]