--- title: "Mixed-Effects Beta Interval Regression with brsmm" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Mixed-Effects Beta Interval Regression with brsmm} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 10, fig.height = 5, fig.align = "center", warning = FALSE, message = FALSE, digits = 4 ) kbl10 <- function(x, n = 10, digits = 4, ...) { knitr::kable(utils::head(as.data.frame(x), n), digits = digits, align = "c", ...) } ``` ## Overview `brsmm()` extends `brs()` to clustered data by adding Gaussian random effects in the mean submodel while preserving the interval-censored beta likelihood for scale-derived outcomes. This vignette covers: 1. full model mathematics; 2. estimation by marginal maximum likelihood (Laplace approximation); 3. practical use of all current `brsmm` methods; 4. inferential and validation workflows, including parameter recovery. ```{r library} library(betaregscale) ``` ## Mathematical model Assume observations $i = 1, \dots, n_j$ within groups $j = 1, \dots, G$, with group-specific random-effects vector $\mathbf{b}_j \in \mathbb{R}^{q_b}$. ### Linear predictors $$ \eta_{\mu,ij} = x_{ij}^\top \beta + w_{ij}^\top \mathbf{b}_j, \qquad \eta_{\phi,ij} = z_{ij}^\top \gamma $$ $$ \mu_{ij} = g^{-1}(\eta_{\mu,ij}), \qquad \phi_{ij} = h^{-1}(\eta_{\phi,ij}) $$ with $g(\cdot)$ and $h(\cdot)$ chosen by `link` and `link_phi`. The random-effects design row $w_{ij}$ is defined by `random = ~ terms | group`. ### Beta parameterization For each $(\mu_{ij},\phi_{ij})$, `repar` maps to beta shape parameters $(a_{ij},b_{ij})$ via `brs_repar()`. 
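For intuition, the classical mean-precision mapping of Ferrari and Cribari-Neto (2004) can be sketched in base R. This is an illustration of the idea only; the exact mapping attached to each `repar` code is documented in `brs_repar()`:

```r
# Classical mean-precision mapping (Ferrari & Cribari-Neto, 2004):
# shape1 = mu * phi, shape2 = (1 - mu) * phi.
# Illustrative only -- see ?brs_repar for the mapping each `repar` code uses.
mean_precision_shapes <- function(mu, phi) {
  list(shape1 = mu * phi, shape2 = (1 - mu) * phi)
}

shp <- mean_precision_shapes(mu = 0.3, phi = 10)
shp$shape1 / (shp$shape1 + shp$shape2)  # implied beta mean recovers mu
shp$shape1 * shp$shape2 /
  ((shp$shape1 + shp$shape2)^2 * (shp$shape1 + shp$shape2 + 1))  # implied variance
```

Under this mapping the implied variance is $\mu(1-\mu)/(1+\phi)$, so larger $\phi$ means higher precision around $\mu$.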
### Conditional contribution by censoring type

Each observation contributes:

$$
L_{ij}(\mathbf{b}_j;\theta)=
\begin{cases}
f(y_{ij}; a_{ij}, b_{ij}), & \delta_{ij}=0\\
F(u_{ij}; a_{ij}, b_{ij}), & \delta_{ij}=1\\
1 - F(l_{ij}; a_{ij}, b_{ij}), & \delta_{ij}=2\\
F(u_{ij}; a_{ij}, b_{ij}) - F(l_{ij}; a_{ij}, b_{ij}), & \delta_{ij}=3
\end{cases}
$$

where $l_{ij}, u_{ij}$ are the interval endpoints on $(0,1)$, $f(\cdot)$ is the beta density, and $F(\cdot)$ is the beta CDF.

### Random-effects distribution

$$
\mathbf{b}_j \sim \mathcal{N}(\mathbf{0}, D),
$$

where $D$ is a symmetric positive-definite covariance matrix. Internally, `brsmm()` optimizes a packed lower-Cholesky parameterization $D = LL^\top$ (diagonal entries on the log scale for positivity).

### Group marginal likelihood

$$
L_j(\theta)=\int_{\mathbb{R}^{q_b}} \prod_{i=1}^{n_j} L_{ij}(\mathbf{b}_j;\theta)\,
\varphi_{q_b}(\mathbf{b}_j;\mathbf{0},D)\,d\mathbf{b}_j
$$

$$
\ell(\theta)=\sum_{j=1}^G \log L_j(\theta)
$$

### Laplace approximation used by `brsmm()`

Define

$$
Q_j(\mathbf{b})=
\sum_{i=1}^{n_j}\log L_{ij}(\mathbf{b};\theta)+
\log\varphi_{q_b}(\mathbf{b};\mathbf{0},D)
$$

and $\hat{\mathbf{b}}_j=\arg\max_{\mathbf{b}} Q_j(\mathbf{b})$, with curvature

$$
H_j = -\nabla^2 Q_j(\hat{\mathbf{b}}_j).
$$

Then

$$
\log L_j(\theta) \approx Q_j(\hat{\mathbf{b}}_j) + \frac{q_b}{2}\log(2\pi) - \frac{1}{2}\log|H_j|.
$$

`brsmm()` maximizes the approximated $\ell(\theta)$ with `stats::optim()` and computes the group-level posterior modes $\hat{\mathbf{b}}_j$. For $q_b = 1$, this reduces to the scalar random-intercept formula.

## Simulating clustered scale data

The next helper simulates data from a known mixed model to illustrate fitting, inference, and recovery checks.
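Before simulating full datasets, the Laplace formula can be sanity-checked numerically for a toy single-group, scalar random-intercept case. The sketch below is base R only, with a toy beta log-likelihood, a fixed precision of 10, and an assumed known $\sigma_b$; it is not the package's internal routine:

```r
# Toy check: Laplace approximation vs. direct quadrature for one group.
set.seed(1)
y <- rbeta(8, 2, 5)  # toy within-group responses on (0, 1)
sd_b <- 0.5          # random-intercept SD (assumed known here)

# Q(b): conditional beta log-likelihood (mean plogis(b), shapes mu*phi and
# (1 - mu)*phi with phi = 10) plus the normal random-effect log-density.
Q <- function(b) {
  mu <- plogis(b)
  sum(dbeta(y, mu * 10, (1 - mu) * 10, log = TRUE)) +
    dnorm(b, 0, sd_b, log = TRUE)
}

# Laplace ingredients: mode b_hat and curvature H = -Q''(b_hat)
opt <- optimize(Q, c(-5, 5), maximum = TRUE)
b_hat <- opt$maximum
h <- 1e-4
H <- -(Q(b_hat + h) - 2 * Q(b_hat) + Q(b_hat - h)) / h^2
log_laplace <- opt$objective + 0.5 * log(2 * pi) - 0.5 * log(H)

# Exact marginal log-likelihood by quadrature
log_exact <- log(integrate(function(b) sapply(b, function(bi) exp(Q(bi))),
                           -5, 5)$value)
c(laplace = log_laplace, exact = log_exact)
```

Agreement is typically tight when $Q_j$ is smooth and unimodal, which is the regime the Laplace approximation targets.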
```{r simulate-data}
sim_brsmm_data <- function(seed = 3501L, g = 24L, ni = 12L,
                           beta = c(0.20, 0.65), gamma = c(-0.15),
                           sigma_b = 0.55) {
  set.seed(seed)
  id <- factor(rep(seq_len(g), each = ni))
  n <- length(id)
  x1 <- rnorm(n)
  b_true <- rnorm(g, mean = 0, sd = sigma_b)
  eta_mu <- beta[1] + beta[2] * x1 + b_true[as.integer(id)]
  eta_phi <- rep(gamma[1], n)
  mu <- plogis(eta_mu)
  phi <- plogis(eta_phi)
  shp <- brs_repar(mu = mu, phi = phi, repar = 2)
  y <- round(stats::rbeta(n, shp$shape1, shp$shape2) * 100)
  list(
    data = data.frame(y = y, x1 = x1, id = id),
    truth = list(beta = beta, gamma = gamma, sigma_b = sigma_b, b = b_true)
  )
}

sim <- sim_brsmm_data(
  g = 5, ni = 200,
  beta = c(0.20, 0.65), gamma = c(-0.15), sigma_b = 0.55
)
str(sim$data)
```

## Fitting `brsmm()`

```{r fit-brsmm}
fit_mm <- brsmm(
  y ~ x1,
  random = ~ 1 | id,
  data = sim$data,
  repar = 2,
  int_method = "laplace",
  method = "BFGS",
  control = list(maxit = 1000)
)
summary(fit_mm)
```

## Random intercept + slope example

The model below includes a random intercept and a random slope for `x1`:

```{r fit-brsmm-random-slope}
fit_mm_rs <- brsmm(
  y ~ x1,
  random = ~ 1 + x1 | id,
  data = sim$data,
  repar = 2,
  int_method = "laplace",
  method = "BFGS",
  control = list(maxit = 1200)
)
summary(fit_mm_rs)
```

Covariance structure of the random effects:

```{r random-covariance}
kbl10(fit_mm_rs$random$D)
kbl10(
  data.frame(term = names(fit_mm_rs$random$sd_b),
             sd = as.numeric(fit_mm_rs$random$sd_b)),
  digits = 4
)
kbl10(head(ranef(fit_mm_rs), 10))
```

## Additional studies of random effects (numerical and visual)

Following practices from established mixed-models packages, the package provides dedicated numerical and visual summaries of the random effects, focusing on:

* the structure and correlation of $D$;
* the empirical distribution of the modes by group;
* the empirical shrinkage intensity;
* visual diagnostics specific to the random components.
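The packed covariance parameters underlying these summaries follow a lower-Cholesky coding with a log-scale diagonal, as described earlier. The helper below is a hypothetical base-R illustration of that idea; `unpack_D` is not a betaregscale function, and the internal packing order used by `brsmm()` may differ:

```r
# Hypothetical illustration of the D = L L^T coding with log-diagonal.
unpack_D <- function(theta, q) {
  L <- matrix(0, q, q)
  L[lower.tri(L, diag = TRUE)] <- theta  # column-major fill of L
  diag(L) <- exp(diag(L))                # log-diagonal guarantees positivity
  L %*% t(L)
}

theta <- c(log(0.5), 0.2, log(0.3))      # q = 2: [log L11, L21, log L22]
D <- unpack_D(theta, 2)
D                                        # positive-definite by construction
cov2cor(D)                               # implied intercept-slope correlation
```

Any real vector of length $q(q+1)/2$ maps to a valid covariance matrix, which is why the optimizer can work on an unconstrained scale.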
```{r ranef-study} re_study <- brsmm_re_study(fit_mm_rs) print(re_study) kbl10(re_study$summary) kbl10(re_study$D) kbl10(re_study$Corr) ``` Suggested visualizations for random effects: ```{r ranef-visuals} if (requireNamespace("ggplot2", quietly = TRUE)) { autoplot.brsmm(fit_mm_rs, type = "ranef_caterpillar") autoplot.brsmm(fit_mm_rs, type = "ranef_density") autoplot.brsmm(fit_mm_rs, type = "ranef_pairs") autoplot.brsmm(fit_mm_rs, type = "ranef_qq") } ``` ## Core methods ### Coefficients and random effects `coef(fit_mm, model = "random")` returns packed random-effect covariance parameters on the optimizer scale (lower-Cholesky, with a log-diagonal). For random-intercept models, this simplifies to $\log \sigma_b$. ```{r methods-coef} kbl10( data.frame( parameter = names(coef(fit_mm, model = "full")), estimate = as.numeric(coef(fit_mm, model = "full")) ), digits = 4 ) kbl10( data.frame( log_sigma_b = as.numeric(coef(fit_mm, model = "random")), sigma_b = as.numeric(exp(coef(fit_mm, model = "random"))) ), digits = 4 ) kbl10(head(ranef(fit_mm), 10)) ``` For random intercept + slope models: ```{r methods-coef-rs} kbl10( data.frame( parameter = names(coef(fit_mm_rs, model = "random")), estimate = as.numeric(coef(fit_mm_rs, model = "random")) ), digits = 4 ) kbl10(fit_mm_rs$random$D) ``` ### Variance-covariance, summary and likelihood criteria ```{r methods-summary} vc <- vcov(fit_mm) dim(vc) sm <- summary(fit_mm) kbl10(sm$coefficients) kbl10( data.frame( logLik = as.numeric(logLik(fit_mm)), AIC = AIC(fit_mm), BIC = BIC(fit_mm), nobs = nobs(fit_mm) ), digits = 4 ) ``` ### Fitted values, prediction and residuals ```{r methods-predict} kbl10( data.frame( mu_hat = head(fitted(fit_mm, type = "mu")), phi_hat = head(fitted(fit_mm, type = "phi")), pred_mu = head(predict(fit_mm, type = "response")), pred_eta = head(predict(fit_mm, type = "link")), pred_phi = head(predict(fit_mm, type = "precision")), pred_var = head(predict(fit_mm, type = "variance")) ), digits = 4 ) kbl10( 
data.frame(
    res_response = head(residuals(fit_mm, type = "response")),
    res_pearson = head(residuals(fit_mm, type = "pearson"))
  ),
  digits = 4
)
```

### Diagnostic plotting methods

`plot.brsmm()` supports base and ggplot2 backends:

```{r methods-plot}
plot(fit_mm, which = 1:4, type = "pearson")
if (requireNamespace("ggplot2", quietly = TRUE)) {
  plot(fit_mm, which = 1:2, gg = TRUE)
}
```

`autoplot.brsmm()` provides focused ggplot diagnostics:

```{r methods-autoplot}
if (requireNamespace("ggplot2", quietly = TRUE)) {
  autoplot.brsmm(fit_mm, type = "calibration")
  autoplot.brsmm(fit_mm, type = "score_dist")
  autoplot.brsmm(fit_mm, type = "ranef_qq")
  autoplot.brsmm(fit_mm, type = "residuals_by_group")
}
```

### Prediction with `newdata`

If `newdata` contains unseen groups, `predict.brsmm()` sets the random effect to zero for those levels.

```{r methods-newdata}
nd <- sim$data[1:8, c("x1", "id")]
kbl10(
  data.frame(pred_seen = as.numeric(predict(fit_mm, newdata = nd, type = "response"))),
  digits = 4
)
nd_unseen <- nd
nd_unseen$id <- factor(rep("new_cluster", nrow(nd_unseen)))
kbl10(
  data.frame(pred_unseen = as.numeric(predict(fit_mm, newdata = nd_unseen, type = "response"))),
  digits = 4
)
```

The same logic applies to random intercept + slope models:

```{r methods-newdata-rs}
kbl10(
  data.frame(pred_rs_seen = as.numeric(predict(fit_mm_rs, newdata = nd, type = "response"))),
  digits = 4
)
kbl10(
  data.frame(pred_rs_unseen = as.numeric(predict(fit_mm_rs, newdata = nd_unseen, type = "response"))),
  digits = 4
)
```

## Statistical tests and validation workflow

### Wald tests (from `summary`)

`summary.brsmm()` reports Wald $z$-tests for each parameter:

$$
z_k = \hat\theta_k / \mathrm{SE}(\hat\theta_k).
$$

```{r wald-tests}
sm <- summary(fit_mm)
kbl10(sm$coefficients)
```

### Model-building sequence and likelihood ratio (LR) tests

A practical workflow of increasing complexity:

1. `brs()`: no random effect (ignores clustering);
2. `brsmm(..., random = ~ 1 | id)`: random intercept;
3. `brsmm(..., random = ~ 1 + x1 | id)`: random intercept + slope.

In the first step (`brs` to `brsmm` with a random intercept), the hypothesis $\sigma_b^2 = 0$ lies on the boundary of the parameter space, so the naive asymptotic $\chi^2_1$ reference distribution is conservative; a standard correction approximates the null by an equal mixture of $\chi^2_0$ and $\chi^2_1$, which halves the naive p-value. In the second step (intercept to intercept + slope), the likelihood ratio (LR) test with a $\chi^2$ reference is commonly used as a practical diagnostic for goodness-of-fit gains.

```{r lr-evolution}
# Base model without a random effect
fit_brs <- brs(
  y ~ x1,
  data = sim$data,
  repar = 2
)

# Reuse the mixed models already fitted:
# fit_mm    : random = ~ 1 | id
# fit_mm_rs : random = ~ 1 + x1 | id
tab_lr <- anova(fit_brs, fit_mm, fit_mm_rs, test = "Chisq")
kbl10(
  data.frame(model = rownames(tab_lr), tab_lr, row.names = NULL),
  digits = 4
)
```

Operational decision rule:

* If the second step (RI to RI+RS) does not improve the fit (large p-value), prefer the random-intercept model for parsimony.
* If the gain is clear, adopt the RI+RS model and validate parameter stability (especially `sd_b` and the $D$ matrix) via sensitivity analyses and residual diagnostics.
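For a variance component tested on the boundary, a common approximation takes the LR null distribution to be an equal mixture of $\chi^2_0$ and $\chi^2_1$, which halves the naive $\chi^2_1$ p-value; it can be computed directly from the two maximized log-likelihoods. The helper below is an illustrative base-R sketch, not a betaregscale function, and the log-likelihood values are hypothetical:

```r
# Halved p-value for a single variance component on the boundary.
# For lr > 0, a 50:50 mixture of chi^2_0 and chi^2_1 puts all its tail
# mass in the chi^2_1 part, so the naive p-value is simply halved.
# (Illustrative helper, not part of betaregscale.)
p_boundary_lr <- function(ll_null, ll_alt) {
  lr <- 2 * (ll_alt - ll_null)
  0.5 * pchisq(lr, df = 1, lower.tail = FALSE)
}

# e.g. with hypothetical maximized log-likelihoods:
p_boundary_lr(ll_null = -512.3, ll_alt = -508.1)
```

Because the correction only shrinks the p-value, a model that is significant under the naive $\chi^2_1$ test remains significant under the mixture.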
### Residual diagnostics (quick checks) ```{r quick-diagnostics} r <- residuals(fit_mm, type = "pearson") kbl10( data.frame( mean = mean(r), sd = stats::sd(r), q025 = as.numeric(stats::quantile(r, 0.025)), q975 = as.numeric(stats::quantile(r, 0.975)) ), digits = 4 ) ``` ## Parameter recovery experiment A single-fit recovery table can be produced directly from the previous fit: ```{r recovery-single} est <- c( beta0 = unname(coef(fit_mm, model = "mean")[1]), beta1 = unname(coef(fit_mm, model = "mean")[2]), sigma_b = unname(exp(coef(fit_mm, model = "random"))) ) true <- c( beta0 = sim$truth$beta[1], beta1 = sim$truth$beta[2], sigma_b = sim$truth$sigma_b ) recovery_table <- data.frame( parameter = names(true), true = as.numeric(true), estimate = as.numeric(est[names(true)]), bias = as.numeric(est[names(true)] - true) ) kbl10(recovery_table) ``` For a Monte Carlo recovery study, repeat simulation and fitting across replicates: ```{r recovery-montecarlo, eval = FALSE} mc_recovery <- function(R = 50L, seed = 7001L) { set.seed(seed) out <- vector("list", R) for (r in seq_len(R)) { sim_r <- sim_brsmm_data(seed = seed + r) fit_r <- brsmm( y ~ x1, random = ~ 1 | id, data = sim_r$data, repar = 2, int_method = "laplace", method = "BFGS", control = list(maxit = 1000) ) out[[r]] <- c( beta0 = unname(coef(fit_r, model = "mean")[1]), beta1 = unname(coef(fit_r, model = "mean")[2]), sigma_b = unname(exp(coef(fit_r, model = "random"))) ) } est <- do.call(rbind, out) truth <- c(beta0 = 0.20, beta1 = 0.65, sigma_b = 0.55) data.frame( parameter = colnames(est), truth = as.numeric(truth[colnames(est)]), mean_est = colMeans(est), bias = colMeans(est) - truth[colnames(est)], rmse = sqrt(colMeans((sweep(est, 2, truth[colnames(est)], "-"))^2)) ) } kbl10(mc_recovery(R = 50)) ``` ## How this maps to automated package tests The package test suite includes dedicated `brsmm` tests for: 1. fitting with Laplace integration; 2. one- and two-part formulas; 3. 
S3 methods (`coef`, `vcov`, `summary`, `predict`, `residuals`, `ranef`);
4. parameter recovery under known DGP settings.

Run locally:

```{r run-tests, eval = FALSE}
devtools::test(filter = "brsmm")
```

## References

Ferrari, S. L. P. and Cribari-Neto, F. (2004). Beta regression for modelling rates and proportions. *Journal of Applied Statistics*, 31(7), 799-815. [doi:10.1080/0266476042000214501](https://doi.org/10.1080/0266476042000214501).

Pinheiro, J. C. and Bates, D. M. (2000). *Mixed-Effects Models in S and S-PLUS*. Springer. [doi:10.1007/b98882](https://doi.org/10.1007/b98882).

Rue, H., Martino, S., and Chopin, N. (2009). Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. *Journal of the Royal Statistical Society: Series B*, 71(2), 319-392. [doi:10.1111/j.1467-9868.2008.00700.x](https://doi.org/10.1111/j.1467-9868.2008.00700.x).