--- title: "Getting Started with hhdynamics" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Getting Started with hhdynamics} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ## Overview **hhdynamics** fits a Bayesian household transmission model using MCMC to estimate: - The daily probability of infection from the community - The probability of person-to-person transmission within households - Effects of covariates on infectivity and susceptibility The model accounts for tertiary transmission (household contacts infecting other household contacts) and uses a serial interval distribution to link infection times. ## Data format Input data must be in long format (one row per individual) with these required columns: | Column | Description | |----------|-------------| | `hhID` | Household identifier | | `member` | Member index within household (0 = index case, 1, 2, ... = contacts) | | `size` | Number of individuals in the household | | `end` | End date of follow-up for that individual | | `inf` | Infection status (1 = infected, 0 = not infected). Index cases must be infected. | | `onset` | Onset time of symptoms (use -1 or NA for uninfected) | Additional columns can be included as covariates for infectivity or susceptibility. ```{r data} library(hhdynamics) data("inputdata") str(inputdata) head(inputdata) ``` ## Serial interval The model uses a discrete serial interval distribution (probability mass function of length 14, one value per day). By default, the bundled influenza serial interval from Tsang et al. (2014) is used: ```{r si} data("SI") print(SI) barplot(SI, names.arg = 1:14, xlab = "Days", ylab = "Probability", main = "Default serial interval distribution (influenza)") ``` You can provide your own SI vector, or let the model estimate it from data (see below). ## Fitting the model Use `household_dynamics()` to fit the model. Specify covariates using R formulas: ```{r fit, eval = FALSE} # With covariates (uses default influenza SI) fit <- household_dynamics(inputdata, inf_factor = ~sex, sus_factor = ~age, n_iteration = 50000, burnin = 10000, thinning = 10) # Without covariates fit <- household_dynamics(inputdata, n_iteration = 50000, burnin = 10000, thinning = 10) # With a custom serial interval my_SI <- c(0, 0.01, 0.05, 0.15, 0.25, 0.25, 0.15, 0.08, 0.04, 0.015, 0.005, 0, 0, 0) fit <- household_dynamics(inputdata, SI = my_SI, n_iteration = 50000, burnin = 10000, thinning = 10) ``` The function returns an `hhdynamics_fit` object containing the full MCMC output. ## Inspecting results ### Quick overview ```{r print, eval = FALSE} print(fit) # Household transmission model fit # Data: 386 households, 1533 individuals # MCMC: 4000 post-burnin samples (50000 iterations, burnin: 10000, thin: 10) # Runtime: 85 seconds # Parameters: community, household, sex1.0, age1.0, age2.0 # # Use summary() for estimates, coef() for posterior means. ``` ### Parameter estimates `summary()` returns a table with posterior means, 95% credible intervals, and exponentiated estimates for covariate effects: ```{r summary, eval = FALSE} summary(fit) # Variable Point estimate Lower bound Upper bound ... # Daily probability of infection from community 0.004 0.002 0.007 # Probability of person-to-person transmission in households 0.057 0.034 0.084 # sex1.0 -0.081 -0.733 0.488 # age1.0 -0.065 -0.537 0.412 # age2.0 -0.312 -0.831 0.170 ``` Community and household parameters are reported on the probability scale (via `1 - exp(-x)` transform). Covariate effects are on the log scale, with exponentiated values also shown. ### Posterior means ```{r coef, eval = FALSE} coef(fit) # re_sd community household size_param sex1.0 age1.0 age2.0 # 1.000000000 0.004239978 0.058555948 0.000000000 -0.080689680 -0.065001794 -0.312414284 ``` ## Accessing MCMC output The fit object stores all MCMC output for custom diagnostics and post-processing: ```{r access, eval = FALSE} # Posterior samples matrix (post-burnin, thinned) dim(fit$samples) # e.g. 4000 x 7 # Log-likelihood trace (full chain, for convergence checks) dim(fit$log_likelihood) # e.g. 50000 x 3 # Per-parameter acceptance rates fit$acceptance # Trace plot for community parameter plot(fit$samples[, "community"], type = "l", ylab = "Community rate", xlab = "Iteration") # Posterior density hist(fit$samples[, "household"], breaks = 50, probability = TRUE, main = "Household transmission probability (raw scale)", xlab = "Rate parameter") ``` ## Visualization hhdynamics provides several plotting functions for diagnostics and publication-ready figures. ### MCMC diagnostics ```{r plot_diag, eval = FALSE} # Trace plots and posterior densities for all parameters plot_diagnostics(fit) # Specific parameters only plot_diagnostics(fit, params = c("community", "household")) ``` ### Transmission probability over time ```{r plot_trans, eval = FALSE} # Daily probability of transmission with 95% CrI plot_transmission(fit) # Save to PDF pdf("transmission.pdf", width = 7, height = 5) plot_transmission(fit) dev.off() ``` ### Covariate effects (forest plot) ```{r plot_cov, eval = FALSE} # Forest plot of covariate effects (relative risks) plot_covariates(fit) # With custom labels for variable headers and factor levels plot_covariates(fit, file = "covariates.pdf", labels = list( sex = list(name = "Sex", levels = c("Male", "Female")), age = list(name = "Age Group", levels = c("0-5", "6-17", "18+")))) ``` The `file` parameter auto-sizes the PDF dimensions based on the number of covariates. ### Secondary attack rates ```{r plot_sar, eval = FALSE} # Overall attack rate plot_attack_rate(fit) # Stratified by a single variable plot_attack_rate(fit, by = ~age) # Combine overall + multiple strata in one figure plot_attack_rate(fit, by = list(~sex, ~age), include_overall = TRUE, labels = list( sex = list(name = "Sex", levels = c("Male", "Female")), age = list(name = "Age Group", levels = c("0-5", "6-17", "18+")))) ``` ### Household timeline ```{r plot_hh, eval = FALSE} # Visualize a single household: index (triangle), infected contacts # (filled circles), uninfected contacts (open circles) plot_household(fit, hh_id = 1) ``` ## Summary tables hhdynamics includes table functions that return data frames suitable for reporting. ### Parameter estimates ```{r tab_param, eval = FALSE} # Posterior mean, median, 95% CrI, and acceptance rate table_parameters(fit) # Include effective sample size table_parameters(fit, show_ess = TRUE) ``` ### Covariate effects ```{r tab_cov, eval = FALSE} # Relative risks with 95% CrIs, grouped by infectivity/susceptibility table_covariates(fit) ``` ### Secondary attack rates ```{r tab_sar, eval = FALSE} # Overall SAR with Wilson CI table_attack_rates(fit) # Stratified by covariate table_attack_rates(fit, by = ~age) ``` ## Input validation `household_dynamics()` validates inputs before running the MCMC, catching common errors early: ```{r validation, eval = FALSE} # Missing column household_dynamics(inputdata[, -1]) # Error: 'input' is missing required columns: hhID # Wrong formula variable household_dynamics(inputdata, inf_factor = ~nonexistent) # Error: Variables in inf_factor not found in input: nonexistent # Old string syntax (no longer supported) household_dynamics(inputdata, inf_factor = "~sex") # Error: 'inf_factor' must be a formula (e.g. ~sex) or NULL. ``` ## Missing data handling ### Missing onset times Infected household contacts with missing (`NA`) onset times are automatically imputed during MCMC. The sampler proposes onset times uniformly within the follow-up window and accepts/rejects via the full likelihood. Index case onsets must be non-missing. ```{r missing_onset, eval = FALSE} # Some infected contacts have unknown symptom onset inputdata_missing_onset <- inputdata infected_contacts <- which(inputdata$member > 0 & inputdata$inf == 1) inputdata_missing_onset$onset[infected_contacts[1:5]] <- NA fit_onset <- household_dynamics(inputdata_missing_onset, ~sex, ~age, n_iteration = 50000, burnin = 10000, thinning = 10) # Note: 5 of 92 infected contact(s) (5.4%) have missing onset times. These will be imputed during MCMC. ``` ### Missing covariates `household_dynamics()` also imputes missing factor covariates via Bayesian data augmentation. Simply pass your data with `NA` values in factor columns: ```{r missing, eval = FALSE} # Introduce some missing values inputdata_missing <- inputdata inputdata_missing$sex[c(5, 12, 30)] <- NA inputdata_missing$age[c(8, 20)] <- NA # Fit as usual — missing values are imputed automatically fit_missing <- household_dynamics(inputdata_missing, ~sex, ~age, n_iteration = 50000, burnin = 10000, thinning = 10) # Note: Covariate 'sex' has 3 missing value(s) (0.2%). These will be imputed during MCMC. # Note: Covariate 'age' has 2 missing value(s) (0.1%). These will be imputed during MCMC. summary(fit_missing) ``` **Restrictions:** - Only factor covariates can have missing values. Continuous covariates with `NA` will produce an error. - Interaction terms (`~sex*age`) are not supported when covariates have missing data. - The same variable cannot appear in both `inf_factor` and `sus_factor` if it has missing values. ## Estimating the serial interval If the serial interval is unknown or you want to estimate it jointly with the transmission parameters, set `estimate_SI = TRUE`. The model parameterizes the SI as a Weibull distribution and estimates the shape and scale parameters via MCMC: ```{r estimate_si, eval = FALSE} fit_si <- household_dynamics(inputdata, inf_factor = ~sex, sus_factor = ~age, n_iteration = 50000, burnin = 10000, thinning = 10, estimate_SI = TRUE) summary(fit_si) # Output now includes si_shape and si_scale parameters # Reconstruct the estimated SI distribution est_shape <- mean(fit_si$samples[, "si_shape"]) est_scale <- mean(fit_si$samples[, "si_scale"]) est_SI <- pweibull(2:15, est_shape, est_scale) - pweibull(1:14, est_shape, est_scale) barplot(est_SI, names.arg = 1:14, xlab = "Days", ylab = "Probability", main = "Estimated serial interval") ``` The Weibull priors are: shape ~ Uniform(0.1, 10), scale ~ Uniform(0.1, 20). When `estimate_SI = TRUE`, the `SI` argument is not used. ## Simulation `simulate_data()` generates synthetic datasets for validation or power analysis: ```{r simulate, eval = FALSE} # Use fitted parameter values para <- coef(fit) # Simulate 10 replicates of the original data structure sim <- simulate_data(inputdata, rep_num = 10, inf_factor = ~sex, sus_factor = ~age, para = para, with_rm = 0) ```