---
title: "The Algebra of MLEs"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{The Algebra of MLEs}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
oldopts <- options(digits = 4)
```

## Introduction

The maximum likelihood estimator (MLE) is a technology. Under regularity conditions, any MLE is asymptotically normal: $\hat\theta \sim N(\theta, I(\theta)^{-1})$, where $I(\theta)$ is the Fisher information matrix. This fact is remarkable because it holds regardless of the underlying model -- exponential, Weibull, normal, or anything else. The MLE reduces the problem of inference to a mean vector and a covariance matrix.

`algebraic.mle` is the algebra that follows from this fact. It does not find MLEs -- it takes them as input and provides operations for composition, transformation, and inference. Given two independent experiments, `joint()` composes their MLEs into a single joint estimator with block-diagonal covariance. Given three labs estimating the same quantity, `combine()` produces the optimal inverse-variance weighted estimate. Given a transformation $g(\theta)$, `rmap()` propagates uncertainty through the delta method. And `as_dist()` bridges to the distribution algebra in `algebraic.dist`, where normal distributions can be further composed.

This vignette walks through each operation with concrete examples, culminating in a full pipeline from independent experiments to system reliability inference.

```{r setup}
library(algebraic.mle)
```

## Creating MLEs

The package provides three constructors for wrapping estimation results into the MLE algebra. Each produces an object that supports the full interface: `params()`, `vcov()`, `confint()`, `se()`, `AIC()`, and the algebraic operations.
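Regardless of which constructor you use, the wrapped object encodes the asymptotic-normality fact from the introduction. Before turning to the constructors, that fact is easy to check directly; a base-R simulation sketch for the exponential rate, whose Fisher information implies $\text{se}(\hat\lambda) \approx \lambda / \sqrt{n}$ (no package code involved):

```{r asymptotic-check}
# The MLE of an exponential rate is 1 / mean(x).
set.seed(1)
n <- 100
lambda <- 2

# Empirical sampling distribution of the MLE over many replications
lambda_hats <- replicate(5000, 1 / mean(rexp(n, rate = lambda)))

# Empirical sd of the MLE vs. the asymptotic value lambda / sqrt(n)
sd(lambda_hats)
lambda / sqrt(n)
```

The two numbers agree to within simulation error, which is exactly what the `vcov()` of a wrapped MLE is asserting.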
### Direct construction with `mle()`

When you know the point estimate and its variance-covariance matrix (e.g., from analytical results or published tables), construct an MLE directly:

```{r mle-direct}
# Exponential rate from a published study: lambda_hat = 0.5, se = 0.07, n = 200
fit_pub <- mle(
  theta.hat = c(lambda = 0.5),
  sigma = matrix(0.07^2),
  nobs = 200L
)
params(fit_pub)
se(fit_pub)
```

### From numerical optimization with `mle_numerical()`

The most common path: maximize a log-likelihood with `optim()`, then wrap the result. Inverting the observed information -- the negated Hessian of the log-likelihood at the optimum -- gives the variance-covariance matrix.

```{r mle-numerical}
set.seed(42)
x <- rexp(80, rate = 2)

loglik <- function(rate) {
  if (rate <= 0) return(-Inf)
  sum(dexp(x, rate = rate, log = TRUE))
}

result <- optim(
  par = c(lambda = 1),
  fn = loglik,
  method = "Brent",
  lower = 0.01, upper = 20,
  hessian = TRUE,
  control = list(fnscale = -1)
)
fit_num <- mle_numerical(result, options = list(nobs = length(x)))
params(fit_num)
se(fit_num)
```

### From bootstrap with `mle_boot()`

When the sample is small or regularity conditions are questionable, bootstrap the MLE and wrap the `boot` object:

```{r mle-boot}
set.seed(42)
y <- rexp(30, rate = 2)

boot_result <- boot::boot(
  data = y,
  statistic = function(d, i) c(lambda = 1 / mean(d[i])),
  R = 999
)
fit_boot <- mle_boot(boot_result)
params(fit_boot)
se(fit_boot)
confint(fit_boot, type = "perc")
```

## Composing Independent MLEs

When independent experiments measure different parameters of a system, `joint()` composes their MLEs into a single joint estimator. The result has a block-diagonal variance-covariance matrix because the experiments are independent -- there is no cross-covariance between their parameter estimates.

**Motivating example**: A series system has two components. Lab A tests component 1 and estimates its failure rate $\lambda_1$. Lab B tests component 2 and estimates $\lambda_2$.
Neither lab knows about the other component, but we need the joint parameter vector $(\lambda_1, \lambda_2)$ for system-level inference.

```{r joint-example}
# Lab A: component 1 failure rate
fit_A <- mle(
  theta.hat = c(lambda1 = 0.02),
  sigma = matrix(0.001^2),
  nobs = 150L
)

# Lab B: component 2 failure rate
fit_B <- mle(
  theta.hat = c(lambda2 = 0.05),
  sigma = matrix(0.003^2),
  nobs = 80L
)

# Joint MLE: block-diagonal covariance
fit_joint <- joint(fit_A, fit_B)
params(fit_joint)
vcov(fit_joint)
```

The off-diagonal entries are zero -- the hallmark of independence. The joint MLE supports the full interface:

```{r joint-methods}
confint(fit_joint)
se(fit_joint)
```

Use `marginal()` to recover individual component estimates from the joint:

```{r joint-marginal}
# Recover component 2 parameters
fit_B_recovered <- marginal(fit_joint, 2)
params(fit_B_recovered)
se(fit_B_recovered)
```

`joint()` extends to any number of independent MLEs:

```{r joint-three}
fit_C <- mle(
  theta.hat = c(lambda3 = 0.01),
  sigma = matrix(0.0005^2),
  nobs = 200L
)
fit_system <- joint(fit_A, fit_B, fit_C)
params(fit_system)
```

## Combining Repeated Estimates

When multiple independent experiments estimate the **same** parameter, `combine()` produces the optimal estimator via inverse-variance weighting. The combined estimate weights more precise estimates more heavily, and the combined variance is always less than any individual variance.

**Motivating example**: Three labs independently estimate the failure rate $\lambda$ of the same component type.
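The inverse-variance weighting that `combine()` applies can be written out by hand first: with weights $w_i = 1/\sigma_i^2$, the combined estimate is $\sum_i w_i \hat\lambda_i / \sum_i w_i$ with variance $1/\sum_i w_i$. A base-R sketch (the lab numbers here are illustrative, chosen to mirror the `combine()` example below):

```{r combine-by-hand}
# Lab point estimates and standard errors (illustrative values)
est <- c(0.050, 0.048, 0.053)
std <- c(0.004, 0.002, 0.003)

# Inverse-variance weights: w_i = 1 / se_i^2
w <- 1 / std^2

# Weighted estimate and its standard error
est_combined <- sum(w * est) / sum(w)
se_combined  <- sqrt(1 / sum(w))
c(estimate = est_combined, se = se_combined)  # roughly 0.0496, se about 0.0015
```

Note that `se_combined` is smaller than every entry of `std` -- adding an independent estimate can only increase the total precision $\sum_i w_i$.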
```{r combine-example}
lab1 <- mle(theta.hat = c(lambda = 0.050), sigma = matrix(0.004^2), nobs = 50L)
lab2 <- mle(theta.hat = c(lambda = 0.048), sigma = matrix(0.002^2), nobs = 200L)
lab3 <- mle(theta.hat = c(lambda = 0.053), sigma = matrix(0.003^2), nobs = 100L)

combined <- combine(lab1, lab2, lab3)
params(combined)
se(combined)
```

The combined standard error is smaller than any individual standard error:

```{r combine-precision}
cat("Lab SEs: ", se(lab1), se(lab2), se(lab3), "\n")
cat("Combined SE: ", se(combined), "\n")
```

The combined estimate is pulled toward the more precise estimates. Lab 2 has the smallest variance, so it receives the most weight. When all estimates have equal precision, `combine()` reduces to the simple average:

```{r combine-equal}
eq1 <- mle(theta.hat = c(mu = 10.0), sigma = matrix(1))
eq2 <- mle(theta.hat = c(mu = 12.0), sigma = matrix(1))
eq3 <- mle(theta.hat = c(mu = 11.0), sigma = matrix(1))

eq_combined <- combine(eq1, eq2, eq3)
params(eq_combined)  # (10 + 12 + 11) / 3 = 11
```

## Transformations via Invariance

By the invariance property of the MLE, if $\hat\theta$ is the MLE of $\theta$, then $g(\hat\theta)$ is the MLE of $g(\theta)$ for any function $g$. The `rmap()` function computes this transformation and propagates uncertainty using the delta method:

$$
\text{Var}(g(\hat\theta)) \approx J \, \text{Var}(\hat\theta) \, J^\top,
$$

where $J$ is the Jacobian of $g$ evaluated at $\hat\theta$.

### Univariate example: rate to mean lifetime

An exponential component has rate $\lambda$. The mean lifetime is $\mu = 1/\lambda$.
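In this scalar case the delta method is easy to do by hand: with $g(\lambda) = 1/\lambda$ we have $g'(\lambda) = -1/\lambda^2$, so $\text{se}(\hat\mu) \approx \text{se}(\hat\lambda)/\hat\lambda^2$. A base-R check, using the same numbers as the `rmap()` example that follows:

```{r delta-by-hand}
# Point estimate and standard error of the rate
lambda_hat <- 0.02
se_lambda  <- 0.001

# Invariance: the MLE of the mean lifetime is g(lambda_hat)
mu_hat <- 1 / lambda_hat           # 50 hours

# Delta method: se(mu_hat) = |g'(lambda_hat)| * se(lambda_hat)
se_mu <- se_lambda / lambda_hat^2  # 0.001 / 0.0004 = 2.5
c(mean_life = mu_hat, se = se_mu)
```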
Transform the MLE:

```{r rmap-univariate}
fit_rate <- mle(
  theta.hat = c(lambda = 0.02),
  sigma = matrix(0.001^2),
  nobs = 150L
)

# g: rate -> mean lifetime
fit_lifetime <- rmap(fit_rate,
  function(theta) c(mean_life = 1 / theta[1]),
  method = "delta")
params(fit_lifetime)
se(fit_lifetime)
confint(fit_lifetime)
```

### Multivariate example: component rates to system reliability

For a series system with independent exponential components, the system reliability at time $t$ is

$$
R(t) = \exp(-\lambda_1 t) \cdot \exp(-\lambda_2 t) = \exp(-(\lambda_1 + \lambda_2) t).
$$

Starting from the joint MLE of $(\lambda_1, \lambda_2)$:

```{r rmap-system}
fit_rates <- joint(
  mle(theta.hat = c(lambda1 = 0.02), sigma = matrix(0.001^2), nobs = 150L),
  mle(theta.hat = c(lambda2 = 0.05), sigma = matrix(0.003^2), nobs = 80L)
)

# System reliability at t = 10 hours
t0 <- 10
R_mle <- rmap(fit_rates,
  function(theta) c(R_t = exp(-(theta[1] + theta[2]) * t0)),
  method = "delta"
)
params(R_mle)
se(R_mle)
confint(R_mle)
```

The delta method automatically computes the Jacobian via numerical differentiation (using `numDeriv::jacobian`) and propagates the covariance.

## Bridging to Distribution Algebra

`as_dist()` converts an MLE to its asymptotic normal (or multivariate normal) distribution, bridging into the `algebraic.dist` package, where distributions can be further composed.

```{r as-dist-basic}
fit1 <- mle(theta.hat = c(lambda = 0.02), sigma = matrix(0.001^2))
d1 <- as_dist(fit1)
d1
```

This is useful when you want to reason about the sampling distribution of the MLE as a distribution object.
For instance, you can compute the probability that the true rate exceeds a regulatory threshold:

```{r as-dist-prob}
# P(lambda > 0.021) under the asymptotic distribution
1 - cdf(d1)(0.021)
```

For bootstrap MLEs, `as_dist()` returns an empirical distribution built from the bootstrap replicates:

```{r as-dist-boot}
d_boot <- as_dist(fit_boot)
d_boot
```

## Full Pipeline

Here is an end-to-end example tying together all the algebraic operations.

**Scenario**: A series system has two independent components with exponential lifetimes. Separate accelerated life tests estimate each component's failure rate. We want to infer the system reliability at a mission time of $t = 50$ hours, including a confidence interval.

```{r pipeline-setup}
set.seed(123)

# --- Step 1: Independent MLEs from separate experiments ---

# Component 1: 200 observed lifetimes
x1 <- rexp(200, rate = 0.003)
loglik1 <- function(rate) {
  if (rate <= 0) return(-Inf)
  sum(dexp(x1, rate = rate, log = TRUE))
}
fit1 <- mle_numerical(
  optim(par = c(lambda1 = 0.001), fn = loglik1, method = "Brent",
        lower = 1e-6, upper = 1, hessian = TRUE,
        control = list(fnscale = -1)),
  options = list(nobs = length(x1))
)

# Component 2: 120 observed lifetimes
x2 <- rexp(120, rate = 0.008)
loglik2 <- function(rate) {
  if (rate <= 0) return(-Inf)
  sum(dexp(x2, rate = rate, log = TRUE))
}
fit2 <- mle_numerical(
  optim(par = c(lambda2 = 0.001), fn = loglik2, method = "Brent",
        lower = 1e-6, upper = 1, hessian = TRUE,
        control = list(fnscale = -1)),
  options = list(nobs = length(x2))
)

cat("Component 1 rate:", params(fit1), " SE:", se(fit1), "\n")
cat("Component 2 rate:", params(fit2), " SE:", se(fit2), "\n")
```

```{r pipeline-compose}
# --- Step 2: Compose into joint parameter space ---
fit_system <- joint(fit1, fit2)
cat("Joint parameters:", params(fit_system), "\n")
cat("Joint vcov:\n")
vcov(fit_system)
```

```{r pipeline-transform}
# --- Step 3: Transform to system reliability at t = 50 ---
mission_time <- 50
R_system <- rmap(fit_system,
  function(theta) c(R_sys = exp(-(theta[1] + theta[2]) * mission_time)),
  method = "delta"
)
cat("System reliability R(50):", params(R_system), "\n")
cat("SE of R(50): ", se(R_system), "\n")
```

```{r pipeline-inference}
# --- Step 4: Inference ---
cat("95% CI for R(50):\n")
confint(R_system)
```

```{r pipeline-dist}
# --- Step 5: Bridge to distribution algebra ---
R_dist <- as_dist(R_system)
cat("Asymptotic distribution of R(50):", "\n")
R_dist

# Probability that system reliability exceeds 55%
cat("P(R(50) > 0.55):", 1 - cdf(R_dist)(0.55), "\n")
```

The pipeline reads as a chain of algebraic operations: two independent MLEs are composed via `joint()`, transformed to the quantity of interest via `rmap()`, and then reasoned about as a distribution via `as_dist()`. Each step preserves the uncertainty structure inherited from the original experiments.

```{r, include = FALSE}
options(oldopts)
```