--- title: "Introduction to rbreak" author: "Cheolju Kim, Zhongjun Qu" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Introduction to rbreak} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 5 ) ``` ## Introduction Structural change models are widely used to detect and estimate break points in time series relationships. In many applications, however, researchers also want to impose structure across regimes rather than allow every coefficient to vary freely. The `rbreak` package implements the methods of Perron and Qu (2006) for estimating and testing **restricted structural change models**. It supports **linear restrictions on coefficients within and across regimes**, which makes it possible to estimate and test break models under economically meaningful constraints. ### Model Setup Consider a linear regression model with $m$ structural breaks: $$ y_t = z_t' \delta_j + u_t, \quad t = T_{j-1}+1, \ldots, T_j $$ for $j = 1, \ldots, m+1$, where $T_0 = 0$ and $T_{m+1} = T$. Typical hypotheses include: - Some coefficients are equal across certain regimes - Certain linear combinations of coefficients are zero - Coefficients follow a specific pattern across regimes Standard structural break methods do not handle these restrictions directly. ### Restriction Forms `rbreak` supports two equivalent ways to write restrictions: **Form A**: $\delta = S\theta + s$\ Parameters are expressed as linear combinations of a smaller set of free parameters $\theta$. **Form B**: $R\delta = r$\ Direct linear restrictions on the parameter vector $\delta$. Main features include: - Computationally efficient estimation of break dates and coefficients under restrictions - Sup-F tests for structural change with restrictions - Confidence intervals for break dates - Bootstrap methods to avoid local minima ## Installation ```{r eval=FALSE} install.packages("rbreak") ``` Load the package: ```{r} library(rbreak) ``` ## The Example Dataset The package includes the example dataset used in Perron and Qu (2004): ```{r} data(example) y <- example$y bigt <- length(y) cat("Sample size:", bigt, "\n") ``` A quick plot: ```{r, fig.width=7, fig.height=4} plot(y, type = "l", main = "Example Time Series Data", xlab = "Time", ylab = "Value", col = "steelblue", lwd = 0.8) grid() ``` ## A Simple Example Suppose we want to test for 3 structural breaks, with coefficients in regimes 1 and 3 constrained to be equal and coefficients in regimes 2 and 4 constrained to be equal. ### Data Setup ```{r} # Load the example data data(example) y <- example$y bigt <- length(y) # Regressors (intercept only in this example) z <- matrix(1, nrow = bigt, ncol = 1) # Number of regressors q <- 1 # Number of breaks under the alternative m <- 3 ``` ## Using Form A Restrictions Form A writes the restrictions as $\delta = S\theta + s$, which is useful when the coefficients can be expressed in terms of a smaller set of free parameters. ### Specifying Restrictions (Form A) For the restriction above, Form A can be written as: ```{r eval=FALSE} # Form A: delta = S*theta + ss # We have 4 regimes with 1 regressor each, so delta is 4x1 # We want: delta_1 = theta_1, delta_2 = theta_2, delta_3 = theta_1, delta_4 = theta_2 # This means we only have 2 free parameters (theta_1, theta_2) S <- matrix(c(1, 0, # delta_1 = theta_1 0, 1, # delta_2 = theta_2 1, 0, # delta_3 = theta_1 0, 1), # delta_4 = theta_2 nrow = 4, ncol = 2, byrow = TRUE) ss <- matrix(0, nrow = 4, ncol = 1) ``` ### Conducting the Analysis (Form A) ```{r eval=FALSE} result_formA <- mainp( m = m, q = q, z = z, y = y, trm = 0.10, # 10% trimming robust = 1, # Use HAC standard errors prewhit = 0, # No prewhitening hetvar = 1, # Allow heteroskedastic variance S = S, ss = ss, # Form A restriction R = NULL, rr = NULL, # Not used when forma=1 doestim = 1, # Perform estimation dotest = 1, # Perform sup-F test docv = 0, # Critical value simulation bigt = bigt, forma = 1 # Use Form A restrictions ) ``` Form A and Form B represent the same restriction in different parametrizations, so they should give the same result. ### Specifying Restrictions (Form B) For Form B, specify the restriction matrix $R$ and vector $r$ such that $R\delta = r$: ```{r eval=FALSE} # Restriction matrix: R*delta = rr # delta = (delta_1, delta_2, delta_3, delta_4) # Restrictions: delta_1 = delta_3, delta_2 = delta_4 R <- matrix(c(1, 0, -1, 0, # delta_1 - delta_3 = 0 0, 1, 0, -1), # delta_2 - delta_4 = 0 nrow = 2, byrow = TRUE) rr <- c(0, 0) ``` ### Conducting the Analysis (Form B) Run the full analysis with `mainp()`: ```{r eval=FALSE} result <- mainp( m = m, q = q, z = z, y = y, trm = 0.10, # 10% trimming robust = 1, # Use HAC standard errors prewhit = 0, # No prewhitening hetvar = 1, # Allow heteroskedastic variance R = R, rr = rr, # Form B restriction doestim = 1, # Perform estimation dotest = 1, # Perform sup-F test docv = 0, # Critical value simulation bigt = bigt, formb = 1 # Use Form B restrictions ) ``` ## Bootstrap Methods for Avoiding Local Minima Estimation under restrictions may converge to local minima. The package provides two bootstrap-based refinements to improve the solution. ### The Problem of Local Minima When estimating break dates under restrictions, the restricted sum of squared residuals may have multiple local minima. Standard iterative algorithms can therefore get trapped in suboptimal solutions. ### Bootstrap Restarting Break-Point (BRBP) Algorithm The `brbp()` function implements the Bootstrap Restarting Break-Point algorithm using **resampling bootstrap**. Following Wood (2001), it repeatedly perturbs the problem and restarts the estimation to reduce the chance of settling at a spurious local minimum. **Algorithm 1: Bootstrap Restarting Break-Point (BRBP) - Resampling Bootstrap** **Step 1.** Assign an initial set of break dates $T^{(0)} = \{T^{(0)}_1,\dots,T^{(0)}_m\}$. This is typically the output from the `mainp()` function. **Step 2.** Using $T^{(0)}$ as initial values, obtain the restricted parameter estimators $\hat{\theta}_0 = (\widehat T_{(0)}, \widehat\delta_{(0)})$. **Step 3.** For $b = 1,\dots,B$ do: - **(3.1)** Generate a bootstrap sample $\{(y_t^{*}, Z_t^{*})\}_{t=1}^T$ and reorder it in chronological order. - **(3.2)** Using $\widehat T_{(b-1)}$ as initial break dates, estimate restricted structural breaks model to the bootstrap sample and obtain $(\widetilde T^{*}_{(b)}, \widetilde\delta^{*}_{(b)})$. - **(3.3)** Using $\widetilde T^{*}_{(b)}$ as initial break dates, re-estimate the model on the original sample and obtain $\tilde{\theta}_{(b)} = (\widetilde T_{(b)}, \widetilde\delta_{(b)})$. - **(3.4)** Compute $RSSR(\tilde{\theta}_{(b)})$ and compare it with $RSSR(\hat{\theta}_{(b-1)})$. - **if** $RSSR(\tilde{\theta}_{(b)}) < RSSR(\hat{\theta}_{(b-1)})$ **then** set $\hat{\theta}_{(b)} = \tilde{\theta}_{(b)}$; - **else** set $\hat{\theta}_{(b)} = \hat{\theta}_{(b-1)}$. **Step 4.** Set the final estimator $(\widehat T, \widehat\delta) = (\widehat T_{(B)}, \widehat\delta_{(B)})$. ```{r eval=FALSE} # Initial break dates (can be from unrestricted estimation or user-specified) # For the example data, we use manually specified initial dates T_init <- c(47, 64, 87) # Initial break dates # Run BRBP algorithm (Form B) result_brbp <- brbp( y = y, z = z, q = q, m = m, bigt = bigt, trm = 0.10, T_init = T_init, B = 20, # Number of bootstrap replications R = R, rr = rr, # Form B restrictions verbose = FALSE # Set to TRUE to see progress ) # Results breaks_brbp <- result_brbp$dx coefficients_brbp <- result_brbp$delta rssr_brbp <- result_brbp$rssr # Track convergence all_breaks <- result_brbp$all_break_dates # All break dates across iterations all_rssr <- result_brbp$all_rssr # All RSSR values across iterations # Plot convergence (if desired) plot(all_rssr, type = "l", xlab = "Iteration", ylab = "RSSR", main = "BRBP Convergence (Form B)") ``` #### Using Form A with BRBP The same bootstrap algorithm can be used with Form A restrictions: ```{r eval=FALSE} # Form A restrictions (same as defined earlier) S <- matrix(c(1, 0, 1, 0, 0, 1, 0, 1), nrow = 4, ncol = 2, byrow = TRUE) ss <- matrix(0, nrow = 4, ncol = 1) # Run BRBP algorithm (Form A) result_brbp_formA <- brbp( y = y, z = z, q = q, m = m, bigt = bigt, trm = 0.10, T_init = T_init, B = 20, # Number of bootstrap replications S = S, ss = as.vector(ss), # Form A restrictions R = NULL, rr = NULL, # Not used for Form A verbose = FALSE ) # Results breaks_brbp_formA <- result_brbp_formA$dx coefficients_brbp_formA <- result_brbp_formA$delta rssr_brbp_formA <- result_brbp_formA$rssr # Plot convergence plot(result_brbp_formA$all_rssr, type = "l", xlab = "Iteration", ylab = "RSSR", main = "BRBP Convergence (Form A)") ``` ### Residual Bootstrap Restarting Break-Point (RBRBP) Algorithm The `brbp_residual()` function implements the Residual Bootstrap Restarting Break-Point algorithm using **residual bootstrap**. This is an alternative method that preserves time ordering, also based on the bootstrap restarting framework of Wood (2001). The algorithm proceeds as follows: **Algorithm 2: Residual Bootstrap Restarting Break-Point (RBRBP) - Residual Bootstrap** **Step 1.** Assign an initial set of break dates $T^{(0)} = \{T^{(0)}_1,\dots,T^{(0)}_m\}$. **Step 2.** Using $T^{(0)}$ as initial values, estimate restricted structural breaks model to obtain the estimators $\hat{\theta}_0 = (\widehat T_{(0)}, \widehat\delta_{(0)})$. **Step 3.** For $b = 1,\dots,B$ do: - **(3.1)** Using $\hat{\theta}_{(b-1)} = (\widehat T_{(b-1)}, \widehat\delta_{(b-1)})$, compute residuals $\{e_t\}_{t=1}^T = \{y_t - Z_t(\widehat T_{(b-1)}) \widehat\delta_{(b-1)}\}_{t=1}^T$. - Resample residuals with replacement to obtain $\{e_t^{*}\}_{t=1}^T$. - Generate a bootstrap sample $\{y_t^{*}\}_{t=1}^T = \{Z_t(\widehat T_{(b-1)}) \widehat\delta_{(b-1)} + e_t^{*}\}_{t=1}^T$. - Note: $Z_t$ remains unchanged (time ordering preserved). - **(3.2)** Using $\widehat T_{(b-1)}$ as initial break dates, estimate restricted structural breaks model to the bootstrap sample and obtain $(\widetilde T^{*}_{(b)}, \widetilde\delta^{*}_{(b)})$. - **(3.3)** Using $\widetilde T^{*}_{(b)}$ as initial break dates, re-estimate the model on the original sample and obtain $\tilde{\theta}_{(b)} = (\widetilde T_{(b)}, \widetilde\delta_{(b)})$. - **(3.4)** Compute $RSSR(\tilde{\theta}_{(b)})$ and compare it with $RSSR(\hat{\theta}_{(b-1)})$. - **if** $RSSR(\tilde{\theta}_{(b)}) < RSSR(\hat{\theta}_{(b-1)})$ **then** set $\hat{\theta}_{(b)} = \tilde{\theta}_{(b)}$; - **else** set $\hat{\theta}_{(b)} = \hat{\theta}_{(b-1)}$. **Step 4.** Set the final estimator $(\widehat T, \widehat\delta) = (\widehat T_{(B)}, \widehat\delta_{(B)})$. ```{r eval=FALSE} # Run RBRBP algorithm (Form B) result_rbrbp <- brbp_residual( y = y, z = z, q = q, m = m, bigt = bigt, trm = 0.10, T_init = T_init, B = 20, # Number of bootstrap replications R = R, rr = rr, # Form B restrictions verbose = FALSE # Set to TRUE to see progress ) # Results breaks_rbrbp <- result_rbrbp$dx coefficients_rbrbp <- result_rbrbp$delta rssr_rbrbp <- result_rbrbp$rssr # Track convergence rall_breaks <- result_rbrbp$all_break_dates # All break dates across iterations rall_rssr <- result_rbrbp$all_rssr # All RSSR values across iterations # Plot convergence (if desired) plot(rall_rssr, type = "l", xlab = "Iteration", ylab = "RSSR", main = "RBRBP Convergence (Form B)") ``` #### Using Form A with RBRBP ```{r eval=FALSE} # Run RBRBP algorithm (Form A) result_rbrbp_formA <- brbp_residual( y = y, z = z, q = q, m = m, bigt = bigt, trm = 0.10, T_init = T_init, B = 20, # Number of bootstrap replications S = S, ss = as.vector(ss), # Form A restrictions R = NULL, rr = NULL, # Not used for Form A verbose = FALSE ) # Results breaks_rbrbp_formA <- result_rbrbp_formA$dx coefficients_rbrbp_formA <- result_rbrbp_formA$delta rssr_rbrbp_formA <- result_rbrbp_formA$rssr # Plot convergence plot(result_rbrbp_formA$all_rssr, type = "l", xlab = "Iteration", ylab = "RSSR", main = "RBRBP Convergence (Form A)") ``` ## Complete Workflow Example We demonstrate a complete workflow using the example dataset, including bootstrap refinement. ### Step 1: Initial Estimation ```{r eval=FALSE} # Load data data(example) y <- example$y bigt <- length(y) z <- matrix(1, nrow = bigt, ncol = 1) q <- 1 m <- 3 # Form B Restrictions: delta_1 = delta_3, delta_2 = delta_4 R <- matrix(c(1, 0, -1, 0, 0, 1, 0, -1), nrow = 2, ncol = 4, byrow = TRUE) rr <- c(0, 0) # Initial estimation (Form B) result_initial <- est2(y, z, q = q, m = m, bigt = bigt, trm = 0.10, R = R, rr = rr) T_init <- result_initial$dx cat("Initial break dates:", T_init, "\n") ``` Or using Form A: ```{r eval=FALSE} # Form A Restrictions: same as Form B S <- matrix(c(1, 0, 0, 1, 1, 0, 0, 1), nrow = 4, ncol = 2, byrow = TRUE) ss <- matrix(0, nrow = 4, ncol = 1) # Initial estimation (Form A) result_initial_formA <- est(y, z, q = q, m = m, bigt = bigt, trm = 0.10, S = S, ss = ss) T_init <- result_initial_formA$dx cat("Initial break dates:", T_init, "\n") ``` ### Step 2: Bootstrap Refinement When initial estimates may be suboptimal, bootstrap methods can help: #### Using Form B: ```{r eval=FALSE} # Run BRBP algorithm (Form B) result_brbp <- brbp( y = y, z = z, q = q, m = m, bigt = bigt, trm = 0.10, T_init = T_init, B = 20, # Number of bootstrap replications R = R, rr = rr, verbose = FALSE ) # Run RBRBP algorithm (Form B) result_rbrbp <- brbp_residual( y = y, z = z, q = q, m = m, bigt = bigt, trm = 0.10, T_init = T_init, B = 20, R = R, rr = rr, verbose = FALSE ) # Compare results cat("Initial break dates:", T_init, "\n") cat("BRBP break dates:", result_brbp$dx, "\n") cat("RBRBP break dates:", result_rbrbp$dx, "\n") cat("BRBP RSSR:", result_brbp$rssr, "\n") cat("RBRBP RSSR:", result_rbrbp$rssr, "\n") ``` #### Using Form A: ```{r eval=FALSE} # Run BRBP algorithm (Form A) result_brbp_formA <- brbp( y = y, z = z, q = q, m = m, bigt = bigt, trm = 0.10, T_init = T_init, B = 20, S = S, ss = as.vector(ss), R = NULL, rr = NULL, verbose = FALSE ) # Run RBRBP algorithm (Form A) result_rbrbp_formA <- brbp_residual( y = y, z = z, q = q, m = m, bigt = bigt, trm = 0.10, T_init = T_init, B = 20, S = S, ss = as.vector(ss), R = NULL, rr = NULL, verbose = FALSE ) # Compare results cat("Initial break dates:", T_init, "\n") cat("BRBP (Form A) break dates:", result_brbp_formA$dx, "\n") cat("RBRBP (Form A) break dates:", result_rbrbp_formA$dx, "\n") cat("BRBP (Form A) RSSR:", result_brbp_formA$rssr, "\n") cat("RBRBP (Form A) RSSR:", result_rbrbp_formA$rssr, "\n") ``` ## References **Primary Reference:** Perron, P., and Qu, Z. (2006). Estimating restricted structural change models. *Journal of Econometrics*, 134(2), 373-399. **Related References:** Bai, J., and Perron, P. (1998). Estimating and testing linear models with multiple structural changes. *Econometrica*, 66(1), 47-78. Bai, J., and Perron, P. (2003). Computation and analysis of multiple structural change models. *Journal of Applied Econometrics*, 18(1), 1-22. Andrews, D. W. K. (1991). Heteroskedasticity and autocorrelation consistent covariance matrix estimation. *Econometrica*, 59(3), 817-858. Wood, A. T. A. (2001). Minimizing Model Fitting Objectives That Contain Spurious Local Minima by Bootstrap Restarting. *Biometrics*, 57(1), 240-244. ## Session Info ```{r} sessionInfo() ```