---
title: "Introduction to rbreak"
author: "Cheolju Kim, Zhongjun Qu"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to rbreak}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5
)
```

## Introduction

Structural change models are widely used to detect and estimate break points in time series relationships. In many applications, however, researchers also want to impose structure across regimes rather than allow every coefficient to vary freely.

The `rbreak` package implements the methods of Perron and Qu (2006) for estimating and testing **restricted structural change models**. It supports **linear restrictions on coefficients within and across regimes**, which makes it possible to estimate and test break models under economically meaningful constraints.

### Model Setup

Consider a linear regression model with $m$ structural breaks:

$$
y_t = z_t' \delta_j + u_t, \quad t = T_{j-1}+1, \ldots, T_j
$$

for $j = 1, \ldots, m+1$, where $T_0 = 0$ and $T_{m+1} = T$.

Typical hypotheses include:

-   Some coefficients are equal across certain regimes
-   Certain linear combinations of coefficients are zero
-   Coefficients follow a specific pattern across regimes

Standard structural break methods do not handle these restrictions directly.

### Restriction Forms

`rbreak` supports two equivalent ways to write restrictions:

**Form A**: $\delta = S\theta + s$\
Parameters are expressed as linear combinations of a smaller set of free parameters $\theta$.

**Form B**: $R\delta = r$\
Direct linear restrictions on the parameter vector $\delta$.

Main features include:

-   Computationally efficient estimation of break dates and coefficients under restrictions
-   Sup-F tests for structural change with restrictions
-   Confidence intervals for break dates
-   Bootstrap methods to avoid local minima

## Installation

```{r eval=FALSE}
install.packages("rbreak")
```

Load the package:

```{r}
library(rbreak)
```

## The Example Dataset

The package includes the example dataset used in Perron and Qu (2004):

```{r}
data(example)
y <- example$y
bigt <- length(y)
cat("Sample size:", bigt, "\n")
```

A quick plot:

```{r, fig.width=7, fig.height=4}
plot(y, type = "l", 
     main = "Example Time Series Data",
     xlab = "Time", ylab = "Value", 
     col = "steelblue", lwd = 0.8)
grid()
```

## A Simple Example

Suppose we want to test for 3 structural breaks, with coefficients in regimes 1 and 3 constrained to be equal and coefficients in regimes 2 and 4 constrained to be equal.

### Data Setup

```{r}
# Load the example data
data(example)
y <- example$y
bigt <- length(y)

# Regressors (intercept only in this example)
z <- matrix(1, nrow = bigt, ncol = 1)

# Number of regressors
q <- 1

# Number of breaks under the alternative
m <- 3
```

## Using Form A Restrictions

Form A writes the restrictions as $\delta = S\theta + s$, which is useful when the coefficients can be expressed in terms of a smaller set of free parameters.

### Specifying Restrictions (Form A)

For the restriction above, Form A can be written as:

```{r eval=FALSE}
# Form A: delta = S*theta + ss
# We have 4 regimes with 1 regressor each, so delta is 4x1
# We want: delta_1 = theta_1, delta_2 = theta_2, delta_3 = theta_1, delta_4 = theta_2
# This means we only have 2 free parameters (theta_1, theta_2)

S <- matrix(c(1, 0,    # delta_1 = theta_1
              0, 1,    # delta_2 = theta_2
              1, 0,    # delta_3 = theta_1
              0, 1),   # delta_4 = theta_2
            nrow = 4, ncol = 2, byrow = TRUE)
ss <- matrix(0, nrow = 4, ncol = 1)
```

### Conducting the Analysis (Form A)

```{r eval=FALSE}
result_formA <- mainp(
  m = m, q = q, z = z, y = y, 
  trm = 0.10,          # 10% trimming
  robust = 1,          # Use HAC standard errors
  prewhit = 0,         # No prewhitening
  hetvar = 1,          # Allow heteroskedastic variance
  S = S, ss = ss,      # Form A restriction
  R = NULL, rr = NULL, # Not used when forma=1
  doestim = 1,         # Perform estimation
  dotest = 1,          # Perform sup-F test
  docv = 0,            # Critical value simulation
  bigt = bigt,
  forma = 1            # Use Form A restrictions
)
```

Form A and Form B represent the same restriction in different parametrizations, so they should give the same result.

### Specifying Restrictions (Form B)

For Form B, specify the restriction matrix $R$ and vector $r$ such that $R\delta = r$:

```{r eval=FALSE}
# Restriction matrix: R*delta = rr
# delta = (delta_1, delta_2, delta_3, delta_4)
# Restrictions: delta_1 = delta_3, delta_2 = delta_4
R <- matrix(c(1, 0, -1, 0,   # delta_1 - delta_3 = 0
              0, 1,  0, -1),  # delta_2 - delta_4 = 0
            nrow = 2, byrow = TRUE)
rr <- c(0, 0)
```

### Conducting the Analysis (Form B)

Run the full analysis with `mainp()`:

```{r eval=FALSE}
result <- mainp(
  m = m, q = q, z = z, y = y, 
  trm = 0.10,          # 10% trimming
  robust = 1,          # Use HAC standard errors
  prewhit = 0,         # No prewhitening
  hetvar = 1,          # Allow heteroskedastic variance
  R = R, rr = rr,      # Form B restriction
  doestim = 1,         # Perform estimation
  dotest = 1,          # Perform sup-F test
  docv = 0,            # Critical value simulation
  bigt = bigt,
  formb = 1            # Use Form B restrictions
)
```

## Bootstrap Methods for Avoiding Local Minima

Estimation under restrictions may converge to local minima. The package provides two bootstrap-based refinements to improve the solution.

### The Problem of Local Minima

When estimating break dates under restrictions, the restricted sum of squared residuals may have multiple local minima. Standard iterative algorithms can therefore get trapped in suboptimal solutions.

### Bootstrap Restarting Break-Point (BRBP) Algorithm

The `brbp()` function implements the Bootstrap Restarting Break-Point algorithm using **resampling bootstrap**. Following Wood (2001), it repeatedly perturbs the problem and restarts the estimation to reduce the chance of settling at a spurious local minimum.

**Algorithm 1: Bootstrap Restarting Break-Point (BRBP) - Resampling Bootstrap**

**Step 1.** Assign an initial set of break dates $T^{(0)} = \{T^{(0)}_1,\dots,T^{(0)}_m\}$. This is typically the output from the `mainp()` function.

**Step 2.** Using $T^{(0)}$ as initial values, obtain the restricted parameter estimators $\hat{\theta}_0 = (\widehat T_{(0)}, \widehat\delta_{(0)})$.

**Step 3.** For $b = 1,\dots,B$ do:

-   **(3.1)** Generate a bootstrap sample $\{(y_t^{*}, Z_t^{*})\}_{t=1}^T$ and reorder it in chronological order.
-   **(3.2)** Using $\widehat T_{(b-1)}$ as initial break dates, estimate restricted structural breaks model to the bootstrap sample and obtain $(\widetilde T^{*}_{(b)}, \widetilde\delta^{*}_{(b)})$.
-   **(3.3)** Using $\widetilde T^{*}_{(b)}$ as initial break dates, re-estimate the model on the original sample and obtain $\tilde{\theta}_{(b)} = (\widetilde T_{(b)}, \widetilde\delta_{(b)})$.
-   **(3.4)** Compute $RSSR(\tilde{\theta}_{(b)})$ and compare it with $RSSR(\hat{\theta}_{(b-1)})$.
    -   **if** $RSSR(\tilde{\theta}_{(b)}) < RSSR(\hat{\theta}_{(b-1)})$ **then** set $\hat{\theta}_{(b)} = \tilde{\theta}_{(b)}$;
    -   **else** set $\hat{\theta}_{(b)} = \hat{\theta}_{(b-1)}$.

**Step 4.** Set the final estimator $(\widehat T, \widehat\delta) = (\widehat T_{(B)}, \widehat\delta_{(B)})$.

```{r eval=FALSE}
# Initial break dates (can be from unrestricted estimation or user-specified)
# For the example data, we use manually specified initial dates
T_init <- c(47, 64, 87)  # Initial break dates

# Run BRBP algorithm (Form B)
result_brbp <- brbp(
  y = y, z = z, q = q, m = m, bigt = bigt, trm = 0.10,
  T_init = T_init,
  B = 20,              # Number of bootstrap replications
  R = R, rr = rr,      # Form B restrictions
  verbose = FALSE      # Set to TRUE to see progress
)

# Results
breaks_brbp <- result_brbp$dx
coefficients_brbp <- result_brbp$delta
rssr_brbp <- result_brbp$rssr

# Track convergence
all_breaks <- result_brbp$all_break_dates  # All break dates across iterations
all_rssr <- result_brbp$all_rssr           # All RSSR values across iterations

# Plot convergence (if desired)
plot(all_rssr, type = "l", xlab = "Iteration", ylab = "RSSR", 
     main = "BRBP Convergence (Form B)")
```

#### Using Form A with BRBP

The same bootstrap algorithm can be used with Form A restrictions:

```{r eval=FALSE}
# Form A restrictions (same as defined earlier)
S <- matrix(c(1, 0, 1, 0, 0, 1, 0, 1), nrow = 4, ncol = 2, byrow = TRUE)
ss <- matrix(0, nrow = 4, ncol = 1)

# Run BRBP algorithm (Form A)
result_brbp_formA <- brbp(
  y = y, z = z, q = q, m = m, bigt = bigt, trm = 0.10,
  T_init = T_init,
  B = 20,              # Number of bootstrap replications
  S = S, ss = as.vector(ss),  # Form A restrictions
  R = NULL, rr = NULL,        # Not used for Form A
  verbose = FALSE
)

# Results
breaks_brbp_formA <- result_brbp_formA$dx
coefficients_brbp_formA <- result_brbp_formA$delta
rssr_brbp_formA <- result_brbp_formA$rssr

# Plot convergence
plot(result_brbp_formA$all_rssr, type = "l", xlab = "Iteration", ylab = "RSSR", 
     main = "BRBP Convergence (Form A)")
```

### Residual Bootstrap Restarting Break-Point (RBRBP) Algorithm

The `brbp_residual()` function implements the Residual Bootstrap Restarting Break-Point algorithm using **residual bootstrap**. This is an alternative method that preserves time ordering, also based on the bootstrap restarting framework of Wood (2001). The algorithm proceeds as follows:

**Algorithm 2: Residual Bootstrap Restarting Break-Point (RBRBP) - Residual Bootstrap**

**Step 1.** Assign an initial set of break dates $T^{(0)} = \{T^{(0)}_1,\dots,T^{(0)}_m\}$.

**Step 2.** Using $T^{(0)}$ as initial values, estimate restricted structural breaks model to obtain the estimators $\hat{\theta}_0 = (\widehat T_{(0)}, \widehat\delta_{(0)})$.

**Step 3.** For $b = 1,\dots,B$ do:

-   **(3.1)** Using $\hat{\theta}_{(b-1)} = (\widehat T_{(b-1)}, \widehat\delta_{(b-1)})$, compute residuals $\{e_t\}_{t=1}^T = \{y_t - Z_t(\widehat T_{(b-1)}) \widehat\delta_{(b-1)}\}_{t=1}^T$.
    -   Resample residuals with replacement to obtain $\{e_t^{*}\}_{t=1}^T$.
    -   Generate a bootstrap sample $\{y_t^{*}\}_{t=1}^T = \{Z_t(\widehat T_{(b-1)}) \widehat\delta_{(b-1)} + e_t^{*}\}_{t=1}^T$.
    -   Note: $Z_t$ remains unchanged (time ordering preserved).
-   **(3.2)** Using $\widehat T_{(b-1)}$ as initial break dates, estimate restricted structural breaks model to the bootstrap sample and obtain $(\widetilde T^{*}_{(b)}, \widetilde\delta^{*}_{(b)})$.
-   **(3.3)** Using $\widetilde T^{*}_{(b)}$ as initial break dates, re-estimate the model on the original sample and obtain $\tilde{\theta}_{(b)} = (\widetilde T_{(b)}, \widetilde\delta_{(b)})$.
-   **(3.4)** Compute $RSSR(\tilde{\theta}_{(b)})$ and compare it with $RSSR(\hat{\theta}_{(b-1)})$.
    -   **if** $RSSR(\tilde{\theta}_{(b)}) < RSSR(\hat{\theta}_{(b-1)})$ **then** set $\hat{\theta}_{(b)} = \tilde{\theta}_{(b)}$;
    -   **else** set $\hat{\theta}_{(b)} = \hat{\theta}_{(b-1)}$.

**Step 4.** Set the final estimator $(\widehat T, \widehat\delta) = (\widehat T_{(B)}, \widehat\delta_{(B)})$.

```{r eval=FALSE}
# Run RBRBP algorithm (Form B)
result_rbrbp <- brbp_residual(
  y = y, z = z, q = q, m = m, bigt = bigt, trm = 0.10,
  T_init = T_init,
  B = 20,              # Number of bootstrap replications
  R = R, rr = rr,      # Form B restrictions
  verbose = FALSE      # Set to TRUE to see progress
)

# Results
breaks_rbrbp <- result_rbrbp$dx
coefficients_rbrbp <- result_rbrbp$delta
rssr_rbrbp <- result_rbrbp$rssr

# Track convergence
rall_breaks <- result_rbrbp$all_break_dates  # All break dates across iterations
rall_rssr <- result_rbrbp$all_rssr           # All RSSR values across iterations

# Plot convergence (if desired)
plot(rall_rssr, type = "l", xlab = "Iteration", ylab = "RSSR", 
     main = "RBRBP Convergence (Form B)")
```

#### Using Form A with RBRBP

```{r eval=FALSE}
# Run RBRBP algorithm (Form A)
result_rbrbp_formA <- brbp_residual(
  y = y, z = z, q = q, m = m, bigt = bigt, trm = 0.10,
  T_init = T_init,
  B = 20,              # Number of bootstrap replications
  S = S, ss = as.vector(ss),  # Form A restrictions
  R = NULL, rr = NULL,        # Not used for Form A
  verbose = FALSE
)

# Results
breaks_rbrbp_formA <- result_rbrbp_formA$dx
coefficients_rbrbp_formA <- result_rbrbp_formA$delta
rssr_rbrbp_formA <- result_rbrbp_formA$rssr

# Plot convergence
plot(result_rbrbp_formA$all_rssr, type = "l", xlab = "Iteration", ylab = "RSSR", 
     main = "RBRBP Convergence (Form A)")
```

## Complete Workflow Example

We demonstrate a complete workflow using the example dataset, including bootstrap refinement.

### Step 1: Initial Estimation

```{r eval=FALSE}
# Load data
data(example)
y <- example$y
bigt <- length(y)
z <- matrix(1, nrow = bigt, ncol = 1)
q <- 1
m <- 3

# Form B Restrictions: delta_1 = delta_3, delta_2 = delta_4
R <- matrix(c(1, 0, -1, 0, 0, 1, 0, -1), nrow = 2, ncol = 4, byrow = TRUE)
rr <- c(0, 0)

# Initial estimation (Form B)
result_initial <- est2(y, z, q = q, m = m, bigt = bigt, trm = 0.10, R = R, rr = rr)
T_init <- result_initial$dx

cat("Initial break dates:", T_init, "\n")
```

Or using Form A:

```{r eval=FALSE}
# Form A Restrictions: same as Form B
S <- matrix(c(1, 0, 0, 1, 1, 0, 0, 1), nrow = 4, ncol = 2, byrow = TRUE)
ss <- matrix(0, nrow = 4, ncol = 1)

# Initial estimation (Form A)
result_initial_formA <- est(y, z, q = q, m = m, bigt = bigt, trm = 0.10, 
                            S = S, ss = ss)
T_init <- result_initial_formA$dx

cat("Initial break dates:", T_init, "\n")
```

### Step 2: Bootstrap Refinement

When initial estimates may be suboptimal, bootstrap methods can help:

#### Using Form B:

```{r eval=FALSE}
# Run BRBP algorithm (Form B)
result_brbp <- brbp(
  y = y, z = z, q = q, m = m, bigt = bigt, trm = 0.10,
  T_init = T_init,
  B = 20,              # Number of bootstrap replications
  R = R, rr = rr,
  verbose = FALSE
)

# Run RBRBP algorithm (Form B)
result_rbrbp <- brbp_residual(
  y = y, z = z, q = q, m = m, bigt = bigt, trm = 0.10,
  T_init = T_init,
  B = 20,
  R = R, rr = rr,
  verbose = FALSE
)

# Compare results
cat("Initial break dates:", T_init, "\n")
cat("BRBP break dates:", result_brbp$dx, "\n")
cat("RBRBP break dates:", result_rbrbp$dx, "\n")
cat("BRBP RSSR:", result_brbp$rssr, "\n")
cat("RBRBP RSSR:", result_rbrbp$rssr, "\n")
```

#### Using Form A:

```{r eval=FALSE}
# Run BRBP algorithm (Form A)
result_brbp_formA <- brbp(
  y = y, z = z, q = q, m = m, bigt = bigt, trm = 0.10,
  T_init = T_init,
  B = 20,
  S = S, ss = as.vector(ss),
  R = NULL, rr = NULL,
  verbose = FALSE
)

# Run RBRBP algorithm (Form A)
result_rbrbp_formA <- brbp_residual(
  y = y, z = z, q = q, m = m, bigt = bigt, trm = 0.10,
  T_init = T_init,
  B = 20,
  S = S, ss = as.vector(ss),
  R = NULL, rr = NULL,
  verbose = FALSE
)

# Compare results
cat("Initial break dates:", T_init, "\n")
cat("BRBP (Form A) break dates:", result_brbp_formA$dx, "\n")
cat("RBRBP (Form A) break dates:", result_rbrbp_formA$dx, "\n")
cat("BRBP (Form A) RSSR:", result_brbp_formA$rssr, "\n")
cat("RBRBP (Form A) RSSR:", result_rbrbp_formA$rssr, "\n")
```

## References

**Primary Reference:**

Perron, P., and Qu, Z. (2006). Estimating restricted structural change models. *Journal of Econometrics*, 134(2), 373-399.

**Related References:**

Bai, J., and Perron, P. (1998). Estimating and testing linear models with multiple structural changes. *Econometrica*, 66(1), 47-78.

Bai, J., and Perron, P. (2003). Computation and analysis of multiple structural change models. *Journal of Applied Econometrics*, 18(1), 1-22.

Andrews, D. W. K. (1991). Heteroskedasticity and autocorrelation consistent covariance matrix estimation. *Econometrica*, 59(3), 817-858.

Wood, A. T. A. (2001). Minimizing Model Fitting Objectives That Contain Spurious Local Minima by Bootstrap Restarting. *Biometrics*, 57(1), 240-244.

## Session Info

```{r}
sessionInfo()
```