---
title: "Getting the Most out of DAGassist Using Parameters"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Get Started}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r dev-load, include=FALSE}
# Prefer source build when available (works in RStudio, pkgdown, or local render)
if (requireNamespace("devtools", quietly = TRUE) &&
    file.exists(file.path("..","DESCRIPTION"))) {
  # Don't error on CRAN/build machines that don't have devtools or the source path
  try(devtools::load_all("..", quiet = TRUE), silent = TRUE)
}

# If we've already loaded from source, avoid re-attaching a different installed build later
from_source <- try({
  "DAGassist" %in% loadedNamespaces() &&
    grepl(normalizePath(".."), getNamespaceInfo(asNamespace("DAGassist"), "path"), fixed = TRUE)
}, silent = TRUE)
from_source <- isTRUE(from_source)

# Feature gates (computed *after* attempting load_all)
has_show <- tryCatch({
  "show" %in% names(formals(DAGassist::DAGassist))
}, error = function(e) FALSE)

# Robust check: dev build defines a private .report_dotwhisker helper
has_dotwhisker <- tryCatch({
  exists(".report_dotwhisker", envir = asNamespace("DAGassist"), inherits = FALSE)
}, error = function(e) FALSE)
```

```{r ex-dag, include=FALSE}
library(dagitty)
library(ggdag)

dag_model <- dagify(
  Y ~ X + M + Z + A + B,
  X ~ Z,
  C ~ X + Y,
  M ~ X,
  exposure = "X",
  outcome = "Y"
)

set.seed(42)
n <- 2000

#exogenous variables
A <- rnorm(n, 0, 1)
B <- rnorm(n, 0, 1)
Z <- rnorm(n, 0, 1)

#structural equations
# X ~ Z
beta_zx <- 0.8
X <- beta_zx * Z + rnorm(n, 0, 1)

# M ~ X
beta_xm <- 0.9
M <- beta_xm * X + rnorm(n, 0, 1)

# Y ~ X + M + Z + A + B
bX <- 0.7; bM <- 0.6; bZ <- 0.3; bA <- 0.2; bB <- -0.1
Y <- bX*X + bM*M + bZ*Z + bA*A + bB*B + rnorm(n, 0, 1)

# C ~ X + Y
bXC <- 0.5; bYC <- 0.4
C <- bXC*X + bYC*Y + rnorm(n, 0, 1)

reg_levels <- c("North", "South", "East", "West")
region <-
  factor(sample(reg_levels, n, replace = TRUE))

df <- data.frame(A, B, Z, X, M, Y, C, region)
```

# Introduction

`DAGassist()` is meant to be simple and easy to use, and most of its features are available through a two-argument call:

```{r example, eval=FALSE}
library(DAGassist)
library(dagitty)

DAGassist(
  dag = your_dag_model,
  formula = your_regression_call
)
```

But it also offers several parameters for more specific applications. They control how the DAG is evaluated (`imply`, `eval_all`), how results print (`show`, `labels`, `omit_factors`, `omit_intercept`, `bivariate`, `verbose`), which modeling engine to use (`engine`, `engine_args`), and which output format to write (`type`, `out`). This vignette walks through most of them with small examples and closes with a reference table.

# Core Arguments

## `dag` and `formula`

`formula` can be either a complete regression call such as `lm(Y ~ X + C, data = df)`, from which `DAGassist` infers the formula, data, and engine, or a plain formula accompanied by separate `data` and `engine` arguments.

```{r formula, eval=FALSE}
#inferred from a regression call
DAGassist(
  #infers the exposure and outcome from the dagitty object
  dag = dag_model,
  #infers the engine, formula, and data from the regression call
  formula = lm(Y ~ X + C, data=df)
)

#plain formula
DAGassist(
  dag = dag_model,
  engine = stats::lm, #stats::lm is the default engine arg
  formula = Y ~ X + C,
  data = df,
  exposure = "X",
  outcome = "Y"
)
```

The two calls above print identical output.

# Scope Flags

## `imply`: evaluate on only mentioned variables vs the full DAG

- `imply = FALSE` (default): prune the DAG to just exposure, outcome, and your RHS variables; roles/sets are computed on this pruned graph.
- `imply = TRUE`: evaluate on the full DAG and allow DAG-implied controls to enter minimal/canonical sets (you’ll be told what’s added).
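To get a feel for what "DAG-implied controls" means, the full DAG can be queried with `dagitty` directly. The sketch below rebuilds the example DAG in `dagitty` syntax so that it is self-contained, then asks for the minimal and canonical adjustment sets. It is an illustration only: that these sets match what `DAGassist()` reports is an assumption, and the code is not the package's internal logic.

```r
library(dagitty)

# Same structure as the vignette's example DAG, written in dagitty syntax
g <- dagitty("dag {
  X [exposure]
  Y [outcome]
  Z -> X
  Z -> Y
  X -> M
  M -> Y
  X -> Y
  A -> Y
  B -> Y
  X -> C
  Y -> C
}")

# Smallest sets that close the backdoor path X <- Z -> Y
min_sets <- adjustmentSets(g, type = "minimal", effect = "total")

# The single canonical set: every valid control that is not a
# descendant of a node on a causal path from X to Y
can_sets <- adjustmentSets(g, type = "canonical", effect = "total")

print(min_sets)
print(can_sets)
```

For this DAG the minimal set is `Z` alone and the canonical set is `A`, `B`, and `Z`. The formula `Y ~ X + C` used in the demo mentions none of them, which is the kind of gap a full-DAG evaluation can point out. The mediator `M` and the collider `C` appear in neither set.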
```{r imply-demo}
#pruned-to-formula DAG
DAGassist(dag = dag_model, formula = Y ~ X + C, data = df,
          imply = FALSE, show = "roles")

#full-DAG evaluation
DAGassist(dag = dag_model, formula = Y ~ X + C, data = df,
          imply = TRUE, show = "roles")
```

## `eval_all`: keep non-DAG RHS terms in derived models

Sometimes your RHS has terms that aren’t DAG nodes (e.g., fixed effects via `i(region)`, factor expansions, interactions, splines). `eval_all` decides whether these non-DAG terms are kept in minimal/canonical formulas.

- `eval_all = FALSE` (default): drop RHS terms not present as DAG nodes from the derived formulas.
- `eval_all = TRUE`: keep all original RHS terms that aren’t DAG nodes (e.g., fixed effects), in addition to the DAG-based controls.

```{r omit, eval=FALSE}
DAGassist(
  dag = dag_model,
  formula = fixest::feols(Y ~ X + C + fixest::i(region), data = df),
  imply = TRUE,
  eval_all = TRUE
)
```

# Display and Labeling

## `show`: sub-reports

- `"all"` (default): roles grid + model comparison
- `"roles"`: just the roles/flags table
- `"models"`: just the model comparison

```{r show-demo, eval=FALSE}
# just the roles table
DAGassist(dag = dag_model, formula = Y ~ X + Z + C, data = df, show = "roles")

# just the model comparison
DAGassist(dag = dag_model, formula = Y ~ X + Z + C, data = df, show = "models")
```

## `labels`: human-readable names

Provide a named character vector or a small data frame. Because `labels` uses `modelsummary()`'s `coef_rename` logic, an incomplete label list will not throw an error.

```{r labels}
labs <- list(
  X = "Exposure",
  Y = "Outcome",
  C = "Collider"
)

DAGassist(
  dag = dag_model,
  formula = lm(Y ~ X + C, data = df),
  show = "roles",
  labels = labs
)
```

## `omit_intercept` and `omit_factors`: output-only filters

These flags only suppress rows in the printed model comparison. They do not remove terms from estimation.
`omit_factors` is particularly useful for saving space, since a report that prints every factor level can run to hundreds of rows.

```{r omit-demo, eval=FALSE}
DAGassist(
  dag = dag_model,
  formula = fixest::feols(Y ~ X + Z + i(region), data = df),
  omit_intercept = TRUE,
  omit_factors = TRUE # both TRUE by default
)
```

## `bivariate`: include a no-covariate comparison column

Set `bivariate = TRUE` to add a `Y ~ X` column for readers who want the raw association. The default is `bivariate = FALSE`.

```{r bivariate}
DAGassist(
  dag = dag_model,
  formula = lm(Y ~ X + C, data = df),
  show = "models",
  bivariate = TRUE
)
```

## `verbose`: printing formulas & notes

`verbose = TRUE` (default) prints helpful notes (what was added/dropped, derived formulas). Set it to `FALSE` for a quieter console.

```{r verbose-demo, eval=FALSE}
DAGassist(dag = dag_model, formula = Y ~ X + Z + C, data = df, verbose = FALSE)
```

# Parameter Reference Table

| Parameter | Type | Default | What it does |
|:----------------:|:-------------------------:|:---------:|:--------------|
| `dag` | dagitty object | — | The DAG to validate and evaluate. |
| `formula` | formula or single call | — | Either `Y ~ X + ...` or a single engine call like `feols(...)`. |
| `data` | data.frame | — | Required unless supplied in engine call. |
| `engine` | function | `stats::lm` | Modeling function (ignored if `formula` is a call). |
| `engine_args` | named list | `list()` | Extra args for `engine(...)`; merged with call args (call wins). |
| `verbose` | logical | `TRUE` | Print formulas & notes in console. |
| `type` | string | `"console"` | One of `"console"`, `"latex"`, `"docx"/"word"`, `"xlsx"/"excel"`, `"text"/"txt"`. |
| `out` | path | — | Output path for non-console types. |
| `imply` | logical | `FALSE` | Scope: pruned-to-formula vs full-DAG evaluation. |
| `labels` | named chr / data.frame | `NULL` | Rename coefficients (modelsummary `coef_rename` logic). |
| `omit_intercept` | logical | `TRUE` | Hide intercept in printed comparison. |
| `omit_factors` | logical | `TRUE` | Hide factor levels in printed comparison. |
| `bivariate` | logical | `FALSE` | Add a no-covariate `Y ~ X` column to the model comparison. |
| `show` | string | `"all"` | `"all"`, `"roles"`, or `"models"`. |
| `eval_all` | logical | `FALSE` | Keep non-DAG RHS terms (FEs, splines, interactions) in derived models. |
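
The table lists `engine`, `engine_args`, `type`, and `out`, none of which has its own example above. The sketch below shows how they might be combined, assuming they behave as the table describes. The small DAG and data frame (`dag_small`, `df_small`) and the file name in `out_path` are hypothetical stand-ins chosen to keep the block self-contained.

```r
library(DAGassist)
library(dagitty)

# Compact stand-in for the vignette's DAG: Z confounds X -> Y
dag_small <- dagitty("dag {
  X [exposure]
  Y [outcome]
  Z -> X
  Z -> Y
  X -> Y
}")

# Simulated data that follows the stand-in DAG
set.seed(42)
n <- 500
Z <- rnorm(n)
X <- 0.8 * Z + rnorm(n)
Y <- 0.7 * X + 0.3 * Z + rnorm(n)
df_small <- data.frame(X, Y, Z)

# Hypothetical output location; any writable path should do
out_path <- file.path(tempdir(), "dagassist_report.txt")

DAGassist(
  dag = dag_small,
  formula = Y ~ X + Z,
  data = df_small,
  exposure = "X",
  outcome = "Y",
  # swap the default stats::lm for another modeling function
  engine = stats::glm,
  # extra arguments handed to engine(...), per the table
  engine_args = list(family = stats::gaussian()),
  # write a plain-text report to out_path instead of the console
  type = "text",
  out = out_path
)
```

If the call behaves as the table suggests, the report is written to `out_path` rather than printed. The other `type` values listed in the table would need a matching file extension.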