---
title: "Getting started with TestGenerator"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting started with TestGenerator}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```

TestGenerator helps you test pharmacoepidemiological study code against a
small, explicit OMOP CDM test population. The typical workflow is:

1. Create a small patient dataset in Excel or CSV files.
2. Convert that dataset to a Unit Test Definition JSON file.
3. Load the JSON into a blank CDM.
4. Run your study code and assert the expected results.

This vignette uses the ICU sample population included with the package.

## Create a Unit Test Definition

An Excel input file should contain one sheet per OMOP CDM table. For example,
the sheet names can include `person`, `observation_period`, `visit_occurrence`,
`condition_occurrence`, `drug_exposure`, and `measurement`.
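
As a rough illustration of that layout, the sketch below models two sheets as a
named list of data frames: each list name stands for a sheet name and each data
frame holds that table's rows. The column names follow the OMOP CDM `person` and
`observation_period` tables. The patients, dates, and concept IDs are invented
for illustration and are not taken from the ICU sample.

```{r sheet-layout-sketch}
# Illustrative only: each list element stands for one worksheet.
sheets <- list(
  person = data.frame(
    person_id = c(1L, 2L),
    gender_concept_id = c(8507L, 8532L), # male, female
    year_of_birth = c(1980L, 1975L),
    race_concept_id = c(0L, 0L),         # 0 = no matching concept
    ethnicity_concept_id = c(0L, 0L)
  ),
  observation_period = data.frame(
    observation_period_id = c(1L, 2L),
    person_id = c(1L, 2L),
    observation_period_start_date = as.Date(c("2020-01-01", "2020-01-01")),
    observation_period_end_date = as.Date(c("2020-12-31", "2020-12-31")),
    period_type_concept_id = c(32817L, 32817L) # illustrative type concept
  )
)

# Every person_id used in another table should also exist in `person`.
orphan_ids <- setdiff(
  sheets$observation_period$person_id,
  sheets$person$person_id
)
stopifnot(length(orphan_ids) == 0)

# The list names play the role of the sheet names.
names(sheets)
```

Checking identifiers across tables before building the workbook tends to catch
typos that would otherwise surface later as empty cohorts.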

```{r create-json}
library(TestGenerator)

file_path <- system.file(
  "extdata",
  "icu_sample_population.xlsx",
  package = "TestGenerator"
)

output_path <- file.path(tempdir(), "testgenerator-example")
dir.create(output_path, showWarnings = FALSE, recursive = TRUE)

readPatients(
  filePath = file_path,
  testName = "icu_sample",
  outputPath = output_path,
  cdmVersion = "5.4"
)
```

This writes `icu_sample.json` to `output_path`. Keeping these JSON files in
`tests/testthat/testCases` makes them easy to reuse from package tests. When
`outputPath = NULL`, TestGenerator writes to that default test case folder.
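
Because each test case is stored as `<testName>.json`, the JSON files in a
folder tell you which test names are available. The helper below is a
hypothetical convenience, not part of TestGenerator, and the demo folder with
its empty placeholder file exists only to make the sketch runnable on its own.

```{r list-test-cases}
# Hypothetical helper (not a TestGenerator function): return the test names
# implied by the JSON files in a folder.
list_test_cases <- function(path) {
  json_files <- list.files(path, pattern = "\\.json$", ignore.case = TRUE)
  tools::file_path_sans_ext(json_files)
}

# Demonstrate with a throwaway folder holding one empty placeholder file.
demo_dir <- file.path(tempdir(), "list-test-cases-demo")
dir.create(demo_dir, showWarnings = FALSE, recursive = TRUE)
file.create(file.path(demo_dir, "icu_sample.json"))

available_tests <- list_test_cases(demo_dir)
available_tests

unlink(demo_dir, recursive = TRUE)
```

In practice you would call `list_test_cases(output_path)` or point it at
`tests/testthat/testCases`.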

## Load the Test Population into a CDM

Use `patientsCDM()` to create a CDM reference containing the small patient
population and a complete vocabulary. By default, the CDM is created in DuckDB.

```{r load-cdm}
cdm <- patientsCDM(
  pathJson = output_path,
  testName = "icu_sample",
  cdmVersion = "5.4"
)

cdm[["person"]]
```

If `pathJson = NULL`, TestGenerator looks for JSON files in
`tests/testthat/testCases`.

```{r default-test-path}
cdm <- patientsCDM(
  pathJson = NULL,
  testName = "icu_sample",
  cdmVersion = "5.4"
)
```

## Use the CDM in Unit Tests

Once the test CDM is available, run the same study code you use on a real CDM.
The package includes example cohort definitions under `inst/extdata/test_cohorts`.

```{r cohort-test}
library(CDMConnector)
library(dplyr)
library(testthat)

test_cohorts <- system.file(
  "extdata",
  "test_cohorts",
  package = "TestGenerator"
)

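# Read the cohort definitions stored in the folder
cohort_set <- readCohortSet(test_cohorts)

# Generate the cohorts and store them in the CDM as `test_cohorts`
cdm <- generateCohortSet(
  cdm = cdm,
  cohortSet = cohort_set,
  name = "test_cohorts"
)

cohort_attrition <- attrition(cdm[["test_cohorts"]])

# Total number of records dropped across all attrition steps
excluded_records <- cohort_attrition |>
  pull(excluded_records) |>
  sum()

# For this example we expect that no records are excluded
expect_equal(excluded_records, 0)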
```

In a package test, place this code in `tests/testthat/test-*.R` and assert the
specific counts, dates, durations, or intersections that your study should
produce for the micro population.
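
The sketch below shows the shape such assertions might take. To stay runnable
without a database, it uses a hand-written data frame as a stand-in for a
collected cohort table. The rows and expected values are invented for
illustration; in a real test they would come from your own test population.

```{r package-test-sketch}
library(testthat)

# Stand-in for a collected cohort table. In a real test this would come from
# the test CDM, for example by collecting cdm[["test_cohorts"]].
cohort <- data.frame(
  cohort_definition_id = c(1L, 1L, 2L),
  subject_id = c(1L, 2L, 1L),
  cohort_start_date = as.Date(c("2020-01-10", "2020-02-01", "2020-03-05")),
  cohort_end_date = as.Date(c("2020-01-20", "2020-02-03", "2020-03-05"))
)

test_that("cohort 1 has the expected subjects and durations", {
  cohort_1 <- cohort[cohort$cohort_definition_id == 1L, ]
  duration_days <- as.integer(
    cohort_1$cohort_end_date - cohort_1$cohort_start_date
  )

  expect_equal(nrow(cohort_1), 2)
  expect_equal(sort(cohort_1$subject_id), c(1L, 2L))
  expect_equal(duration_days, c(10L, 2L))
})
```

Because the micro population is small and explicit, the expected values can be
worked out by hand and written directly into the test.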

## Start from a Blank Excel Template

If you want to design a new test population from scratch, use
`generateTestTables()` to create an Excel workbook with a sheet for each
requested CDM table and the columns that table requires.

```{r generate-template}
generateTestTables(
  tableNames = c(
    "person",
    "observation_period",
    "visit_occurrence",
    "condition_occurrence",
    "drug_exposure",
    "measurement"
  ),
  cdmVersion = "5.4",
  outputFolder = output_path,
  filename = "my_test_population"
)
```

Fill in the workbook rows for the small set of patients needed by your test,
then pass the completed workbook to `readPatients()`.
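
A minimal sketch of that last step follows. It assumes the template was written
to `output_path` as `my_test_population.xlsx`; the `.xlsx` extension is an
assumption, so check the actual file name on disk. The test name is likewise
only an example. The guard keeps the chunk harmless when the package or the
workbook is missing.

```{r read-filled-template}
# Assumed location of the filled-in workbook. The ".xlsx" extension is an
# assumption; adjust the path to match the file that was actually created.
output_path <- file.path(tempdir(), "testgenerator-example")
workbook_path <- file.path(output_path, "my_test_population.xlsx")

# Only attempt the conversion when the package and the workbook are available.
if (requireNamespace("TestGenerator", quietly = TRUE) &&
    file.exists(workbook_path)) {
  TestGenerator::readPatients(
    filePath = workbook_path,
    testName = "my_test_population",
    outputPath = output_path,
    cdmVersion = "5.4"
  )
}
```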

## CSV Inputs

For CSV inputs, place one file per CDM table in a folder. File names should
match the table names, for example `person.csv` and `observation_period.csv`.

```{r csv-input}
csv_path <- system.file(
  "extdata",
  "mimic_sample",
  package = "TestGenerator"
)

readPatients.csv(
  filePath = csv_path,
  testName = "mimic_sample",
  outputPath = output_path,
  cdmVersion = "5.4"
)
```
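
A folder with this layout can also be prepared from data frames in R. The
sketch below writes one CSV file per table, using the table name as the file
name. The tables, the demo folder name, and the values are illustrative
assumptions and are unrelated to the MIMIC sample shipped with the package.

```{r csv-layout-sketch}
# Illustrative tables; a real test population would usually have more rows.
tables <- list(
  person = data.frame(
    person_id = 1L,
    gender_concept_id = 8532L,
    year_of_birth = 1980L,
    race_concept_id = 0L,
    ethnicity_concept_id = 0L
  ),
  observation_period = data.frame(
    observation_period_id = 1L,
    person_id = 1L,
    observation_period_start_date = as.Date("2020-01-01"),
    observation_period_end_date = as.Date("2020-12-31"),
    period_type_concept_id = 32817L # illustrative type concept
  )
)

# Throwaway folder used only for this sketch.
csv_dir <- file.path(tempdir(), "csv-layout-demo")
dir.create(csv_dir, showWarnings = FALSE, recursive = TRUE)

# Write <table name>.csv for every table in the list.
for (table_name in names(tables)) {
  utils::write.csv(
    tables[[table_name]],
    file = file.path(csv_dir, paste0(table_name, ".csv")),
    row.names = FALSE
  )
}

csv_files <- sort(list.files(csv_dir, pattern = "\\.csv$"))
csv_files

unlink(csv_dir, recursive = TRUE)
```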

For source datasets with very large integer identifiers, add
`reduceLargeIds = TRUE` to the `readPatients.csv()` call.

## Clean Up

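For local DuckDB examples, disconnect from the database and delete the
temporary output folder once the test has finished.

```{r cleanup}
# Close the DuckDB connection behind the CDM and shut the database down
DBI::dbDisconnect(CDMConnector::cdmCon(cdm), shutdown = TRUE)

# Remove the files written to the temporary output folder
unlink(output_path, recursive = TRUE)
```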