---
title: "epitraxr"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{epitraxr}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
```{r setup, include = FALSE}
library(epitraxr)
```
# Introduction to epitraxr
EpiTrax is a central repository for epidemiological data developed by the Utah Department of Health and Human Services (DHHS). It is now used by several other states. Through EpiTrax, public health officials have access to many different types of disease surveillance data, which they use to produce regular (e.g., weekly, monthly, annual) reports on their respective jurisdictions. Producing these reports by hand can be tedious and time-intensive.
The epitraxr package makes it fast and easy to process EpiTrax data and produce multiple reports.
## EpiTrax Data
To explore basic report functions in epitraxr, we'll use this sample dataset:
```{r sample-data}
data_file <- "vignette-data/epitrax_data.csv"
head(read.csv(data_file))
```
When you export data from EpiTrax, each row corresponds to a single disease case. EpiTrax can provide many different data points, but epitraxr uses only three:
- `patient_disease`: The name of the disease
- `patient_mmwr_week`: The MMWR week number of disease onset (1-52, or 53 in years that have 53 MMWR weeks)
- `patient_mmwr_year`: The year of disease onset
The package ignores all other columns.
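To make the column requirements concrete, here is a minimal base R sketch that reduces an export to the three required columns. The disease names, the extra `patient_county` column, and all variable names are made up for illustration, and this is not necessarily how epitraxr checks its input.

```r
# Hypothetical export: one row per case, plus one column that is not needed
raw_export <- data.frame(
  patient_county    = c("A", "B", "A"),
  patient_disease   = c("Measles", "Pertussis", "Measles"),
  patient_mmwr_week = c(3L, 17L, 45L),
  patient_mmwr_year = c(2023L, 2023L, 2024L)
)

required_cols <- c("patient_disease", "patient_mmwr_week", "patient_mmwr_year")

# Fail early if a required column is missing
missing_cols <- setdiff(required_cols, names(raw_export))
if (length(missing_cols) > 0) {
  stop("Missing required columns: ", paste(missing_cols, collapse = ", "))
}

# Keep only the required columns, in a fixed order
minimal_export <- raw_export[, required_cols]
```

Any extra columns in the export are simply dropped, so there is no need to clean the file before reading it.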
Read in the data with the `read_epitrax_data()` function:
```{r read_epitrax_data}
epitrax_data <- read_epitrax_data(data_file)
head(epitrax_data)
```
The function validates the input data and converts the week number to a month number (1-12), because reports generally use months rather than weeks. It also adds a `counts` column, set to 1 for every row, which epitraxr uses internally when manipulating the data to generate reports.
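The week-to-month step can be sketched in base R. The exact rule that `read_epitrax_data()` uses to assign a week to a month is not described here, so this sketch assumes one plausible rule: a week belongs to the month of the Saturday that ends it. The helper names `mmwr_week_start()` and `mmwr_week_to_month()` are hypothetical and are not part of epitraxr.

```r
# Sunday that starts a given MMWR week.
# MMWR week 1 is the Sunday-to-Saturday week that contains January 4.
mmwr_week_start <- function(year, week) {
  jan4 <- as.Date(sprintf("%d-01-04", as.integer(year)))
  # as.POSIXlt()$wday is 0 for Sunday through 6 for Saturday
  week1_start <- jan4 - as.POSIXlt(jan4)$wday
  week1_start + 7 * (week - 1)
}

# Assumed rule: a week belongs to the month of its final day (Saturday)
mmwr_week_to_month <- function(year, week) {
  week_end <- mmwr_week_start(year, week) + 6
  as.integer(format(week_end, "%m"))
}

toy_cases <- data.frame(
  disease = c("Measles", "Measles", "Pertussis"),
  year    = c(2024L, 2024L, 2024L),
  week    = c(1L, 5L, 52L)
)
toy_cases$month  <- mmwr_week_to_month(toy_cases$year, toy_cases$week)
toy_cases$counts <- 1L  # each row represents one case
```

A different assignment rule (for example, the month of the week's first day) would move some boundary weeks into the neighboring month, so treat the rule above as an illustration only.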
## Disease Lists
Before you can generate reports from the data, epitraxr needs a list of diseases to include in the report. Often, you'll have two lists: one for internal reports and one for public reports. Read these files in with `get_report_diseases_internal()` and `get_report_diseases_public()`.
```{r get-ilist}
internal_disease_list <- "vignette-data/ireport_diseases.csv"
internal_diseases <- get_report_diseases_internal(internal_disease_list)
head(internal_diseases)
```
`internal_diseases` has two columns. `EpiTrax_name` is the disease name *as reported by EpiTrax*. All internal reports need the values in this column. `Group_name` is for grouping diseases, which is only used by `create_report_grouped_stats()`. If you aren't creating grouped reports, you don't need `Group_name` in your internal disease list.
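As a rough illustration of the two-column layout, the sketch below builds a small internal disease list in base R and splits the disease names by group. The disease and group names are invented, and reading from an in-memory character vector stands in for reading a real CSV file.

```r
# Hypothetical contents of an internal disease list file.
# Names that contain a comma must be quoted in CSV.
internal_list_csv <- c(
  'EpiTrax_name,Group_name',
  'Measles,Vaccine-preventable',
  'Pertussis,Vaccine-preventable',
  '"Syphilis, primary",Sexually transmitted'
)
toy_internal <- read.csv(text = internal_list_csv)

# Disease names per group, roughly what a grouped report would need
toy_groups <- split(toy_internal$EpiTrax_name, toy_internal$Group_name)
```

If you never create grouped reports, a file with only the `EpiTrax_name` column carries all the information used in this sketch's first step.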
```{r get-plist}
public_disease_list <- "vignette-data/preport_diseases.csv"
public_diseases <- get_report_diseases_public(public_disease_list)
head(public_diseases)
```
`public_diseases` also has two columns. As in `internal_diseases`, `EpiTrax_name` is the disease name *as reported by EpiTrax*. All public reports need the values in this column to properly compute statistics from the data. `Public_name` is used by certain functions (prefixed with `create_public_report_`) to translate the EpiTrax disease name into something more accessible to the public. `Public_name` is also used to combine related diseases in the final report (e.g., "Syphilis, primary" and "Syphilis, secondary" are reported publicly under the single combined statistic "Syphilis").
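The combining behavior can be sketched in base R with `merge()` and `aggregate()`. The counts and the "Whooping Cough" public name are invented for illustration, and this is only an approximation of the idea, not the package's implementation.

```r
# Hypothetical public disease list: two EpiTrax names share one public name
toy_public <- data.frame(
  EpiTrax_name = c("Syphilis, primary", "Syphilis, secondary", "Pertussis"),
  Public_name  = c("Syphilis", "Syphilis", "Whooping Cough")
)

# Hypothetical case counts keyed by the EpiTrax name
toy_counts <- data.frame(
  EpiTrax_name = c("Syphilis, primary", "Syphilis, secondary", "Pertussis"),
  counts       = c(4L, 3L, 10L)
)

# Attach the public name to each row, then sum counts per public name
toy_merged    <- merge(toy_counts, toy_public, by = "EpiTrax_name")
public_counts <- aggregate(counts ~ Public_name, data = toy_merged, FUN = sum)
```

Here the two syphilis rows collapse into a single "Syphilis" row whose count is the sum of both.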
## Generating Reports: Standard Mode
We can now call the report generation functions, such as `create_report_annual_counts()`, providing the list of diseases we want to include in our report.
```{r annual-report-1}
report <- create_report_annual_counts(
  data = epitrax_data,
  diseases = internal_diseases$EpiTrax_name
)
head(report)
```
This gives us a data frame containing a row for each disease in our disease list and a column showing the case counts for each year in the dataset.
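For intuition, a table of this shape can be approximated in base R by cross-tabulating disease against year. This is a sketch with made-up data, not the code behind `create_report_annual_counts()`. Here the diseases end up as row names, whereas the real report may store them in a column.

```r
# Hypothetical cases: one row per case
toy_cases <- data.frame(
  disease = c("Measles", "Measles", "Pertussis", "Measles", "Mumps"),
  year    = c(2023L, 2024L, 2024L, 2024L, 2024L)
)
toy_disease_list <- c("Measles", "Pertussis", "Rubella")

# Drop cases for diseases that are not on the list (here, Mumps)
kept <- toy_cases[toy_cases$disease %in% toy_disease_list, ]

# A factor with fixed levels keeps listed diseases that have no cases (here, Rubella)
kept$disease <- factor(kept$disease, levels = toy_disease_list)

# One row per listed disease, one column per year
annual_counts <- as.data.frame.matrix(table(kept$disease, kept$year))
```

Fixing the factor levels to the disease list is what guarantees a row for every listed disease, even when its count is zero in every year.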
Let's call the report function again, but this time give it the public disease list.
```{r annual-report}
report <- create_report_annual_counts(
  data = epitrax_data,
  diseases = public_diseases$EpiTrax_name
)
head(report)
```
## Generating Reports: Piped Mode (recommended)
The epitraxr package includes a separate piped mode to make it easy to chain together multiple reports without needing to specify the disease list and input data each time. **This is our recommended mode for epitraxr.** See `vignette("piped-mode")` for more information.
Here is a brief example of how the same annual counts report generation would work in piped mode.
```{r annual-counts-piping}
# Data and configuration files
data_file <- "vignette-data/epitrax_data.csv"
config_file <- "vignette-data/config.yaml"
disease_lists <- list(
  internal = "vignette-data/ireport_diseases.csv",
  public = "vignette-data/preport_diseases.csv"
)
# Run pipe
epitrax <- create_epitrax_from_file(data_file) |>
  epitrax_set_config_from_file(config_file) |>
  epitrax_set_report_diseases(disease_lists) |>
  epitrax_ireport_annual_counts()
# View report
head(epitrax$internal_reports$annual_counts)
```
Piped mode really shines when we're creating multiple reports all at once.
```{r many-reports-piping}
epitrax <- create_epitrax_from_file(data_file) |>
  epitrax_set_config_from_file(config_file) |>
  epitrax_set_report_diseases(disease_lists) |>
  epitrax_ireport_annual_counts() |>
  epitrax_ireport_monthly_avgs() |>
  epitrax_ireport_ytd_counts_for_month()
list(epitrax$internal_reports)
```
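The general pattern behind a pipeline like this can be sketched in a few lines of base R: each step takes an object, adds something to it, and returns it so the next step can continue. All `toy_` functions below are hypothetical stand-ins written for this sketch. They are not epitraxr functions, and the real epitrax object may be structured differently.

```r
# Hypothetical constructor: bundle the data with an empty list of reports
toy_create <- function(cases) {
  list(data = cases, diseases = NULL, reports = list())
}

# Each step modifies the object and returns it, which is what makes piping work
toy_set_diseases <- function(obj, diseases) {
  obj$diseases <- diseases
  obj
}

toy_report_total_counts <- function(obj) {
  kept <- obj$data[obj$data$disease %in% obj$diseases, ]
  obj$reports$total_counts <- table(factor(kept$disease, levels = obj$diseases))
  obj
}

toy_report_years_covered <- function(obj) {
  obj$reports$years_covered <- sort(unique(obj$data$year))
  obj
}

toy_input <- data.frame(
  disease = c("Measles", "Measles", "Pertussis"),
  year    = c(2023L, 2024L, 2024L)
)

toy <- toy_create(toy_input) |>
  toy_set_diseases(c("Measles", "Pertussis")) |>
  toy_report_total_counts() |>
  toy_report_years_covered()

names(toy$reports)
```

Because the data and disease list travel inside the object, adding another report is one more line in the pipe rather than another call that repeats the same arguments.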