---
title: "sqlcaseR: Building long CASE WHEN statements for SQL interfaces in R"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{sqlcaseR: Building long CASE WHEN statements for SQL interfaces in R}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
**Version 0.2.1**
***Leoson Hoay
Last Updated: 8 Sep 2025 (0.2.0 -> 0.2.1)***
## Introduction
This module was born out of my genuine frustration while constructing an
extremely long CASE WHEN...THEN statement to re-label categorical variables.
It is most helpful for folks who intend to work with SQL directly in the R
environment, likely with a SQL connector such as RODBC or RSQLite.
The package provides three main functions for generating SQL statements:
- **`casewhen()`** - Creates CASE WHEN...THEN statements for value mapping
- **`inlist()`** - Creates IN() statements for filtering
- **`updatetable()`** - Creates UPDATE statements for bulk data updates
All functions now support flexible column selection, robust error handling,
different quote types, and various options for handling NULL/NA values.
Go from this:
| | |
|--------------------------------|-------------------------------|
| Hotel/Motel | Living in Shelter/Hotel/Motel |
| Homeless Shelter | Living in Shelter/Hotel/Motel |
| Homeless Status Not Applicable | Not Homeless |
| N/A | Not Homeless |
| No | Not Homeless |
| Homeless, Doubled-Up | Doubled Up |
To this:
```{}
CASE WHEN 'Hotel/Motel' THEN 'Living in Shelter/Hotel/Motel'
WHEN 'Homeless Shelter' THEN 'Living in Shelter/Hotel/Motel'
WHEN 'Homeless Status Not Applicable' THEN 'Not Homeless'
WHEN 'N/A' THEN 'Not Homeless'
WHEN 'No' THEN 'Not Homeless'
WHEN 'Homeless, Doubled-Up' THEN 'Doubled Up'
```
The package also supports the creation of long SQL IN() lists via the *inlist()*
function, and UPDATE statements via the *updatetable()* function. The IN() functionality
was inspired by reading about Kevin Flerlage's
[Excel implementation](https://www.flerlagetwins.com/2020/09/in-operator-generator-case-statement.html).
## Demonstration
```{r setup}
library(sqlcaser)
```
The package assumes that the user has a mapping CSV file or an R dataframe
similar to the example below:
```{r}
samp <- system.file("extdata", "sample.csv", package = "sqlcaser")
mapping <- read.csv(samp)
mapping
```
The function **casewhen()** takes an ***R dataframe or the file path of the
mapping file*** as input,and returns the CASE statement as a string, while printing
it to the console as well.
```{r}
statement <- casewhen(samp)
```
The user can then easily include it as part of the SQL query:
```{r}
query <- paste("SELECT id, ", statement, " END AS status "," \nFROM table;")
cat(query)
```
## Advanced Examples
### Using Named Columns and ELSE Clause
```{r}
# Create sample data with named columns
data <- data.frame(
status = c("Active", "Inactive", "Pending"),
display = c("Currently Active", "Not Active", "Under Review")
)
# Use column names instead of positions, add ELSE clause
advanced_case <- casewhen(data,
when_col = "status",
then_col = "display",
else_value = "Unknown Status")
```
### Working with Numeric Data and Auto-Quoting
```{r}
# Create data with numeric values
scores <- data.frame(
grade = c("A", "B", "C"),
points = c(90, 80, 70)
)
# Auto-quote mode handles numeric values without quotes
numeric_case <- casewhen(scores,
when_col = "grade",
then_col = "points",
quote_type = "auto")
```
### Creating IN Lists with Duplicate Removal
```{r}
# Create sample data with duplicates
categories <- data.frame(
category = c("Sales", "Marketing", "Sales", "IT", "Marketing", "HR")
)
# Remove duplicates automatically
unique_list <- inlist(categories,
value_col = "category",
distinct = TRUE)
```
### Bulk UPDATE Statements
```{r}
# Create sample update data
updates <- data.frame(
id = c(1, 2, 3),
name = c("John Doe", "Jane Smith", "Bob Wilson"),
department = c("Engineering", "Sales", "Marketing"),
salary = c(75000, 65000, 60000)
)
# Generate UPDATE statements using ID as key
update_statements <- updatetable(updates,
tablename = "employees",
key_col = "id")
```
## Sample Data
A sample mapping file is provided in this package. The file path can be accessed
as follows:
```{r}
samplepath <- system.file("extdata", "sample.csv", package = "sqlcaser")
```
## Function Reference
### `casewhen()` - CASE Statement Generator
**Description**
Constructs CASE WHEN...THEN statements from mapping data with flexible column selection and robust options.
**Usage**
```r
casewhen(inputfile = NULL, header = FALSE, when_col = 1, then_col = 2,
else_value = NULL, quote_type = "single", handle_nulls = "skip")
```
**Arguments**
- `inputfile` - R dataframe or path to mapping file
- `when_col` - Column name/index for WHEN values (default: 1)
- `then_col` - Column name/index for THEN values (default: 2)
- `else_value` - Optional ELSE value for the CASE statement
- `quote_type` - "single", "double", or "auto" (default: "single")
- `handle_nulls` - "skip", "null", or "error" (default: "skip")
### `inlist()` - IN Statement Generator
**Description**
Creates SQL IN() statements with duplicate removal and flexible column selection.
**Usage**
```r
inlist(inputfile = NULL, header = FALSE, value_col = 1,
quote_type = "single", handle_nulls = "skip", distinct = TRUE)
```
**Arguments**
- `inputfile` - R dataframe or path to mapping file
- `value_col` - Column name/index for IN values (default: 1)
- `quote_type` - "single", "double", or "auto" (default: "single")
- `distinct` - Remove duplicates if TRUE (default: TRUE)
### `updatetable()` - UPDATE Statement Generator
**Description**
Generates bulk UPDATE statements with flexible column selection and key specification.
**Usage**
```r
updatetable(inputfile = NULL, tablename = NULL, key_col = 1, update_cols = NULL,
quote_type = "auto", handle_nulls = "skip", batch_updates = TRUE)
```
**Arguments**
- `inputfile` - R dataframe or path to mapping file
- `tablename` - Name of SQL table to update
- `key_col` - Column name/index for WHERE clause (default: 1)
- `update_cols` - Specific columns to update (default: all except key_col)
- `quote_type` - "single", "double", or "auto" (default: "auto")
## Common Parameters Explained
**Quote Types:**
- `"single"` - Use single quotes for all values: `'value'`
- `"double"` - Use double quotes for all values: `"value"`
- `"auto"` - Smart quoting: numeric values unquoted, text values single-quoted
**NULL Handling:**
- `"skip"` - Skip rows/values with NA (default, safest)
- `"null"` - Convert NA to SQL NULL in output
- `"error"` - Stop execution when NA encountered
## Installation
Install using:
```{}
devtools::install_github("leosonh/sqlcaseR")
```
## Acknowledgments
Much thanks to a couple of my prior colleagues at [Learning Collider](https://www.learningcollider.org/) - Nitya Raviprakash
and Jasmin Dial - who provided healthy discussion around my misery of
constructing long SQL queries. Credit is also due to Kevin Flerlage, whose
efforts in automating this process in Excel are commendable and partially
inspired this package.
## Citation and License
If desired, cite the package using:
```{}
citation("sqlcaseR")
```
License: MIT License