--- title: "sqlcaseR: Building long CASE WHEN statements for SQL interfaces in R" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{sqlcaseR: Building long CASE WHEN statements for SQL interfaces in R} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` **Version 0.2.1** ***Leoson Hoay
Last Updated: 8 Sep 2025 (0.2.0 -> 0.2.1)*** ## Introduction This module was born out of my genuine frustration while constructing an extremely long CASE WHEN...THEN statement to re-label categorical variables. It is most helpful for folks who intend to work with SQL directly in the R environment, likely with a SQL connector such as RODBC or RSQLite. The package provides three main functions for generating SQL statements: - **`casewhen()`** - Creates CASE WHEN...THEN statements for value mapping - **`inlist()`** - Creates IN() statements for filtering - **`updatetable()`** - Creates UPDATE statements for bulk data updates All functions now support flexible column selection, robust error handling, different quote types, and various options for handling NULL/NA values.
Go from this: | | | |--------------------------------|-------------------------------| | Hotel/Motel | Living in Shelter/Hotel/Motel | | Homeless Shelter | Living in Shelter/Hotel/Motel | | Homeless Status Not Applicable | Not Homeless | | N/A | Not Homeless | | No | Not Homeless | | Homeless, Doubled-Up | Doubled Up | To this: ```{} CASE WHEN 'Hotel/Motel' THEN 'Living in Shelter/Hotel/Motel' WHEN 'Homeless Shelter' THEN 'Living in Shelter/Hotel/Motel' WHEN 'Homeless Status Not Applicable' THEN 'Not Homeless' WHEN 'N/A' THEN 'Not Homeless' WHEN 'No' THEN 'Not Homeless' WHEN 'Homeless, Doubled-Up' THEN 'Doubled Up' ``` The package also supports the creation of long SQL IN() lists via the *inlist()* function, and UPDATE statements via the *updatetable()* function. The IN() functionality was inspired by reading about Kevin Flerlage's [Excel implementation](https://www.flerlagetwins.com/2020/09/in-operator-generator-case-statement.html). ## Demonstration ```{r setup} library(sqlcaser) ``` The package assumes that the user has a mapping CSV file or an R dataframe similar to the example below: ```{r} samp <- system.file("extdata", "sample.csv", package = "sqlcaser") mapping <- read.csv(samp) mapping ``` The function **casewhen()** takes an ***R dataframe or the file path of the mapping file*** as input,and returns the CASE statement as a string, while printing it to the console as well. ```{r} statement <- casewhen(samp) ``` The user can then easily include it as part of the SQL query: ```{r} query <- paste("SELECT id, ", statement, " END AS status "," \nFROM table;") cat(query) ``` ## Advanced Examples ### Using Named Columns and ELSE Clause ```{r} # Create sample data with named columns data <- data.frame( status = c("Active", "Inactive", "Pending"), display = c("Currently Active", "Not Active", "Under Review") ) # Use column names instead of positions, add ELSE clause advanced_case <- casewhen(data, when_col = "status", then_col = "display", else_value = "Unknown Status") ``` ### Working with Numeric Data and Auto-Quoting ```{r} # Create data with numeric values scores <- data.frame( grade = c("A", "B", "C"), points = c(90, 80, 70) ) # Auto-quote mode handles numeric values without quotes numeric_case <- casewhen(scores, when_col = "grade", then_col = "points", quote_type = "auto") ``` ### Creating IN Lists with Duplicate Removal ```{r} # Create sample data with duplicates categories <- data.frame( category = c("Sales", "Marketing", "Sales", "IT", "Marketing", "HR") ) # Remove duplicates automatically unique_list <- inlist(categories, value_col = "category", distinct = TRUE) ``` ### Bulk UPDATE Statements ```{r} # Create sample update data updates <- data.frame( id = c(1, 2, 3), name = c("John Doe", "Jane Smith", "Bob Wilson"), department = c("Engineering", "Sales", "Marketing"), salary = c(75000, 65000, 60000) ) # Generate UPDATE statements using ID as key update_statements <- updatetable(updates, tablename = "employees", key_col = "id") ``` ## Sample Data A sample mapping file is provided in this package. The file path can be accessed as follows: ```{r} samplepath <- system.file("extdata", "sample.csv", package = "sqlcaser") ``` ## Function Reference ### `casewhen()` - CASE Statement Generator **Description** Constructs CASE WHEN...THEN statements from mapping data with flexible column selection and robust options. **Usage** ```r casewhen(inputfile = NULL, header = FALSE, when_col = 1, then_col = 2, else_value = NULL, quote_type = "single", handle_nulls = "skip") ``` **Arguments** - `inputfile` - R dataframe or path to mapping file - `when_col` - Column name/index for WHEN values (default: 1) - `then_col` - Column name/index for THEN values (default: 2) - `else_value` - Optional ELSE value for the CASE statement - `quote_type` - "single", "double", or "auto" (default: "single") - `handle_nulls` - "skip", "null", or "error" (default: "skip") ### `inlist()` - IN Statement Generator **Description** Creates SQL IN() statements with duplicate removal and flexible column selection. **Usage** ```r inlist(inputfile = NULL, header = FALSE, value_col = 1, quote_type = "single", handle_nulls = "skip", distinct = TRUE) ``` **Arguments** - `inputfile` - R dataframe or path to mapping file - `value_col` - Column name/index for IN values (default: 1) - `quote_type` - "single", "double", or "auto" (default: "single") - `distinct` - Remove duplicates if TRUE (default: TRUE) ### `updatetable()` - UPDATE Statement Generator **Description** Generates bulk UPDATE statements with flexible column selection and key specification. **Usage** ```r updatetable(inputfile = NULL, tablename = NULL, key_col = 1, update_cols = NULL, quote_type = "auto", handle_nulls = "skip", batch_updates = TRUE) ``` **Arguments** - `inputfile` - R dataframe or path to mapping file - `tablename` - Name of SQL table to update - `key_col` - Column name/index for WHERE clause (default: 1) - `update_cols` - Specific columns to update (default: all except key_col) - `quote_type` - "single", "double", or "auto" (default: "auto") ## Common Parameters Explained **Quote Types:** - `"single"` - Use single quotes for all values: `'value'` - `"double"` - Use double quotes for all values: `"value"` - `"auto"` - Smart quoting: numeric values unquoted, text values single-quoted **NULL Handling:** - `"skip"` - Skip rows/values with NA (default, safest) - `"null"` - Convert NA to SQL NULL in output - `"error"` - Stop execution when NA encountered ## Installation Install using: ```{} devtools::install_github("leosonh/sqlcaseR") ``` ## Acknowledgments Much thanks to a couple of my prior colleagues at [Learning Collider](https://www.learningcollider.org/) - Nitya Raviprakash and Jasmin Dial - who provided healthy discussion around my misery of constructing long SQL queries. Credit is also due to Kevin Flerlage, whose efforts in automating this process in Excel are commendable and partially inspired this package. ## Citation and License If desired, cite the package using: ```{} citation("sqlcaseR") ``` License: MIT License