---
title: "2. Visualize: Charts"
author: "David Gerbing"
output:
  rmarkdown::html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{2. Visualize: Charts}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r include=FALSE}
suppressPackageStartupMessages(library("lessR"))
```

```{r, include=FALSE}
knitr::opts_chunk$set(fig.width=5.5, fig.height=4)
```

## Data

### Read

Most of the following examples analyze data in the _Employee_ data set, included with __lessR__. To read an internal __lessR__ data set, pass the name of the data set to the __lessR__ function `Read()`. Read the _Employee_ data into the data frame _d_. See the `Read and Write` vignette for more details.

```{r read}
d <- Read("Employee")
```

## Chart of One Variable

### Bar Chart

One of the most frequently encountered visualizations is the bar chart, created for the values of a categorical variable that are each associated with a corresponding value of a numerical variable.

>_Bar chart_: Plot a bar for each level of a categorical variable with its height scaled according to the value of an associated numerical variable.

A call to the chart function contains, at a minimum, the name of the categorical variable with the categories to be plotted. With the `Chart()` function, that variable name is the first argument passed to the function. In this example, the variable name is the _only_ argument passed to the function because the data frame is named _d_, the __lessR__ default. Otherwise, specify the data frame that contains the variable(s) of interest with the `data` parameter. The following illustrates the call to `Chart()` with a categorical variable named $x$.
```{r bc1, dataTable, echo=FALSE, out.width='30%', fig.asp=.7, fig.align='center', out.extra='style="border-style: none"'}
knitr::include_graphics(system.file("img", "bcExplain.png", package="lessR"))
```

If only a single categorical variable is passed to `Chart()`, the numerical value associated with each bar is the count of the occurrences of that category, computed automatically.

Consider the categorical variable Dept in the Employee data table, with one of five departments recorded for each employee. Use `Chart()` to tabulate and display the number of employees in each department, here relying upon the default data frame (table) named _d_. For a data frame with another name, add the `data` parameter.

```{r bcEx, fig.width=4, fig.height=3.5, fig.align='center', fig.cap="Bar chart of tabulated counts of employees in each department."}
Chart(Dept)
```

The default color theme, `"colors"`, fills the bars in the bar chart with different hues. See more explanation of this and related color palettes in the vignette _Customize_. `Chart()` also labels each bar with the associated numerical value.

The function also outputs to the R console the corresponding frequency distribution, the table that lists the count of each category, from which the bar chart is constructed. For the remaining examples, there is no need to see this console output repeated for different charts of the same data, so turn it off with the parameter `quiet` set to `TRUE`. Set this option for each call to `Chart()`, or set it as the default for subsequent analyses with the `style()` function.

```{r echo=FALSE}
style(quiet=TRUE)
```

Following are other chart possibilities. Indicate a specific form with the parameter `type`. The default is `type="bar"` for a bar chart.

### Pie Chart

An alternative to the bar chart for a single categorical variable is the pie chart.
> _Pie Chart_: Relate each level of a categorical variable to a slice of a circle (pie), with the area of the slice scaled according to the value of an associated numerical variable.

The __lessR__ default version of a pie chart is the doughnut or ring chart.

```{r, echo=FALSE, include=FALSE}
d <- Read("Employee")
```

```{r pc1, eval=FALSE}
Chart(Dept, type="pie")
```

```{r pc1run, echo=FALSE, fig.align='center', fig.width=3.5, fig.height=3.5}
p <- Chart(Dept, type="pie")
p
```

The doughnut or ring chart can be easier to read than a standard pie chart. But `Chart()` can also create the "old-fashioned" pie chart by setting the value of the parameter `hole` to `0`. Console output remains turned off from the earlier call to `style()`.

```{r hole0, eval=FALSE}
Chart(Dept, hole=0, type="pie")
```

```{r hole0run, echo=FALSE, fig.width=3.5, fig.height=3.5, fig.align='center', fig.cap="Standard pie chart of variable Dept in the _d_ data frame."}
p <- Chart(Dept, hole=0, type="pie")
p
```

Set the size of the hole in the doughnut or ring chart with the parameter `hole`, which specifies the proportion of the pie occupied by the hole. The default hole size is 0.65. Set the value to 0 to close the hole.

### Other Hierarchical Charts

Some charts provide the perspective of the whole divided into its component pieces. The previously presented pie chart is one such example. These charts are called hierarchical charts or part-whole visualizations. Other possibilities are the treemap and icicle charts.
```{r egTM, eval=FALSE}
Chart(Dept, type="treemap")
```

```{r egTMrun, echo=FALSE, fig.cap="Treemap of the count of employees in each department."}
p <- Chart(Dept, type="treemap")
p
```

```{r egIce, eval=FALSE}
Chart(Dept, type="icicle")
```

```{r egIceRun, echo=FALSE, fig.cap="Icicle chart of the count of employees in each department."}
p <- Chart(Dept, type="icicle")
p
```

### Other Charts

Other values of `type` include `"radar"` and `"bubble"`.

```{r egRadar, eval=FALSE}
Chart(Dept, type="radar")
```

```{r radarEx, echo=FALSE, fig.cap="Radar chart of the count of employees in each department."}
p <- Chart(Dept, type="radar")
p
```

```{r bubEx, echo=FALSE, fig.cap="Bubble chart of the count of employees in each department."}
p <- Chart(Dept, type="bubble")
p
```

## Specify the Numerical Variable

Alternatively, begin with the values of the $x$ and $y$ variables already organized in a table, and create the bar chart directly from this summary table. To do so, enter the paired data values into a data file, such as with Excel, and then read the file into R with `Read()`.

When calling `Chart()`, specify the categorical $x$ variable and then the numerical $y$ variable. In this form, the data are a summary (pivot) table, with one row for each level of the categorical variable plotted.

For example, suppose a summary table contains the departments and the mean salary for each department. Obtain the summary table with the __lessR__ `pivot()` function (which has its own vignette). For the data frame _d_, calculate the mean of the numerical variable _Salary_ across levels of the categorical variable Dept.

```{r a}
a <- pivot(d, mean, Salary, Dept)
a
```

The general syntax for processing this form of the data follows.
```{r bcXY, dataTable, echo=FALSE, out.width='35%', fig.asp=.7, fig.align='center', out.extra='style="border-style: none"'}
knitr::include_graphics(system.file("img", "bcXYExplain.png", package="lessR"))
```

The bar chart follows. The aggregated data are stored in the data frame named _a_, so identify it explicitly with the `data` parameter. With only one variable analyzed, the mean of _Salary_ computed by the previous call to `pivot()` is named `Salary_mean` in the _a_ data frame by default.

```{r xy, fig.width=4, fig.height=3.5, fig.align='center'}
Chart(Dept, y=Salary_mean, data=a)
```

### Statistical Transformation of $y$

As seen, by default, in the absence of other information, `Chart()` defines the numerical variable plotted as the count of the occurrences of each level. Define other statistical transformations of the numerical value of $y$ with the `stat` parameter. Possible values of `stat`: `"sum"`, `"mean"`, `"sd"`, `"dev"`, `"min"`, `"median"`, and `"max"`. The `"dev"` value displays the mean deviations to further facilitate a comparison among levels.

Here the $x$-variable is Dept, and the $y$-variable is _Salary_. Display the bars for deviations at or below 0 in a different color from those above 0 by setting the `fill_split` parameter to `0`. Sort in ascending order with the `sort` parameter set to `"+"`.

```{r, fig.width=4, fig.height=3.5, fig.align='center'}
Chart(Dept, y=Salary, stat="dev", sort="+", fill_split=0)
```

Compare this visualization of the mean deviations with the previous visualization of the means for each Dept.

Or, continuously blend the colors with a divergent scale.

```{r, fig.width=4, fig.height=3.5, fig.align='center'}
Chart(Dept, y=Salary, stat="dev", sort="+", fill_scaled=TRUE, fill=c("red", "blue"))
```

### Stacked Bar or Bubble Charts

The following stacked bar charts display the responses to 20 six-point Likert scale items. The default scale is divergent, from "browns" to "blues".
```{r barstack, fig.width=6, fig.height=5.5, fig.align='center'}
d <- Read("Mach4", quiet=TRUE)
Chart(m01:m20, horiz=TRUE, labels="off", sort="+")
```

The following stacked bubble charts display the same 20 six-point Likert scale items, also referred to here as a _bubble plot frequency matrix_ (BPFM).

```{r BPFM, fig.width=6, fig.height=10.5, fig.align='center'}
Chart(m01:m20, type="bubble")
```

## Chart of Two Categorical Variables

### Standard Bar Charts

Specify the second categorical variable with the `by` parameter. For clarity, specify the `by` parameter by name. The general syntax follows.

```{r bc2var, dataTable, echo=FALSE, out.width='34%', fig.asp=.7, fig.align='center', out.extra='style="border-style: none"'}
knitr::include_graphics(system.file("img", "bcXYExplain.png", package="lessR"))
```

```{r, echo=FALSE, include=FALSE}
d <- Read("Employee")
```

The example plots the count of employees in each department, with each bar divided according to _Gender_.

```{r, fig.width=4, fig.align='center'}
Chart(Dept, by=Gender)
```

### Sunburst Chart

The sunburst chart is a hierarchical pie chart, that is, a pie chart with an additional ring for each `by` variable.

```{r, echo=FALSE, include=FALSE}
d <- Read("Employee")
```

```{r sun1, eval=FALSE}
Chart(Dept, by=Gender, type="pie")
```

```{r sun1run, echo=FALSE, fig.align='center', fig.width=3.5, fig.height=3.5}
p <- Chart(Dept, by=Gender, type="pie")
p
```

Specify additional rings for the sunburst chart by listing multiple `by` variables.

```{r sun1by, eval=FALSE}
Chart(Dept, by=c(Gender, Plan), type="pie")
```

```{r sun1runby, echo=FALSE, fig.align='center', fig.width=3.5, fig.height=3.5}
p <- Chart(Dept, by=c(Gender, Plan), type="pie")
p
```

### Other Hierarchical Charts

As with a single categorical variable, the treemap and icicle charts show the whole divided into its component pieces, here with each department further divided by _Gender_.
```{r egTM2, eval=FALSE, fig.cap="Treemap of the count of employees in each department, divided by Gender."}
Chart(Dept, by=Gender, type="treemap")
```

```{r egTMrun2, echo=FALSE, fig.cap="Treemap of the count of employees in each department, divided by Gender."}
p <- Chart(Dept, by=Gender, type="treemap")
p
```

```{r egIce2, eval=FALSE, fig.cap="Icicle chart of the count of employees in each department, divided by Gender."}
Chart(Dept, by=Gender, type="icicle")
```

```{r egIceRun2, echo=FALSE, fig.cap="Icicle chart of the count of employees in each department, divided by Gender."}
p <- Chart(Dept, by=Gender, type="icicle")
p
```

### Other Charts

The radar and bubble charts also accept a `by` variable. The dot chart, `type="dot"`, is another alternative, including for the means of several numerical variables, such as _Pre_ and _Post_, across departments.

```{r radarExby, echo=TRUE}
p <- Chart(Dept, by=Gender, type="radar")
p
```

```{r bubExby, echo=TRUE, fig.height=3}
p <- Chart(Dept, by=Gender, type="bubble")
p
```

```{r dotEx, echo=TRUE, fig.height=3}
p <- Chart(Dept, type="dot")
p
```

```{r dot2Exby, echo=TRUE, fig.height=3}
p <- Chart(Dept, y=c(Pre, Post), stat="mean", origin_x=70, type="dot")
p
```

### Trellis or Facet Plots

Obtain a Trellis (facet) chart with the `facet` parameter, which plots a separate bar chart for each level of the facet variable.

```{r, fig.width=4, fig.align='center'}
Chart(Dept, facet=Gender)
```

Or, stack the charts vertically by specifying one column with the `n_col` parameter.

```{r, fig.align='center'}
Chart(Dept, facet=Gender, n_col=1)
```

### 100% Stacked Bar Chart

Obtain the 100% stacked version with the `stack100` parameter. This visualization is most useful for comparing levels of the `by` variable across levels of the `x` variable, here _Dept_, when the frequencies of the levels of the `x` variable differ. The percentages across categories are compared instead of the counts, so the percentages in each column sum to 100%.

```{r, fig.width=4, fig.align='center'}
Chart(Dept, by=Gender, stack100=TRUE)
```

### Long Value Labels

Long _value labels_ on the horizontal axis are wrapped onto a new line wherever a space occurs in the label.
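The wrapping rule can be expressed in one line of base R. The helper below, `wrap_at_spaces()`, is hypothetical and not part of __lessR__; it is a minimal sketch of the described behavior, not the package's internal code. It replaces each space in a label with a newline character, so a two-word label prints on two lines.

```r
# Hypothetical helper, not part of lessR: replace every space in each
# label with a newline so that multi-word labels print on separate lines
wrap_at_spaces <- function(labels) {
  gsub(" ", "\n", labels, fixed = TRUE)
}

likert <- c("Strongly Disagree", "Slightly Agree", "Agree")
wrapped <- wrap_at_spaces(likert)

# Print each wrapped label; "Strongly Disagree" appears on two lines
writeLines(wrapped)
```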
Here, read responses to the Mach IV Machiavellianism scale, on which each item is scored from 0 to 5.

```{r}
d <- rd("Mach4", quiet=TRUE)
```

Also, read _variable labels_ into the _l_ data frame. These labels are then used automatically to label the output, both the visualization and the text output to the console.

```{r}
l <- rd("Mach4_lbl", quiet=TRUE)
```

Convert four of the Mach items to ordered factors with the __lessR__ function `factors()`. This function applies the base R function `factor()` to a set of variables at once instead of to a single variable, without additional function calls. A response of 0 is Strongly Disagree, 1 is Disagree, and so on, through 5 for Strongly Agree.

```{r}
LikertCats <- c("Strongly Disagree", "Disagree", "Slightly Disagree",
                "Slightly Agree", "Agree", "Strongly Agree")
d <- factors(c(m06,m07,m09,m10), levels=0:5, labels=LikertCats, ordered=TRUE)
```

Because the factors are defined as ordered, the colors are plotted on a sequential scale, from light to dark. Console output was turned off earlier with `style()`, so turn it back on with `quiet=FALSE` just for this analysis of new data.

```{r fig.width=6, fig.height=4.5, fig.align='center'}
Chart(m06, by=m07, quiet=FALSE)
```

If the categorical variable is not an ordered factor, assign a gradient by setting `fill` to a plural color name such as `"blues"`, `"reds"`, or `"emeralds"`. See the Customize vignette for more details on color palettes.

## Customization

```{r, echo=FALSE, include=FALSE}
d <- Read("Employee")
```

### One Categorical Variable

#### Custom Colors

Specify a single fill color with the `fill` parameter and the edge color of the bars with `color`. Set the transparency level with `transparency`. Against a lighter background, display the value for each bar in a darker color with the `labels_color` parameter. To specify a color, use a color name, its `rgb()` or `hcl()` color space coordinates, or the __lessR__ custom color palette function `getColors()`.
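For reference, the base R color functions named above each return a hexadecimal color string that can be passed to `fill` or `color`. The sketch below uses only base R (the `grDevices` package, attached by default). The particular coordinate values are arbitrary choices for illustration, and the commented-out `Chart()` line only indicates where such a string could be used.

```r
# col2rgb() gives the red, green, and blue coordinates of a named color;
# "darkred" is (139, 0, 0)
col2rgb("darkred")

# The same color from its RGB coordinates on the 0 to 255 scale
dark_red <- rgb(139, 0, 0, maxColorValue = 255)
print(dark_red)  # "#8B0000"

# A pale blue from HCL coordinates: hue 240, chroma 35, luminance 85
pale_blue <- hcl(h = 240, c = 35, l = 85)
print(pale_blue)

# Either string can then serve as a color value, for example:
# Chart(Dept, fill = pale_blue, color = dark_red)
```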
```{r fig.width=4, fig.height=3.75, fig.align='center'}
Chart(Dept, fill="darkred", color="black", transparency=.8, labels_color="black")
```

Use the `theme` parameter to change the entire color theme: "colors", "lightbronze", "dodgerblue", "slatered", "darkred", "gray", "gold", "darkgreen", "blue", "red", "rose", "green", "purple", "sienna", "brown", "orange", "white", and "light". In this example, changing the full theme has the same effect as changing only the fill color. Turn off the displayed value on each bar with the parameter `labels` set to `"off"`. Specify a horizontal bar chart with the base R parameter `horiz`.

```{r fig.width=4, fig.height=3.5, fig.align='center'}
Chart(Dept, theme="gray", labels="off", horiz=TRUE)
```

Or, use `style()` to change the theme for all subsequent visualizations. See the `Customize` vignette.

Dept is not an ordinal variable, that is, a factor with ordered levels, such as defined with the base R `factor()` function. Ordinal variables plot by default with a range of the same hue, from light to dark. To apply such a sequential scale to a variable such as Dept, choose from the many sequential palettes available from `getColors()`: "reds", "rusts", "browns", "olives", "greens", "emeralds", "turquoises", "aquas", "blues", "purples", "violets", "magentas", and "grays".

```{r fig.width=4, fig.height=3.5, fig.align='center'}
Chart(Dept, fill="reds")
```

The color-blind friendly family of viridis palettes is also available: "viridis", "cividis", "magma", "inferno", "plasma". The bar chart below uses the primary viridis palette.

```{r fig.width=4, fig.height=3.5, fig.align='center'}
Chart(Dept, fill="viridis")
```

For something different, many Wes Anderson movie themes are available: "BottleRocket1", "BottleRocket2", "Rushmore1", "Rushmore", "Royal1", "Royal2", "Zissou1", "Darjeeling1", "Darjeeling2", "Chevalier1", "FantasticFox1", "Moonrise1", "Moonrise2", "Moonrise3", "Cavalcanti1", "GrandBudapest1", "GrandBudapest2", "IsleofDogs1", "IsleofDogs2".
```{r fig.width=4, fig.height=3.5, fig.align='center'}
Chart(Dept, fill="GrandBudapest1")
```

Instead of arbitrarily setting the interior color of the bars with the `fill` parameter, map the value of the tabulated count to the bar `fill`. With mapping, the color of each bar depends upon its height: the higher the bar, the darker the color. Specify `(count)` as the fill color to map the values of the numerical variable to the fill color.

```{r fig.width=4, fig.height=3.5, fig.align='center'}
Chart(Dept, fill=(count))
```

#### Axis Labels

##### Rotate Labels

Rotate and offset the axis labels with the `rotate_x` and `offset` parameters. Sort the categories in descending order of frequency with the `sort` parameter.

```{r fig.width=4, fig.height=3.5, fig.align='center'}
Chart(Dept, rotate_x=45, offset=1, sort="-")
```

##### Format Labels

By default, numeric values on both axes are formatted by rounding thousands, such as 100000 to 100K. Change this default of `"K"` with the parameter `axis_fmt`. Specify `","` to insert commas in large numbers that use a period as the decimal point, `"."` to insert periods instead, or `""` to turn off formatting. The value `"K"` can also be combined with `","` or `"."` in a vector, such as `c("K", ",")`.

Axis labels can also be formatted by adding a prefix to the numeric values, such as `$` or `€`, here for the $y$-axis with the parameter `axis_y_pre`. The prefix can be multiple characters, such as `R$` for the Brazilian currency.

```{r fig.width=4, fig.height=3.5, fig.align='center'}
Chart(Dept, axis_fmt=",", axis_y_pre="£")
```

### Two Categorical Variables

The stacked version is the default, but the levels of the second categorical variable can also be displayed as separate bars beside each other with `beside=TRUE`, which makes the values easier to compare with each other. Here, also place the value labels above the bars with the `labels_position` parameter set to `"out"`.
```{r, fig.width=5, fig.align='center'}
Chart(Dept, by=Gender, beside=TRUE, labels_position="out")
```

Or, display the bars horizontally with the `horiz` parameter set to `TRUE`.

```{r, fig.width=6}
Chart(Gender, by=Dept, horiz=TRUE)
```

Specify two custom fill colors for _Gender_.

```{r, fig.width=4, fig.align='center'}
Chart(Dept, by=Gender, fill=c("deepskyblue", "black"))
```

### Annotation

Annotate a plot with the `add` parameter. To add a rectangle, use the `"rect"` value of `add`. Here, place the rectangle around the message centered at (3, 10). A rectangle requires the coordinates of two corners, the first from `x1` and `y1` and the second from `x2` and `y2`. Text requires just a single coordinate, from `x1` and `y1`. In the `add` vector, the message follows `"rect"`, so in `x1` and `y1` the coordinates of the text follow the coordinates of the rectangle.

First, lighten the fill color of the annotation with the `add_fill` parameter of the `style()` function.

```{r, fig.width=4, fig.height=3.5, fig.align='center'}
d <- Read("Employee", quiet=TRUE)
style(add_fill="aliceblue")
Chart(Dept, add=c("rect", "Employees by\nDepartment"), x1=c(1.75,3), y1=c(11, 10), x2=4.25, y2=9)
```

### Variable Labels

As an option, also read a table of variable labels. Format the table as two columns: the first column is the variable name and the second column is the corresponding variable label. Not every variable needs to be included in the table. The table can be a `csv` file or an Excel file.

Read the label file into the _l_ data frame, currently the only permitted name. The labels are displayed on both the text and the visualization output. Each displayed label is the variable name juxtaposed with the corresponding label, as shown in the following output.

```{r labels}
l <- rd("Employee_lbl")
l
```

### Filter Rows

Base R and many additional packages provide tools to manipulate the data within data frames. One example is subsetting the data by rows.
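A minimal base R sketch of subsetting by rows follows. Instead of the Employee data, it uses a small hypothetical data frame named `emp`, with made-up names and departments, and keeps rows either by a logical condition or by a vector of row numbers.

```r
# Hypothetical stand-in for a few rows of the Employee data
emp <- data.frame(
  Name   = c("Ana", "Ben", "Cai", "Dee", "Eli"),
  Gender = c("W", "M", "W", "M", "W"),
  Dept   = c("ADMN", "SALE", "SALE", "FINC", "ACCT"),
  stringsAsFactors = FALSE
)

# Keep only the rows that satisfy a logical condition
women <- subset(emp, Gender == "W")

# The same rows with bracket indexing
women_br <- emp[emp$Gender == "W", ]

# Keep rows by their row numbers instead
first_two <- emp[c(1, 2), ]

# Count the departments within the subset, as a bar chart would tabulate them
table(women$Dept)
```

`subset()` reads more naturally for interactive work, whereas bracket indexing is the more general form used inside functions.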
For example, create a subset data frame of the employee data set that consists only of women. Or, perform the filtering within __lessR__ functions such as `Chart()`. To do so, invoke the parameter `filter` and either specify a logical condition such as `Gender=="W"` or specify an integer vector of the row numbers of the data frame to retain.

```{r f1}
Chart(Dept, filter=(Gender=="W"))
```

```{r f1hr2}
Chart(Dept, filter=c(1:5, 20,21))
```

## Interactive Bar Chart

An interactive visualization lets the user change parameter values in real time to change characteristics of the visualization. To create an interactive bar chart that displays the corresponding parameters, run the function `interact()` with the value `"BarChart"` specified.

```
interact("BarChart")
```

The function is not run here because interactivity requires running directly from the R console.

## Full Manual

Use the base R `help()` function to view the full manual for `Chart()`. Simply enter a question mark followed by the name of the function.

```
?Chart
```

## More

More on bar charts and other visualizations from __lessR__ and other packages such as __ggplot2__ is available in:

Gerbing, D., _R Visualizations: Derive Meaning from Data_, CRC Press, May 2020, ISBN 978-1138599635.