--- title: "Partial table data download" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Partial table data download} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", cache = FALSE, eval = nzchar(Sys.getenv("COMPILE_VIG")) ) ``` Sometimes one is only interested in a relatively small subset of a large table, and instead of having to download the entire table, or just a handful of vectors, one might want to access a substantial number of coordinates without having to download the entire table. The package facilitates this by * allowing to create a template table using just the table metadata that informs on the available combination of dimensions and their coordinates, and * allowing to access a subset of the table using a filtered version of the template. We demonstrate this process by taking the large Table 34-10-0285 on building permits. ```{r setup} library(cansim) library(dplyr) library(ggplot2) bp_template <- get_cansim_table_template("34-10-0285") head(bp_template) ``` The template gives information on the possible combinations of data (although some combinations might not be available in the table and have no data). We can use this to pinpoint data series we are interested in. For the purpose of this example we want to look at the number of housing units created and lost in the four largest metro areas in Canada. ```{r} bp_template_filtered <- bp_template %>% filter(Geography %in% c("Toronto, Ontario", "Montréal, Quebec", "Vancouver, British Columbia", "Calgary, Alberta"), `Type of building` %in% c("Total residential","Total demolitions"), `Type of work` %in% c("Demolitions for residential dwellings","Deconversion total","Conversions total","New dwelling units total"), Variables %in% c("Number of dwelling-units created", "Number of dwelling-units lost", "Number of dwelling-units demolished"), `Seasonal adjustment, value type` == "Unadjusted, current" ) bp_template_filtered ``` This leaves us with `r if (exists("bp_template_filtered")) nrow(bp_template_filtered) else 96` coordinates from the table. To get data for just this subset of the table we feed that into the `get_cansim_data_for_table_coord_periods` function. By default this will get the data for all available times, we could narrow it down to the latest $n$ periods using the `periods` argument, but here we want to get all available data. ```{r} bp_data <- get_cansim_data_for_table_coord_periods(bp_template_filtered) bp_data ``` We then aggregate it to annual data and plot it by year and area, separating out dwellings newly created, created through conversion, those lost to deconversions and those demolished. ```{r fig.alt="Vignette example plot, building and demolition permits"} bp_data |> mutate(Value=case_when( # count demolitions and deconversions as negative Variables %in% c("Number of dwelling-units demolished","Number of dwelling-units lost") ~ - val_norm, TRUE ~ val_norm )) |> mutate(Name=gsub(", .+","",GEO), Year=strftime(Date,"%Y")) |> summarize(Value=sum(Value),n=n(),.by=c(Name,Year,`Type of work`)) |> filter(n==12,!is.na(Value)) |> # only show years with complete 12 months of data ggplot(aes(x=Year,y=Value,fill=`Type of work`)) + geom_bar(stat="identity") + facet_wrap(~Name,scales="free_y") + scale_y_continuous(labels=scales::comma) + theme(axis.text.x = element_text(angle=90, hjust=1)) + labs(title="Building permits for residential structures in Canadian metro areas", y="Number of dwelling units", x=NULL, fill="Metric", caption="StatCan Table 34-10-0285") ``` ## Using vectors instead of coordinates The same can be achieved with downloading data by vectors. For this we need to add the vector information to the table template. Vector information is only available for coordinates, so it also gives an effective way to filter out invalid coordinate combinations from the template. Vector information is not available for census data tables. ```{r} bp_template_filtered_vecotrs <- bp_template_filtered |> add_cansim_vectors_to_template() bp_data_vector <- bp_template_filtered_vecotrs$VECTOR |> na.omit() |> get_cansim_vector() ``` This gives us the same data as above, with possibly shorter time series for some coordinates as querying data by vector will not pull data for times when the specific vector was not available. When accessed by vector or by coordinate the data will only differ in this limited way, and the values on the difference will be `NA` and won't affect the results. For completeness we plot the vector data and obtain an identical graph. ```{r fig.alt="Vignette example plot, building and demolition permits"} bp_data_vector |> mutate(Value=case_when( # count demolitions and deconversions as negative Variables %in% c("Number of dwelling-units demolished","Number of dwelling-units lost") ~ - val_norm, TRUE ~ val_norm )) |> mutate(Name=gsub(", .+","",GEO), Year=strftime(Date,"%Y")) |> summarize(Value=sum(Value),n=n(),.by=c(Name,Year,`Type of work`)) |> filter(n==12,!is.na(Value)) |> # only show years with complete 12 months of data ggplot(aes(x=Year,y=Value,fill=`Type of work`)) + geom_bar(stat="identity") + facet_wrap(~Name,scales="free_y") + scale_y_continuous(labels=scales::comma) + theme(axis.text.x = element_text(angle=90, hjust=1)) + labs(title="Building permits for residential structures in Canadian metro areas", y="Number of dwelling units", x=NULL, fill="Metric", caption="StatCan Table 34-10-0285") ```