---
title: "setweaver"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{setweaver}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/vignette_",
  out.width = "100%"
)
```
*setweaver* is an R package designed to help users create sets of variables based on a mutual information approach and explore how they are related to a specific outcome. In this context, a set is a collection of distinct elements (e.g., variables) that can also be treated as a single entity. Mutual information, a concept from probability theory, quantifies the dependence between two variables by expressing how much information about one variable can be gained from observing the other.
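
To make the idea of mutual information concrete, here is a minimal base R sketch that estimates it from the joint frequency table of two discrete variables. The helper `mutual_information()` and the toy vectors are hypothetical and written only for this illustration; they are not part of *setweaver*, and the package's own computation may differ (for example in the logarithm base).

```r
# Hypothetical helper (not part of setweaver): mutual information in nats
# between two discrete vectors, estimated from their joint frequency table.
mutual_information <- function(x, y) {
  joint <- table(x, y) / length(x)                   # joint probabilities p(x, y)
  expected <- outer(rowSums(joint), colSums(joint))  # p(x) * p(y) under independence
  nonzero <- which(joint > 0)                        # skip empty cells (0 * log 0 = 0)
  sum(joint[nonzero] * log(joint[nonzero] / expected[nonzero]))
}

# Toy data: b is an exact copy of a, while c_var is perfectly balanced against a
a <- c(0, 0, 1, 1, 0, 0, 1, 1)
b <- a
c_var <- c(0, 1, 0, 1, 0, 1, 0, 1)

mi_dependent <- mutual_information(a, b)        # observing b fully determines a
mi_independent <- mutual_information(a, c_var)  # observing c_var says nothing about a
```

In this toy case the mutual information between `a` and its copy equals the entropy of `a`, whereas it is zero for the balanced pair.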
## Authors
[Aaron Fisher](https://psychology.berkeley.edu/people/aaron-fisher)\
[Nicolas Leenaerts](https://nicolasleenaerts.github.io/)
## Installation
You can install the released version of *setweaver* from [CRAN](https://CRAN.R-project.org) with:
```r
install.packages("setweaver")
```
Or you can install the development version of *setweaver* from GitHub with the following code snippet:
``` r
devtools::install_github('nicolasleenaerts/setweaver')
```
You can then attach the package as follows:
```{r setup}
library(setweaver)
```
## Pairing variables
You can create sets of variables using the *pairmi* function, which takes a dataframe of variables and combines them into sets of up to a specified maximum number of elements. For each set, the mutual information between the variables is computed, followed by the calculation of a G-statistic. This statistic is then evaluated for significance based on a chi-squared distribution with a predefined alpha level. Alternatively, users can specify a mutual information threshold to determine the significance of the sets.
```{r example_1, results='hide',message=FALSE}
# Loading the package, which also makes the example data (misimdata) available
library(setweaver)
# Pairing variables
results = pairmi(misimdata[,2:11],alpha = 0.05,n_elements = 5)
```
```{r table_1,echo=FALSE,results='asis'}
knitr::kable(results$expanded.data[c(1:5),],caption = 'Table 1. Expanded Data',align = c('c'))
```
```{r table_2,echo=FALSE,results='asis'}
knitr::kable(results$sets,caption = 'Table 2. Information on sets',align = c('c'))
```
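
The significance step described above can be sketched in base R for a single pair of variables. This is an illustration under stated assumptions rather than the code *pairmi* runs: it assumes the textbook relation G = 2 · N · MI (with mutual information in nats) and (r − 1)(c − 1) degrees of freedom, and the helper `g_test()` and the toy data are hypothetical.

```r
# Hypothetical helper (not part of setweaver): G-test of independence for two
# discrete vectors, assuming G = 2 * N * MI with MI measured in nats.
g_test <- function(x, y, alpha = 0.05) {
  counts <- table(x, y)
  n <- sum(counts)
  joint <- counts / n
  expected <- outer(rowSums(joint), colSums(joint))
  nonzero <- which(joint > 0)
  mi <- sum(joint[nonzero] * log(joint[nonzero] / expected[nonzero]))
  g <- 2 * n * mi
  df <- (nrow(counts) - 1) * (ncol(counts) - 1)
  # Upper-tail probability of the chi-squared distribution
  p_value <- pchisq(g, df = df, lower.tail = FALSE)
  list(mi = mi, g = g, df = df, p_value = p_value,
       significant = p_value < alpha)
}

# Toy data: x and y agree in 80 of 100 observations
x <- rep(c(0, 0, 1, 1), times = c(40, 10, 10, 40))
y <- rep(c(0, 1, 0, 1), times = c(40, 10, 10, 40))

g_result <- g_test(x, y, alpha = 0.05)
```

A pair would be kept as a set in this sketch when `g_result$significant` is `TRUE`. Replacing that test with a direct comparison of `g_result$mi` against a cutoff mirrors the alternative threshold approach.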
## Evaluating sets
Once the sets are created with the *pairmi* function, you can assess their relationship with a specific outcome using the *probstat* function. This function employs k-fold cross-validation to compute parameters such as conditional probability, conditional entropy, and the odds ratio of the outcome given a particular set. Additionally, a Fisher's exact test or a generalized linear mixed model (i.e., for multilevel data) is performed to determine whether the outcome is significantly more likely to occur in the presence of a given set of variables.
```{r example_2, results='hide',message=FALSE}
# Evaluating the sets
evaluated_sets = probstat(misimdata$y,results$expanded.data[,results$sets$set],nfolds = 5)
```
```{r table_3,echo=FALSE,results='asis'}
knitr::kable(evaluated_sets[c(1:5),],caption = 'Table 3. Evaluated sets',align = c('c'))
```
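
To show what the quantities named above mean, the following base R sketch computes them for one made-up binary set indicator and outcome. It is a simplified illustration, not the internals of *probstat*: there is no cross-validation, the entropy is expressed in bits by assumption, and all object names and data are hypothetical.

```r
# Toy data (made up for illustration): a binary outcome and a binary indicator
# of whether a set is present
outcome     <- rep(c(1, 0, 1, 0), times = c(30, 10, 15, 45))
set_present <- rep(c(1, 1, 0, 0), times = c(30, 10, 15, 45))

# 2 x 2 contingency table: rows are the set, columns are the outcome
counts <- table(set = set_present, outcome = outcome)

# Conditional probability of the outcome given that the set is present
cond_prob <- counts["1", "1"] / sum(counts["1", ])

# Conditional entropy H(outcome | set): entropy of the outcome within each
# level of the set, weighted by how often that level occurs
p_set <- rowSums(counts) / sum(counts)
p_outcome_given_set <- prop.table(counts, margin = 1)
row_entropy <- apply(p_outcome_given_set, 1, function(p) {
  p <- p[p > 0]
  -sum(p * log2(p))
})
cond_entropy <- sum(p_set * row_entropy)

# Sample odds ratio of the outcome when the set is present versus absent
odds_ratio <- (counts["1", "1"] * counts["0", "0"]) /
  (counts["1", "0"] * counts["0", "1"])

# One-sided Fisher's exact test: is the outcome more likely when the set is present?
fisher_result <- fisher.test(counts, alternative = "greater")
```

Note that `fisher.test()` reports a conditional maximum likelihood estimate of the odds ratio, which generally differs slightly from the sample odds ratio computed by hand.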
## Visualizing sets
You can visualize the sets created with the *pairmi* function using the *setmapmi* function. This function generates a setmap, which illustrates the composition of sets by showing which original variables are included in sets of a given size.
```{r example_3, fig.align = "center", fig.height = 6, fig.width = 8, fig.cap="Plot 1. Setmap of sets that consist of 2 elements"}
# Visualizing the sets
setmapmi(results$original.variables,results$sets,n_elements = 2)
```
## Visualizing relations between sets and an outcome
You can also visualize how sets are related to an outcome with the *plot_prob* function. Here, the relationships can be displayed either as conditional probabilities or as effects estimated by logistic regression.
```{r example_4, fig.align = "center", fig.height = 6, fig.width = 6, fig.cap="Plot 2. Graph showing the relation between certain sets and an outcome y"}
# Creating a graph where sets are related to an outcome using logistic regression effects
plot_prob(cbind(y=misimdata[,1],results$expanded.data[,13:17]),
'y',colnames(results$expanded.data[,13:17]),method='logistic')
```
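
The two ways of expressing a relationship mentioned above can be compared numerically in base R. The sketch below is only an illustration with made-up data and hypothetical object names; it uses `glm()` as an assumed stand-in and does not reproduce how *plot_prob* fits or draws its effects.

```r
# Toy data (made up for illustration): outcome y and one binary set indicator
toy <- data.frame(
  y     = rep(c(1, 0, 1, 0), times = c(24, 16, 12, 48)),
  set_a = rep(c(1, 1, 0, 0), times = c(24, 16, 12, 48))
)

# Option 1: conditional probability of y at each level of the set
cond_probs <- tapply(toy$y, toy$set_a, mean)

# Option 2: effect of the set estimated by logistic regression (log odds ratio)
fit <- glm(y ~ set_a, data = toy, family = binomial)
log_odds <- coef(fit)["set_a"]
odds_ratio <- exp(log_odds)
```

With a single binary predictor, the two views carry the same information: the exponentiated coefficient equals the ratio of the odds implied by the two conditional probabilities.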
## Working Directly with Underlying Functions
If you wish to explore the relationships between variables using a probabilistic or mutual information framework, you can directly call the lower-level functions that underlie *pairmi* and *probstat*. This allows for detailed and customized analyses. For example, the *entfuns* function calculates several descriptive measures that summarize the relationships between predictor variables and an outcome variable.
```{r example_5, results='hide',message=FALSE}
# Compute entropy and mutual information diagnostics for selected variables
descriptives = entfuns(misimdata$y,misimdata[,2:3])
```
```{r table_4,echo=FALSE,results='asis'}
knitr::kable(descriptives,caption = 'Table 4. Diagnostic statistics from entfuns()',align = c('c'))
```
Enjoy using the package, and reach out if you have any questions!