--- title: "Selecting Community Detection Resolution Parameters with CHAMP Maps: the karate club network" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Selecting Community Detection Resolution Parameters with CHAMP Maps: the karate club network} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.height = 3, fig.width = 5 ) ``` ```{r setup} library(ideanet) ``` **This vignette demonstrates the framework of CHAMP and maps on the CHAMP set to find communities in the karate club network. If you haven't already done so, you will want to first look at the [CHAMP vignette](CHAMP.html) to better understand the commands used here.** These functions, as part of the broader IDEANet project, are supported by the National Science Foundation as part of the Human Networks and Data Science - Infrastructure program (BCS-2024271 and BCS-2140024). If you use these functions in your work, please cite 1. [the `ideanet` package](https://doi.org/10.32614/CRAN.package.ideanet), 2. [the Weir et al. (2017) CHAMP paper](https://doi.org/10.3390/a10030093), and 3. [the Gibson & Mucha (2022) CHAMP maps paper](https://doi.org/10.1038/s41598-022-20142-6). In addition to this vignette, we recommend you look at the [CHAMP_football vignette](CHAMP_football.html), which goes into an even deeper dive on the network of Division I-A college football games from the Fall 2000 season, demonstrating how this pipeline helps uncover the conference structure. ## Example: Zachary's Karate Club (unweighted) We now re-demonstrate the 3 main steps, this time using the karate club, available in `igraphdata`, converted to an unweighted graph, since it is one of the most popular, classic examples considered in the community detection literature. ```{r karate_prep, message = FALSE} data(karate, package="igraphdata") karate <- igraph::delete_edge_attr(karate, "weight") ``` ```{r karate_viz, eval = FALSE} igraph::plot.igraph(karate, main="Zachary's Karate Club") ``` ### Step 1: Collect Partitions ```{r karate_partitions} kc_partitions <- get_partitions(karate, n_runs = 500, seed = 3478) ``` ### Step 2: CHAMP ```{r karateCHAMP, warning = FALSE} kc_partitions <- CHAMP(karate, kc_partitions, plottitle = "Zachary's Karate Club (unweighted)") ``` Once again, the plot generated by `CHAMP` represents each of the unique partitions returned by `get_partitions` (in this example, 78 of them) by a line giving its values of $Q(\gamma)$, the modularity of the partition as a function of resolution parameter $\gamma$. `CHAMP` then identifies which of these lines ever appear on the upper envelope of $Q(\gamma)$, that is, which of the corresponding partitions are somewhere optimal in the space of $\gamma$ values. In this example, there are 12 somewhere-optimal partitions found. Again, `CHAMP` updates the `partitions` list (here `kc_partitions`) with the generated `CHAMPfigure` plot shown above and a `CHAMPsummary` that provides essential information about the 12 somewhere-optimal partitions found here. ```{r karate_CHAMPsummary, eval = FALSE} print(kc_partitions$CHAMPsummary) ``` ```{r karate_CHAMPsummary_kable, echo = FALSE} knitr::kable(kc_partitions$CHAMPsummary) ``` The partition that straddles the default $\gamma=1$ resolution parameter here, which also happens to have a large `gamma_range` and `segment_length` (the length of the corresponding line segment on the upper envelope of Q v. $\gamma$ in the figure) is the 4-community partition identified as `partition_num` 6 here. Having thus highlighted this partition, we can directly examine it more closely. ```{r partition6} kc_partitions$partitions[[6]] igraph::membership(kc_partitions$partitions[[6]]) ``` ```{r partition6viz, eval = FALSE} igraph::plot.igraph(karate,mark.groups = kc_partitions$partitions[[6]]) ``` ### Step 3: Compute the SBM-equivalence iterative map on the CHAMP set ```{r, warning = FALSE} kc_partitions <- get_CHAMP_map(karate, kc_partitions, plotlabel = "Zachary's Karate Club (unweighted)") ``` The above figure re-visualizes each of the 12 partitions in the CHAMP set (that is, the somewhere-optimal partitions) as a line segment indicating the number of communities in the partition (on the vertical axis) versus its domain of optimality in $\gamma$ (on the horizontal axis). The arrows in the figure visualize the iterative map on the CHAMP set, directed from the middle of each line segment to the "estimated $\gamma$" associated with that partition and, thus, the partition that is optimal at that $\gamma$. These values are tabulated in the now updated `kc_partitions$CHAMPsummary`. ```{r karate_CHAMPsummary2, eval = FALSE} print(kc_partitions$CHAMPsummary) ``` ```{r karate_CHAMPsummary_kable2, echo = FALSE} knitr::kable(kc_partitions$CHAMPsummary) ``` Looking at the above figure and table, we can see that the only partition that maps to itself is the 4-community `partition_num` 6, as was indicated by the "fixed point" message output from `get_CHAMP_map`. Moreover, we note that if we repeatedly follow arrows from any other partition we eventually reach this same partition. As the only fixed point of the map on the CHAMP set, `partition_num` 6 is of special interest to us here. And, like in the previous example, the fact that this fixed point also happens to be the same as the optimal partition at the default resolution parameter $\gamma=1$ is a special feature of this simple example network (we will see in the final example below a situation where this default is not a fixed point). Importantly, this 4-community partition obtained as a fixed point of the iterative map on the CHAMP set is not the same as the result in Newman (2016), which is a 2-community partition because Newman's approach takes the number of communities as a specified input, and he understandably selects 2 communities for this example network because of its story. We stress that there is no single way to cluster data that is best in all cases, and that it is important to consider different possibilities. If one is only interested in solutions with 2 communities, then a modification of the current approach would be to only consider algorithms that are specified to give 2 communities, or to only keep the 2-community solutions that come out of `get_partitions`. For further discussion of these details, see Gibson & Mucha (2022). ### Weighted graphs? We could perform the three steps of this framework for the weighted version of the karate club. Indeed, it is interesting to do so to emphasize that weights matter and to highlight an important warning message for the third step of the framework (`get_CHAMP_map`). ```{r karate_viz2, eval = FALSE} data(karate,package="igraphdata") igraph::plot.igraph(karate, main="Zachary's Karate Club (weighted)") kcw_partitions <- get_partitions(karate, n_runs = 500, seed = 3478) kcw_partitions <- CHAMP(karate,kcw_partitions, plottitle="Zachary's Karate Club (weighted)") kcw_partitions <- get_CHAMP_map(karate,kcw_partitions, plotlabel="Zachary's Karate Club (weighted)") ``` If you run the above code, you will receive a warning > Warning: The theory underlying get_CHAMP_map() is for unweighted networks. The formulae have been naturally generalized to weighted networks, but these have not been well studied. `CHAMP` is fully valid for both weighted and unweighted networks. But the theoretical equivalence uncovered by Newman (2016) to define the iterative map used in `get_CHAMP_map` was developed for unweighted networks. While the resulting formulae that define the iterative map have a very natural generalization to weighted networks, this generalization has not to our knowledge been well studied, neither theoretically nor in practice. Nevertheless, we expect that it may still be useful in many situations for highlighting different partitions. In particular, just as the original definition of modularity for unweighted networks has a natural generalization to weighted edges through thinking about discretized multiedges, the iterative map might possibly have a reasonable multiedge interpretation. At a minimum, the dimensionless formulae for the "in" and "out" propensities and for the estimated $\gamma$ value defined in the theory all have natural generalizations computed in terms of edge weights with strength in place of degree and total edge weights in place of edge counts. Nevertheless, we again stress that this has not to our knowledge been well studied, and users are appropriately warned. As an aside, we also note that the above code to process the weighted karate network finds two fixed points in the map that are self-consistent in the generalized formulae of the modularity-SBM equivalence (whereas the unweighted version only yielded one fixed point partition). ## References Girvan, M., and M. E. J. Newman. “Community Structure in Social and Biological Networks.” Proceedings of the National Academy of Sciences of the United States of America 99, no. 12 (June 11, 2002): 7821–26. . Gibson, Ryan A., and Peter J. Mucha. 2022. "Finite-State Parameter Space Maps for Pruning Partitions in Modularity-Based Community Detection." Scientific Reports 12 (1): 15928. . - Python implementation: Gibson, Ryan. 2020--2024. . Newman, M. E. J. “Equivalence between Modularity Optimization and Maximum Likelihood Methods for Community Detection.” Physical Review E 94, no. 5 (November 22, 2016): 052315. . Pamfil, A. Roxana., Sam D. Howison, Renaud. Lambiotte, and Mason A. Porter. “Relating Modularity Maximization and Stochastic Block Models in Multilayer Networks.” SIAM Journal on Mathematics of Data Science, January 1, 2019, 667–98. . Peel, Leto, Daniel B. Larremore, and Aaron Clauset. 2017. "The Ground Truth about Metadata and Community Detection in Networks." Science Advances 3 (5): e1602548. . Weir, William H., Scott Emmons, Ryan Gibson, Dane Taylor, and Peter J. Mucha. 2017. "Post-Processing Partitions to Identify Domains of Modularity Optimization." Algorithms 10 (3): 93. . - Python implementation: Weir, William. 2017--2018. .