Multiple comparisons with ‘ggpmisc’

Letter labels and labelled bars

Plotting examples
Author

Pedro J. Aphalo

Published

2023-08-14

Modified

2024-04-15

Abstract

Examples of plots with pairwise labels and letter labels created with stat_multcomp() from R package ‘ggpmisc’. Function stat_multcomp() computes and adds to ggplots the results from multiple comparison tests as labels. Multiple comparisons based on Tukey and Dunnet contrasts are demonstrated.

Keywords

ggplot2 pkg, ggpp pkg, ggpmisc pkg, data labels, plot annotations, pairwise

Warning

The examples in this page make use of R packages ‘ggpmisc’ (>= 0.5.5.9000) and ‘ggpp’ (>= 0.5.5) not yet available through CRAN. In ‘ggpmisc’ 0.5.4 and 0.5.5 stat_multcomp() was a preliminary implementation that will change in future version 0.5.6. Versions before 0.5.4 lack the layer functions described in this page.

Packages ‘ggpp’ and ‘ggpmisc’ extend the plotting commands from package ‘ggplot2’.

Tip

In this page most code chunks are “folded” so as to decrease the clutter when searching for examples. A few code chunks that are reused across several plots are by default unfolded to make them more visible. Above each plot you will find one or more “folded” code chunks signalled by a small triangle followed by “Code”. Clicking on the triangle “unfolds” the code chunk making visible the R code used to produce the plot.

The code in the chunks can be copied by clicking on the top right corner, where an icon appears when the mouse cursor hovers over the code listing.

The </> Code drop down menu to the right of the page title makes it possible to unfold all code chunks and to view the Quarto source of the whole web page.

Names of functions and other R objects are linked to the corresponding on-line help pages. The names of R extension packages are linked to their documentation web sites when available.

1 Plot annotations for multiple comparison tests

Data labels add textual information directly related to individual data points (shown as glyphs), such as original observations or summaries such as means. The position of labels in this case is dependent on the scales used to represent data points. Text is usually displaced so that it does not occlude the glyph representing the data point and when the link to the data point is unclear, this link is signalled with a line segment or arrow. Data labels are distinct from annotations in that they contribute directly to the representation of data on a plot or map.

Annotations differ from data labels, in that their position is decoupled from their meaning. Insets can be thought as larger, but still self-contained annotations. In most cases the reading of inset tables and plots depends only weakly on the plot or map in which they are included.

In the case of annotations and insets the designer of a data visualization has the freedom to locate them anywhere, as long as they do not occlude features used to describe data.

The letter labels and labelled segments used to highlight pairwise comparisons are a special case as they behave as data labels along the axis onto which an explanatory factor has been mapped, usually x, but frequently as annotations along the axis onto which a continuous numeric variable has been mapped, usually y.

2 Multiple comparisons in ggplots

Caution

In experimental designs with more than two groups or treatments, we face the problem that testing every possible pair for significant difference, includes a test of the smallest vs. the largest observed value. This means that the probability of at least one pair being significant is no longer the one computed in a t-test or LSD (least significant difference). We need to distinguish the probability of making the wrong decision in an individual comparison (comparison-wise) from that for the experiment as a whole (experiment-wise, the number of wrong decisions per experiment).

There are different approaches, that can be grouped into methods that attempt control the experiment-wise probability of false positive outcomes to a given P-level, such as 0.05 (adjusted P-values), and those that attempt to control the false discovery rate (FDR), that can be thought of as the proportion of the positive outcomes that are false positives. There are variations in both approaches, varying in power and conservativeness.

A correction needs to be applied to P-values or LSD. Different methods are available for this. Methods differ in their assumptions and difficulty of computation. Tukey’s HSD (honestly significant difference) and the Bonferroni correction are well known approaches. Bonferroni’s method is a “quick and dirty” calculation but nowadays not recommended as it applies an unnecessarily strong correction. Dunnet contrasts test all treatments individually against a control condition. Tukey contrasts targets the case of testing all possible pairs. It is also possible to set-up contrasts of interest individually, or testing for other differences, such as basing them on the average of related treatments.

In general, one should test for significant differences only those contrasts selected a priori as of interest. The fewer the pairs tested for significance, the smaller a correction needs to be applied, thus increasing the power of the test for the cases of interest. Of course, for validity, which tests are of interest should be decided independently of the data. Avoid cherry picking!

Multiple comparisons are frequently applied as post-hoc tests (after a different test of significance with a broader scope). As post-hoc tests, they are seen as a way of investigating the source of an overall significant difference, and multiple comparisons are applied only if the main effect of the factor is in itself significant and the additional contrasts skipped otherwise. In other cases multiple comparisons can be the primary statistical test addressing the research hypotheses. Both approaches are valid as long as the hypotheses have been set independently of the data being used in the test. I repeat: what should be avoided is to cherry pick promising pairwise tests based on the data, and then correct the P-values taking into account only the cherry-picked pairs.

Package ggsignif seems to be the most popular (only?) implementation of pairwise comparisons for ggplots. I have used this package and it does a a good job for automating some simple tests, but it has limitations. Recently I wished to have more flexible formatting of labels using plotmath expressions as well as the ability to adjust P-values using multiple comparison statistical procedures. Presenting adjusted P-values is of crucial importance and in my view should be the default behaviour. There is an issue at GitHub and questions in StackOverflow wishing this to be supported.

Package ggsignif, although maintained, is no longer under active development. The authors suggest creating pull requests for new features. I studied the code and it seems to me that the coupling between the statistic and geometry is unnecessarily tight and would make it difficult to create a pull request that fulfils the needs stated above.

In particular, the manual mode of stat_signif() seems like an afterthought modification to the statistic, and does not follow the expectations of the grammar of graphics that statistics do computations and geometries create a graphical representation. So, I decided to implement from scratch enhanced functionality in packages ggpp and ggpmisc to complement that in package ggsignif.

Package ggpp mainly defines geometries and scales that can be useful for the development of other packages. Package ggpmisc implements mostly statistics and also imports and re-exports package ggpp.

Package ggsignif, defines geom_signif() and stat_signif(), implementing pairwise t-tests and other pairwise tests. In this statistic, tests are done individually, not as multiple comparisons. In some cases this is o.k., but in most cases it is the wrong approach to testing contrasts that are not orthogonal. R package ‘multcomp’ is well established and implements flexibly several different methods, including modern ones, to adjust P-values from multiple comparisons.

3 Package ‘ggpp’ (>= 0.5.4)

I added two new geometries,geom_text_pairwise() and geom_label_pairwise() to package ggpp. Writing the code of geom_text_pairwise() and geom_label_pairwise() was easy, using the existing code of geom_text_s() and geom_label_s() as a base. The user interface is, thus, consistent with that of other related layer functions in package ggpp. These geometries are fully functional on their own and by default make use of ggplot2::stat_identity(). Examples of plots with R code making use of them are available in page Pairwise labels with ‘ggpp’.

These geometries are convenience layer functions when used on their own as the same plots can be created by adding separately the segment and text or label layers to a plot. However, they become a necessity, when designing a statistic.

Caution

Functions geom_text_pairwise() and geom_label_pairwise() are still under development and have some limitations. An important one is that the segment of bar labels created with geom_label_pairwise() extends under the label box, thus making the use of alpha transparency impossible. In addition, the behaviour of the linewidth aesthetic with geom_label_pairwise() can be surprising.

Some features of the user interface of geom_text_pairwise() and geom_label_pairwise() may change in future versions of the package.

4 Package ‘ggpmisc’ (>= 0.5.4)

I have added to package ggpmisc (>= 0.5.4) a new statistic, stat_multcomp(), that makes some of the features from function glht() from package multcomp easy to use to add a plot layer highlighting the outcome of these tests. In ‘ggpmisc’ (>= 0.5.6) additional features are supported.

Layer function stat_multcomp() first fits a model (a linear model by default) followed by a multiple comparisons test with a user-controlled adjustment to the P-values. By default "Tukey" contrasts are computed but "Dunnet" contrasts are also supported. The implementation makes use of function glht() from package multcomp making available all the methods it supports for the adjustment of P-values for multiple comparisons. Currently, stat_multcomp() only implements pairwise contrasts, as in this case there are well established approaches to plot annotations. By passing a numeric matrix as argument arbitrary sets of pairwise contrasts can be specified. Contrasts that involve more than two levels of a factor are not yet supported.

The returned data contains both numeric and ready formatted character strings. The returned value and default geometry depend on the type of label, that can be either bars (connecting segments) labelled with P-values or other parameters, or letters. With bars, plots get crowded easily, and are supported for factors with two to five levels if using “Tukey” contrasts. In contrast, encoding pairwise difference from “Tukey” contrasts as letters is possible with factors with two or more levels. Letter encoding is not implemented for “Dunnet” contrasts as such encoding is unsuitable in this case.

As other layer functions in R package ggpmisc , stat_multcomp() attempts to be flexible, both in the statistical methods used to test multiple comparisons and in the way the outcomes can be shown in plots. As this is a safe (= conservative) approach, the adjustment of P-values is enabled by default.

Even if flexible, stat_multcomp() does not cater for all uses of multiple comparisons, and it is to be expected that in some cases multiple comparison tests will be applied before plotting, and the outcomes shown in plots using geometries rather than statistics. Examples of such use of geometries geom_text_pairwise() and geom_label_pairwise() are available in the page Pairwise labels with ‘ggpp’.

Caution

Function stat_multcomp() is still under development and has some limitations. Currently, only “Tukey” and “Dunnet” contrasts are supported and the order of letters in letter labels is fixed. This statistic has been tested only with method = "lm", method = "aov" and method = "rlm"although several other model fit functions can be expected also to work.

Some features of the user interface of stat_multcomp() may change in future versions of the package.

5 Alternatives

R package ‘ggsignif’ implements a similar graphical approach to annotating pairwise significance tests. However, it uses a different user interface, that deviates from my own expectations based on the Grammar of Graphics. Another difference is that it does not easily support methods specific to multiple comparisons or adjsuted p-values. The layer function geom_signif() can be made to work with dodged columns and boxplots, while the layer functions from packages ‘ggpp’ and ‘ggpmisc’ currently do not support this use case.

6 Plot examples with code

Package ggpmisc imports and re-exports all definitions from ggpp as well as from ggplot2, so it is enough to attach explicitly package ggpmisc . All three packages are available through CRAN.

library(ggpmisc)
theme_set(theme_bw())

6.1 Using default labels

Tukey’s method for all pairwise contrasts using honestly significant differences.

Code
ggplot(mpg, aes(factor(cyl), cty)) +
  stat_summary(fun.data = mean_cl_normal, colour = "red") +
  stat_multcomp(label.y = 12, 
                size = 2.75, 
                vstep = 0.05) +
  expand_limits(y = 0)

Using Holm’s method to adjust P-values.

Code
ggplot(mpg, aes(factor(cyl), cty)) +
  stat_summary(fun.data = mean_cl_normal, colour = "red") +
  stat_multcomp(p.adjust.method = "holm",
                label.y = 12, 
                size = 2.75, 
                vstep = 0.05) +
  expand_limits(y = 0)

Dunnet’s method for comparison of each treatment against a control, assumed to be the first level of the factor.

Code
ggplot(mpg, aes(factor(cyl), cty)) +
  stat_summary(fun = mean, geom = "col", width = 0.5) +
  stat_summary(fun.data = mean_cl_normal, colour = "red") +
  stat_multcomp(label.y = 25, 
                size = 2.75,
                contrasts = "Dunnet")

A staircase of pairwise contrasts with P-values adjusted using Holm’s method.

Code
ggplot(mpg, aes(factor(cyl), cty)) +
  stat_summary(fun = mean, geom = "col", width = 0.5) +
  stat_summary(fun.data = mean_cl_normal, colour = "red") +
  stat_multcomp(label.y = c(18, 23, 24),
                size = 2.75,
                contrasts = rbind(c(0, 0, -1, 1),
                                  c(0, -1, 1, 0),
                                  c(-1, 1, 0, 0)))

Tukey’s pairwise contrasts using defaults except for the position of the lowermost bar.

Code
ggplot(mpg, aes(factor(cyl), cty)) +
  stat_summary(fun = mean, geom = "col", width = 0.5) +
  stat_summary(fun.data = mean_cl_normal, colour = "red") +
  stat_multcomp(size = 2.5, label.y = 25)

Tukey’s pairwise contrasts using defaults and shown with letters.

Code
ggplot(mpg, aes(factor(cyl), cty)) +
  stat_summary(fun = mean, geom = "col", width = 0.5) +
  stat_summary(fun.data = mean_cl_normal, colour = "red") +
  stat_multcomp(label.y = -1, 
                label.type = "letters")

Code
ggplot(mpg, aes(factor(cyl), cty)) +
  stat_summary(fun = mean, geom = "col", width = 0.5) +
  stat_summary(fun.data = mean_cl_normal, colour = "red") +
  stat_multcomp(colour = "white", 
                label.type = "letters",
                adj.method.tag = 0)

Code
ggplot(mpg, aes(factor(cyl), cty)) +
  stat_multcomp(size = 2.75) +
  stat_boxplot(width = 1/3)

Code
ggplot(mpg, aes(factor(cyl), cty)) +
  stat_multcomp(size = 2.85,
                label.y = 11,
                geom = "text_pairwise",
                vstep = 0.07,
                p.digits = 2,
                contrasts = "Dunnet") +
  stat_boxplot(width = 1/3) +
  expand_limits(y = 0)

Code
ggplot(mpg, aes(factor(cyl), cty)) +
  stat_multcomp(label.type = "letters",
                geom = "label",
                size = 2.75) +
  stat_boxplot(width = 1/3)

Numeric P-values shown for Dunnet contrasts.

Code
ggplot(mpg, aes(factor(cyl), cty)) +
  stat_summary(fun = mean, geom = "col", width = 0.5) +
  stat_summary(fun.data = mean_cl_normal, colour = "red") +
  stat_multcomp(aes(x = stage(start = factor(cyl), 
                              after_stat = x.right.tip)),
                geom = "text",
                label.y = -1, 
                vstep = 0,
                size = 3,
                contrasts = "Dunnet")

6.2 Using pre-built labels other than default

Code
ggplot(mpg, aes(factor(cyl), cty)) +
  stat_multcomp(use_label(c("delta", "P")),
                size = 2.75,
                label.y = 11,
                vstep = 0.08,
                p.digits = 2,
                contrasts = "Dunnet") +
  stat_boxplot(width = 1/3) +
  expand_limits(y = 0)

Code
ggplot(mpg, aes(factor(cyl), cty)) +
  stat_multcomp(use_label(c("t", "P")),
                          size = 2.75,
                label.y = 11,
                vstep = 0.08,
                contrasts = "Dunnet") +
  stat_boxplot(width = 1/3) +
  expand_limits(y = 0)

Code
ggplot(mpg, aes(factor(cyl), cty)) +
  stat_summary(fun = mean, geom = "col", width = 0.5) +
  stat_summary(fun.data = mean_cl_normal, colour = "red") +
  stat_multcomp(aes(x = stage(start = factor(cyl), after_stat = x.right.tip),
                    label = after_stat(stars.label)),
                geom = "text",
                label.y = -1, 
                vstep = 0,
                contrasts = "Dunnet")

6.3 Using other aesthetics than the default

Code
ggplot(mpg, aes(factor(cyl), cty)) +
  stat_multcomp(aes(colour = after_stat(p.value) < 0.05),
                size = 2.75,
                label.y = 11,
                vstep = 0.08,
                p.digits = 2,
                contrasts = "Dunnet") +
  stat_boxplot(width = 1/3) +
  scale_colour_manual(values = c("grey50", "blue")) +
  expand_limits(y = 0)

Code
ggplot(mpg, aes(factor(cyl), cty)) +
  stat_multcomp(aes(fill = after_stat(p.value) < 0.05),
                size = 2.75,
                label.y = 11,
                vstep = 0.08,
                p.digits = 2,
                contrasts = "Dunnet") +
  stat_boxplot(width = 1/3) +
  scale_fill_manual(values = c("grey90", "lightblue")) +
  expand_limits(y = 0)

Code
ggplot(mpg, aes(factor(cyl), cty)) +
  stat_multcomp(aes(colour = factor(after_stat(letters.label))),
                label.type = "letters",
                size = 4) +
  stat_boxplot(width = 1/3) +
  expand_limits(y = 0)

Code
ggplot(mpg, aes(factor(cyl), cty)) +
  stat_multcomp(aes(colour = factor(after_stat(letters.label))),
                label.type = "letters",
                geom = "point",
                size = 3,
                adj.method.tag = 0) +
  stat_boxplot(width = 1/3) +
  expand_limits(y = 0)

Tip

Additional examples are available in the documentation of function stat_multcomp() and in the vignette of package ‘ggpmisc’.