In this page code chunks are “folded” so as to decrease the clutter when searching for examples. Above each plot you will find a small triangle followed by “Code”. Clicking on the triangle “unfolds” the code chunk making visible the R code used to produce the plot. Except for the loading of packages shown in section Preliminaries code examples are in most cases self contained. When they are not, this is indicated by a comment.
For simplicity, whenever possible I use base R functions instead of contributed R packages. For those packages used only in specific examples I use colon notation to indicate the ‘package’.
All “words” defined in base R or in extension packages are linked to the corresponding HTML-rendered help pages.
The code in the chunks can be copied by clicking on the top right corner, where an icon appears when the mouse cursor hovers over the code listing.
1 Data labels and plot annotations
Data labels add textual information directly related to individual data points (shown as glyphs). Text position in this case is dependent on the scales used to represent data points. Text is usually displaced so that it does not occlude the glyph representing the data point and when the link to the data point is unclear, this link is signaled with a line segment or arrow. Data labels are distinct from annotations in that they contribute directly to the representation of data on a plot or map. The typical case for volcano and quadrant plots is to highlight individual points by labelling with an abbreviation of a gene name or metabolite name.
Annotations differ from data labels, in that their position is decoupled from their meaning. In the case of annotations the designer of a data visualization has more freedom when deciding their location in an illustration, as long as they do not occlude features used to describe data. In the case of volcano and quadrant plots, a typical use of annotations is to indicate the number of observations in each quadrant of the plot.
2 Scales and out-of-bounds observations
Volcano and quadrant plots normally have symetrical positive and negative limits on axes and use transformations for the scales.
3 Preliminaries
The code used is shown on-demand above each plot and can be copied. We first load the packages we will use.
When package ‘ggpmisc’ is loaded and attached, packages ‘ggpp’ and ‘ggplot2’ are also attached. The only function from ‘ggplot2’ that is redefined by ‘ggpp’ is annotate()
, which remains backwards compatible with ‘ggplot2’.
4 Quadrant plots
4.1 Simple examples using ‘ggpp’
Quadrant plots are scatterplots with the origin (0, 0) at the center of the plotting region. In the first two examples we use artificial data and a geom and a stat from package ‘ggpp’.
A simple quadrant plot can add two lines to highlight the four quadrants and annotations with the number of observations per quadrant. We expand the y axis limits to enusre that the count annotations do not overlap the observations.
Code
ggplot(my.data, aes(x, y)) +
geom_quadrant_lines(colour = "red") +
stat_quadrant_counts(colour = "red") +
geom_point() +
expand_limits(y = c(-260, 260))
Starting with ‘ggpp’ 0.5.3 other labels are available, percents (pc.label
), decimal fractions (dec.label
), and fractions (fr.label
). We show fractions.
Code
ggplot(my.data, aes(x, y)) +
geom_quadrant_lines(colour = "red") +
stat_quadrant_counts(aes(label = after_stat(fr.label)), colour = "red") +
geom_point() +
expand_limits(y = c(-260, 260))
We create panels or facets for the groups, and the counts are computed separately within each plot panel.
Code
ggplot(my.data, aes(x, y, colour = group)) +
geom_quadrant_lines() +
stat_quadrant_counts(geom = "label_npc") +
geom_point() +
expand_limits(y = c(-260, 260)) +
facet_wrap(~group)
stat_quadrant_counts()
ignores groups, it always shows the total counts per quadrant.
Code
ggplot(my.data, aes(x, y, colour = group)) +
geom_quadrant_lines() +
stat_quadrant_counts() +
geom_point() +
expand_limits(y = c(-260, 260))
Occasionally we may want to pool the counts along x or y.
Code
ggplot(my.data, aes(x, y)) +
geom_quadrant_lines(pool.along = "x") +
stat_quadrant_counts(pool.along = "x") +
geom_point() +
expand_limits(y = c(-260, 260))
Occasionally we may want to relocate the center.
Code
ggplot(my.data, aes(x, y)) +
geom_quadrant_lines(xintercept = 25, yintercept = -50) +
stat_quadrant_counts(xintercept = 25, yintercept = -50) +
geom_point() +
expand_limits(y = c(-260, 260))
4.2 Realistic examples using ‘ggpp’ and ‘ggpmisc’
For the examples we will use a subset of RNAseq data for Arabidopsis, available in package ‘ggpmisc’. We show here the first 10 rows.
Code
head(quadrant_example.df, n = 10)
tag gene outcome.x outcome.y logFC.x logFC.y genotype
1 AT5G11060 TIC56 0 0 -0.17685517 -0.32956762 Ler
2 AT3G01280 ATWRKY48 0 0 -0.06471884 0.07771315 Ler
3 AT5G06320 ANY1 0 0 -0.35200684 -0.04331339 Ler
4 AT1G05850 AP1M1 0 0 -0.40284744 -0.04505435 Ler
5 AT2G22300 HSA32 0 0 -0.34482142 -0.11185095 Ler
6 AT2G16070 UBQ11 0 0 -0.22328946 -0.23210780 Ler
7 AT1G49820 UXS6 0 0 -0.28420119 -0.16488196 Ler
8 AT3G22200 ATPUB17 -1 -1 -0.59734828 -0.81294918 Ler
9 AT2G26170 ATYLMG1-2 0 0 0.13251399 0.03915004 Ler
10 AT1G03310 EMB3101 0 -1 -0.02223403 -0.90198521 Ler
Code
ggplot(subset(quadrant_example.df,
xy_outcomes2factor(outcome.x, outcome.y) != "none"),
aes(logFC.x, logFC.y,
colour = outcome2factor(outcome.x),
fill = outcome2factor(outcome.y))) +
geom_quadrant_lines(linetype = "dotted") +
stat_quadrant_counts(size = 3, colour = "white", fontface = "bold") +
geom_point(shape = "circle filled", size = 2) +
scale_x_logFC(name = "Transcript abundance for x%unit") +
scale_y_logFC(name = "Transcript abundance for y%unit",
expand = expansion(mult = 0.15)) +
scale_colour_outcome() +
scale_fill_outcome() +
theme_dark()
Code
ggplot(subset(quadrant_example.df,
xy_outcomes2factor(outcome.x, outcome.y) != "none"),
aes(logFC.x, logFC.y,
colour = outcome2factor(outcome.x),
fill = outcome2factor(outcome.y))) +
geom_quadrant_lines(linetype = "dotted") +
stat_quadrant_counts(aes(label = after_stat(pc.label)),
size = 3, colour = "white", fontface = "bold") +
geom_point(shape = "circle filled", size = 2) +
scale_x_logFC(name = "Transcript abundance for x%unit") +
scale_y_logFC(name = "Transcript abundance for y%unit",
expand = expansion(mult = 0.15)) +
scale_colour_outcome() +
scale_fill_outcome() +
theme_dark()
To plot in separate panels those observations that are significant along both x and y axes, x axis, y axis, or none, with quadrants merged takes more effort. We first define two helper functions to add counts and quadrant lines to each of the four panels.
Code
# we define two functions that call stats with different arguments based on the quadrant
all_quadrant_counts <- function(...) {
list(
stat_quadrant_counts(data = . %>% filter(outcome.xy.fct == "xy"), ...),
stat_quadrant_counts(data = . %>% filter(outcome.xy.fct == "x"), pool.along = "y", ...),
stat_quadrant_counts(data = . %>% filter(outcome.xy.fct == "y"), pool.along = "x", ...),
stat_quadrant_counts(data = . %>% filter(outcome.xy.fct == "none"), quadrants = 0L, ...)
)
}
all_quadrant_lines <- function(...) {
list(
geom_hline(data = data.frame(outcome.xy.fct = factor(c("xy", "x", "y", "none"),
levels = c("xy", "x", "y", "none")),
yintercept = c(0, NA, 0, NA)),
aes(yintercept = yintercept),
na.rm = TRUE,
...),
geom_vline(data = data.frame(outcome.xy.fct = factor(c("xy", "x", "y", "none"),
levels = c("xy", "x", "y", "none")),
xintercept = c(0, 0, NA, NA)),
aes(xintercept = xintercept),
na.rm = TRUE,
...)
)
}
# we use them to plot four panels, each with a different subset of data
# we first convert the outcomes from the statistical test into factors
quadrant_example.df %>%
mutate(.,
outcome.x.fct = outcome2factor(outcome.x),
outcome.y.fct = outcome2factor(outcome.y),
outcome.xy.fct = xy_outcomes2factor(outcome.x, outcome.y)) %>%
ggplot(., aes(logFC.x, logFC.y, colour = outcome.x.fct, fill = outcome.y.fct)) +
geom_point(shape = 21) +
all_quadrant_lines(linetype = "dotted") +
all_quadrant_counts(size = 3, colour = "white") +
scale_x_logFC(name = "Transcript abundance for x%unit") +
scale_y_logFC(name = "Transcript abundance for y%unit", expand = expansion(mult = 0.2)) +
scale_colour_outcome() +
scale_fill_outcome() +
facet_wrap(~outcome.xy.fct) +
theme_dark()
Addition of data labels is best automated by using statistics from ‘ggpp’ that filter labels based on the local density of observations, together with the repulsive geometries from ‘ggrepel’.
Code
ggplot(subset(quadrant_example.df,
xy_outcomes2factor(outcome.x, outcome.y) != "none"),
aes(logFC.x, logFC.y,
label = gene)) +
geom_quadrant_lines(linetype = "dotted") +
stat_dens2d_labels(geom = "text_repel",
size = 2,
position = position_nudge_centre(x = 0.1,
y = 0.1,
direction = "radial"),
keep.fraction = 0.3,
keep.number = 30,
keep.these = c("AtMC9", "HY5", "HYH"),
min.segment.length = 0,
fontface = "bold") +
geom_point(shape = "circle open", size = 2) +
stat_quadrant_counts(aes(label = after_stat(count.label)), size = 3) +
scale_x_logFC(name = "Transcript abundance for x%unit") +
scale_y_logFC(name = "Transcript abundance for y%unit",
expand = expansion(mult = 0.15)) +
scale_fill_outcome() +
theme_bw()
Warning: No shared levels found between `names(values)` of the manual scale and the
data's fill values.
5 Volcano plots
5.1 Simple examples using ‘ggpp’
We use here a subset of RNAseq data from Arabidopsis. The predefined scales simplify the plotting by handling the tick mark labels and axis labels, but also by setting the out of bounds parameters of the scales so that out-of-bounds observations are “shrunk” to the scale limit. All this can be done also manually but is rather tedious. The scales have parameters that allow the defaults to be changed.
In this example, the counts are for the number of upregulated and downrelated genes.
Code
nrow(volcano_example.df)
[1] 1218
Code
head(volcano_example.df, n = 10)
tag gene outcome logFC PValue genotype
1 AT1G01040 ASU1 0 -0.15284466 0.35266997 Ler
2 AT1G01290 ASG4 0 -0.30057068 0.05471732 Ler
3 AT1G01560 ATSBT1.1 0 -0.57783350 0.06681310 Ler
4 AT1G01790 AtSAM1 0 -0.04729662 0.74054263 Ler
5 AT1G02130 AtTRM82 0 -0.14279891 0.29597519 Ler
6 AT1G02560 PRP39 0 0.23320752 0.07487043 Ler
7 AT1G02910 LIP2 0 0.12080975 0.39311887 Ler
8 AT1G03190 KMS2 0 0.09426074 0.57671252 Ler
9 AT1G03840 MTSF1 0 -0.62652441 0.13968485 Ler
10 AT1G04100 MAC5A 0 0.28060062 0.26571213 Ler
Code
ggplot(volcano_example.df,
aes(logFC, PValue, colour = outcome2factor(outcome))) +
geom_point() +
scale_x_logFC(name = "Transcript abundance%unit") +
scale_y_Pvalue() +
scale_colour_outcome() +
stat_quadrant_counts(data = function(x) {subset(x, outcome != 0)}) +
theme_bw()
Code
ggplot(volcano_example.df,
aes(logFC, PValue, colour = outcome2factor(outcome))) +
geom_point() +
scale_x_logFC(name = "Transcript abundance%unit") +
scale_y_Pvalue() +
scale_colour_outcome() +
stat_quadrant_counts() +
stat_group_counts(label.x = "left", label.y = "bottom") +
stat_group_counts(aes(label = after_stat(pc.label)), label.y = "bottom") +
theme_bw()
Code
ggplot(volcano_example.df,
aes(logFC, PValue, colour = outcome2factor(outcome, n.levels = 2))) +
geom_point() +
scale_x_logFC(name = "Transcript abundance%unit") +
scale_y_Pvalue() +
scale_colour_outcome(values = "outcome:de") +
stat_quadrant_counts(data = function(x) {subset(x, outcome != 0)}) +
theme_bw()
Code
ggplot(volcano_example.df,
aes(logFC, PValue, colour = outcome2factor(outcome, n.levels = 2))) +
geom_point() +
scale_x_logFC(name = "Transcript abundance%unit") +
scale_y_Pvalue() +
scale_colour_outcome(values = "outcome:de") +
stat_quadrant_counts(data = function(x) {subset(x, outcome != 0)}) +
stat_group_counts(label.x = "left", label.y = "bottom") +
stat_group_counts(aes(label = after_stat(pc.label)), label.y = "bottom") +
theme_bw()
We here label those genes in regions of the plot with low local density of observations.
Code
ggplot(volcano_example.df,
aes(logFC,
PValue, colour = outcome2factor(outcome, n.levels = 2),
label = gene)) +
geom_point() +
stat_dens2d_labels(geom = "text_repel",
size = 2,
colour = "black",
position = position_nudge_centre(x = 0.1,
y = 0.1,
direction = "radial"),
keep.fraction = 0.03, # 3% of data points
min.segment.length = 0) +
scale_x_logFC(name = "Transcript abundance%unit") +
scale_y_Pvalue() +
scale_colour_outcome(values = "outcome:de", de.colour = "darkgreen") +
stat_quadrant_counts(data = function(x) {subset(x, outcome != 0)},
aes(label = after_stat(count.label))) +
theme_bw()
Code
ggplot(volcano_example.df,
aes(logFC,
PValue, colour = outcome2factor(outcome),
label = gene)) +
geom_point() +
stat_dens2d_labels(geom = "text_repel",
size = 2,
colour = "black",
position = position_nudge_centre(x = 0.1,
y = 0.1,
direction = "radial"),
keep.fraction = 0.03, # 3% of data points
min.segment.length = 0) +
scale_x_logFC(name = "Transcript abundance%unit") +
scale_y_Pvalue() +
scale_colour_outcome() +
stat_quadrant_counts(data = function(x) {subset(x, outcome != 0)},
aes(label = after_stat(count.label))) +
theme_bw()
We here label only genes listed by name in keep.these
.
Code
ggplot(volcano_example.df,
aes(logFC,
PValue, colour = outcome2factor(outcome, n.levels = 2),
label = gene)) +
geom_point() +
stat_dens2d_labels(geom = "text_s",
colour = "red",
size = 2,
position = position_nudge_centre(x = 0.9,
y = 0.9,
direction = "radial"),
keep.fraction = 0, # 0% based on density
keep.these = c("IAA30", "IAA18"),
min.segment.length = 0,
fontface = "bold") +
scale_x_logFC(name = "Transcript abundance%unit") +
scale_y_Pvalue() +
scale_colour_outcome(values = "outcome:de", de.colour = "black") +
stat_quadrant_counts(data = function(x) {subset(x, outcome != 0)},
aes(label = after_stat(count.label))) +
theme_bw()
We here label only genes with names starting with “IAA” by passing an anonymous function to keep.these
. The variable mapped to the label
aesthetic is passed as its argument and the labels returned by the function will be selected.
Code
ggplot(volcano_example.df,
aes(logFC,
PValue, colour = outcome2factor(outcome, n.levels = 2),
label = gene)) +
geom_point() +
stat_dens2d_labels(geom = "text_s",
colour = "red",
size = 2,
position = position_nudge_centre(x = 0.9,
y = 0.9,
direction = "radial"),
keep.fraction = 0, # 0% based on density
keep.these = function(x) {grepl("^IAA", x)},
min.segment.length = 0,
fontface = "bold") +
scale_x_logFC(name = "Transcript abundance%unit") +
scale_y_Pvalue() +
scale_colour_outcome(values = "outcome:de", de.colour = "black") +
stat_quadrant_counts(data = function(x) {subset(x, outcome != 0)},
aes(label = after_stat(count.label))) +
theme_bw()
We here label only genes for which \(P < 1 \times 10^{-10}\). In this case we pass, in addition to the anonymous function to keep.these
, as argument to parameter these.target
the name of the variable in data our functions expects as its first argument.
The names to pass to these.target
are those of aesthetics rather than those the columns in the user’s data frame. In ‘ggplot2’ only variables mapped to aesthetics are visible to stats.
Code
ggplot(volcano_example.df,
aes(logFC,
PValue, colour = outcome2factor(outcome, n.levels = 2),
label = gene)) +
geom_point() +
stat_dens2d_labels(geom = "text_repel",
colour = "red",
size = 2,
position = position_nudge_centre(x = 0.9,
y = 0.9,
direction = "radial"),
keep.fraction = 0,
these.target = "y",
keep.these = function(x) {x > 10}, # -log10(PValue) in data
min.segment.length = 0,
max.overlaps = Inf) + # force all labels displayed
scale_x_logFC(name = "Transcript abundance%unit") +
scale_y_Pvalue() +
scale_colour_outcome(values = "outcome:de", de.colour = "black") +
stat_quadrant_counts(data = function(x) {subset(x, outcome != 0)},
aes(label = after_stat(count.label))) +
theme_bw()
The examples above use only one approach to the selection of genes to label, but they can be combined. Here we combine a selection based on density and on P-value.
Code
ggplot(volcano_example.df,
aes(logFC,
PValue, colour = outcome2factor(outcome, n.levels = 2),
label = gene)) +
geom_point() +
stat_dens2d_labels(geom = "text_repel",
size = 2,
position = position_nudge_centre(x = 0.1,
y = 0.1,
direction = "radial"),
keep.fraction = 0.03,
these.target = "y",
exclude.these = function(x) {x < 5}, # -log10(PValue) in data
min.segment.length = 0) +
scale_x_logFC(name = "Transcript abundance%unit") +
scale_y_Pvalue() +
scale_colour_outcome(values = "outcome:de", de.colour = "darkgreen") +
stat_quadrant_counts(data = function(x) {subset(x, outcome != 0)},
aes(label = after_stat(count.label)),
colour = "darkgreen") +
theme_bw()