R tips 3: ggpmisc adds new stats to ‘ggplot2’

Aim of the package

  1. Make it possible to use the grammar of graphics in some cases for which until now ad-hoc solutions were needed.
  2. Show how easy it can be to write new statistics for ‘ggplot2 (>= 2.0.0)’.

Case 1 (Stack Overflow question)

“Adding a 3rd order polynomial and its equation to a ggplot in r”

Restated problem

“Adding a label with the R^2 or adjusted R^2 value from any lm() fit for each group and panel of a ggplot”

“Adding a polynomial of any degree and its equation for each group and panel of a ggplot”

As a follow-up to a colleague's question (the answer she had found on Stack Overflow did not work for a third-degree polynomial), I wrote stat_poly_eq() as a general solution to the problem. For simplicity I took a different approach from the Stack Overflow answer and used package ‘polynom’. (Extending this stat to report AIC, BIC, or indeed any other value that can be extracted or computed from the fitted model object should require only a simple edit.)

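[code lang="r"]
library(ggplot2)
library(ggpmisc)

# The same model formula is passed to the smoother and to the equation label;
# a polynomial such as y ~ poly(x, 3, raw = TRUE) can be used in the same way
formula <- y ~ x
ggplot(cars, aes(speed, dist)) +
  geom_point() +
  geom_smooth(formula = formula, method = "lm") +
  stat_poly_eq(formula = formula, aes(label = ..eq.label..), parse = TRUE)
[/code]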

Example of use of ggpmisc::stat_poly_eq()
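
stat_poly_eq() builds its label from the lm() fit; as noted above, the package itself uses ‘polynom’ for this. The base-R sketch below only illustrates the idea, under the assumption that a raw polynomial fit is wanted: fit the model with poly(x, degree, raw = TRUE), take the coefficients and R^2 from the fitted object, and paste them into a plotmath string that parse = TRUE can render. poly_eq_label() is a hypothetical helper, not part of ‘ggpmisc’.

[code lang="r"]
# Hypothetical helper: a sketch of building equation and R^2 labels from an
# lm() fit. This is NOT the implementation used in 'ggpmisc'.
poly_eq_label <- function(x, y, degree = 1, digits = 3) {
  # raw = TRUE gives coefficients of x, x^2, ... instead of orthogonal ones
  fit <- lm(y ~ poly(x, degree, raw = TRUE))
  coefs <- signif(unname(coef(fit)), digits)  # intercept comes first
  coef_txt <- as.character(coefs)             # no padding, unlike format()
  powers <- seq_along(coefs) - 1
  terms <- ifelse(powers == 0, coef_txt,
                  ifelse(powers == 1, paste0(coef_txt, "*x"),
                         paste0(coef_txt, "*x^", powers)))
  eq <- paste("y ==", paste(terms, collapse = " + "))
  # turn "+ -0.5*x" into "- 0.5*x" so negative coefficients read naturally
  eq <- gsub("+ -", "- ", eq, fixed = TRUE)
  r2 <- summary(fit)$r.squared
  list(eq.label = eq,
       rr.label = paste0("italic(R)^2 == ", signif(r2, digits)),
       r.squared = r2)
}

res <- poly_eq_label(cars$speed, cars$dist)
res$eq.label   # "y == -17.6 + 3.93*x"
res$rr.label   # "italic(R)^2 == 0.651"
[/code]

A string such as poly_eq_label(cars$speed, cars$dist, degree = 3)$eq.label can then be passed as the label of annotate("text", ...) with parse = TRUE; what the stat adds is doing this automatically for every group and panel.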

Case 2 (Stack Overflow question)

“Plotting a simple time series in ggplot.”

Restated problem

“Plotting time series in ggplot: converting a time series into a data frame suitable for plotting with ggplot.”

Function try_data_frame() does this conversion. At its core it uses xts::try.xts() to convert its argument, if possible, into an xts object, and then converts the xts object into a data frame.
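
For a regular, univariate ts object such as lynx, the essence of the conversion can be reproduced with base R alone. ts_to_df() is a hypothetical helper, not part of ‘ggpmisc’: unlike try_data_frame(), it bypasses ‘xts’ entirely and keeps time as a numeric column (decimal years) rather than a date-time, but it follows the same V.<series> column naming seen in the example below.

[code lang="r"]
# Hypothetical helper: a base-R sketch of turning a 'ts' object into a data
# frame suitable for ggplot(); not the 'ggpmisc' implementation.
ts_to_df <- function(x, name = deparse(substitute(x))) {
  stopifnot(is.ts(x), is.null(dim(x)))          # univariate series only
  df <- data.frame(time = as.numeric(time(x)),  # decimal years
                   value = as.numeric(x))
  names(df)[2] <- paste0("V.", name)            # e.g. "V.lynx"
  df
}

lynx_df <- ts_to_df(lynx)
head(lynx_df)
[/code]

The design choice is simply to make time an explicit column, since ggplot() maps aesthetics to columns of a data frame and has no notion of the implicit time index of a ts object.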

[code lang="r"]
library(ggplot2)
library(ggpmisc)

ggplot(try_data_frame(lynx), aes(time, V.lynx)) + geom_line()
[/code]

ggpmisc::try_data_frame() example

Case 3 (Stack Overflow question)

“add a curve that fits the peaks from a plot in R?”

Restated problem

“Finding peaks and valleys and labelling them in each group and panel of a ggplot.”

These versions of stat_peaks() and stat_valleys() were inspired by a discussion in the issues area of ‘ggrepel’ at GitHub. These ‘ggplot2’ statistics are built on top of splus2R::peaks(). (In the next version I will also make it possible to use ppc::ppc.peaks().)
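
The idea behind these stats can be sketched in base R: a point counts as a peak when it is the unique largest value within a window of span points centred on it, and valleys are just the peaks of -x. find_peaks() and find_valleys() are hypothetical helpers that only approximate splus2R::peaks(); in particular, the handling of ties and of the ends of the series may differ, and treating ignore_threshold as a fraction of the range of the values is an assumption made here.

[code lang="r"]
# Hypothetical helpers: a sketch of windowed peak detection, only an
# approximation of splus2R::peaks() as used by stat_peaks().
find_peaks <- function(x, span = 3, ignore_threshold = 0) {
  stopifnot(span %% 2 == 1)   # odd span keeps the window centred on x[i]
  half <- (span - 1) %/% 2
  n <- length(x)
  is_peak <- logical(n)
  for (i in seq_len(n)) {
    # the window is truncated at both ends of the series
    window <- x[max(1, i - half):min(n, i + half)]
    # x[i] must be the unique maximum of its window
    is_peak[i] <- x[i] == max(window) && sum(window == x[i]) == 1
  }
  # assumption: threshold expressed as a fraction of the range of x
  threshold <- min(x) + ignore_threshold * diff(range(x))
  is_peak & x >= threshold
}

find_valleys <- function(x, ...) find_peaks(-x, ...)

y <- c(1, 3, 2, 5, 4, 4, 6, 1)
which(find_peaks(y))                          # 2 4 7
which(find_peaks(y, ignore_threshold = 0.5))  # 4 7 (the low peak is dropped)
which(find_valleys(y))                        # 1 3 8 (ends can qualify here)
[/code]

Larger values of span demand that a peak dominate a wider neighbourhood, which is why the examples below use span = 5 or span = 11 to keep only the more prominent peaks and valleys.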

[code lang="r"]
library(ggplot2)
library(ggpmisc)

ggplot(try_data_frame(lynx), aes(time, V.lynx)) +
  geom_line() +
  # dashed orange line joining all the peaks found
  stat_peaks(geom = "line", linetype = "dashed", color = "orange", size = rel(1)) +
  # dashed red line joining only the peaks above the threshold
  stat_peaks(geom = "line", linetype = "dashed", color = "red", size = rel(1),
             ignore_threshold = 0.7)
[/code]

ggpmisc::stat_peaks() example

[code lang="r"]
library(ggplot2)
library(ggpmisc)

# beaver1: body temperature of a beaver; time is clock time coded as hhmm
ggplot(beaver1, aes(time, temp)) +
  geom_point(aes(color = factor(activ))) +
  geom_line() +
  # x.label.fmt sets the format of the x-value labels ("%04d" keeps the 0)
  stat_peaks(geom = "text", color = "red",
             x.label.fmt = "%04d", span = 5,
             angle = 90, hjust = -0.1) +
  stat_valleys(geom = "text", color = "blue",
               x.label.fmt = "%04d", span = 5,
               angle = 90, hjust = 1.1) +
  ylim(36.25, 37.75) +
  facet_grid(~day, scales = "free_x", space = "free_x",
             labeller = "label_both")
[/code]

ggpmisc::stat_peaks() and ggpmisc::stat_valleys() example

Case 4

“Peaks in a time series.”

[code lang="r"]
library(ggplot2)
library(ggpmisc)

ggplot(try_data_frame(lynx), aes(time, V.lynx)) +
  geom_line() +
  # peaks marked in red, both on the x-axis rug and as points
  stat_peaks(geom = "rug", color = "red") +
  stat_peaks(geom = "point", color = "red") +
  # valleys marked in blue
  stat_valleys(geom = "rug", color = "blue") +
  stat_valleys(geom = "point", color = "blue")
[/code]

Time series plotted as ggplot annotated with peaks and valleys

Case 5

“Custom label formatting for peaks and valleys”

[code lang="r"]
library(ggplot2)
library(ggpmisc)
library(xts)

ggplot(try_data_frame(AirPassengers), aes(time, V.AirPassengers)) +
  geom_line() +
  # "%b" labels each peak and valley with its abbreviated month name
  stat_peaks(x.label.fmt = "%b", geom = "text", angle = 90,
             hjust = -0.1, color = "red", span = 3) +
  stat_valleys(x.label.fmt = "%b", geom = "text", angle = 90,
               hjust = 1.1, color = "blue", span = 3) +
  scale_x_datetime(date_labels = "%b %y", date_breaks = "1 year") +
  ylim(0, 700)
[/code]

AirPassengers time series annotated with peaks and valleys
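
The format strings passed to x.label.fmt and y.label.fmt in these examples follow two standard R conventions: "%04d" and "%4.0f" are sprintf() specifications for numbers, while "%b", "%b %y" and "%b %Y" are strftime() specifications for dates and times. Base R shows what they produce on their own; the input values below are arbitrary and chosen only for illustration, and month abbreviations depend on the locale.

[code lang="r"]
# sprintf() formats: zero-padded integer and fixed-point number
sprintf("%04d", c(30L, 840L, 1230L))   # "0030" "0840" "1230"
sprintf("%4.0f", 3035.4)               # "3035"

# strftime()-style formats applied to date-times
t <- as.POSIXct(c("1949-01-01", "1960-07-01"), tz = "UTC")
format(t, "%b")      # abbreviated month names (locale dependent)
format(t, "%b %Y")   # month name plus four-digit year
format(t, "%Y")      # "1949" "1960"
[/code]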

Case 6

“Edited labels”

[code lang="r"]
library(ggplot2)
library(ggpmisc)

ggplot(try_data_frame(ldeaths), aes(time, V.ldeaths)) +
  # dashed vertical line at each peak
  stat_peaks(geom = "vline", color = "red", span = 11,
             linetype = "dashed") +
  # label assembled from the formatted y and x values of each peak
  stat_peaks(x.label.fmt = "%b %Y", y.label.fmt = "%4.0f",
             geom = "label",
             color = "red", span = 11, vjust = -0.2,
             aes(label = paste(..y.label.., "deaths\n in",
                               ..x.label..))) +
  geom_line() +
  scale_x_datetime(date_labels = "%b %y", date_breaks = "1 year") +
  ylim(0, 4300)
[/code]

ggpmisc::stat_peaks() example
