Word cloud figure from LaTeX index entries

I created the word cloud on the cover of “Learn R as a Language” with an R script that takes as input the index file generated when building the book’s PDF from the LaTeX source files. This input file contained quite a lot of additional information, such as font changes and page numbers, which had to be stripped out to obtain a clean list of words. Only later did I realize that it would have been easier to produce a cleaner word list to start with. So, I first present the code revised to work with a simpler word list; this revised code has been tested with the book files. If you want to do something similar for your own book, follow the revised code in the first section below. If you want to see the “hacked-up” code I actually used for the cover as included in the book, it is in the second section below.
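The code itself is not reproduced in this excerpt. Purely as a rough illustration of the simpler approach, a word cloud could be built from a plain word list along the following lines; the file name, the cleaning pattern and the use of the ‘wordcloud’ package are assumptions, not details taken from the book script.

# Rough sketch, not the book script: word cloud from a plain word list,
# assuming one index entry per line in "index-words.txt".
library(wordcloud)

words <- readLines("index-words.txt")
# Remove any leftover LaTeX commands, braces, page numbers and commas.
words <- trimws(gsub("\\\\[A-Za-z]+|[{}]|[0-9]+|,", "", words))
words <- words[nzchar(words)]

freqs <- table(words)
wordcloud(names(freqs), as.integer(freqs),
          min.freq = 1, random.order = FALSE)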


Performance of package ‘photobiology’

In recent updates I have been trying to remove performance bottlenecks in the package. For plotting spectra with ‘ggspectra’, an obvious bottleneck has been the computation of color definitions from wavelengths. The solution was to use pre-computed color definitions in the most common case, that of human vision. In addition, many functions, operators and assignments were repeatedly checking the validity of spectral data and, depending on the logic of the code, several of these checks were redundant. It is now possible to enable and disable the checks, both internally and in users’ code. This is used within the package to avoid redundant or unnecessary checks when the logic of the computations ensures that results are valid.
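The actual implementation in ‘photobiology’ is not shown here; the sketch below only illustrates the general pattern of a pre-computed lookup table plus option-controlled validity checks, with invented names and a placeholder in place of the real wavelength-to-color computation.

# Illustrative pattern only, with made-up names; not code from 'photobiology'.

# Pre-computed color definitions for the common case (human vision,
# 1 nm steps); a flat placeholder stands in for real wavelength-to-RGB code.
colour_cache <- setNames(rep("#7F7F7F", 401), as.character(380:780))

wl_to_colour <- function(wavelength, slow_fun = function(w) "#000000") {
  key <- as.character(round(wavelength))
  colours <- colour_cache[key]
  miss <- is.na(colours)
  # fall back to the slow computation only for wavelengths not in the cache
  colours[miss] <- vapply(wavelength[miss], slow_fun, character(1))
  unname(colours)
}

# Validity checks that callers (or internal code) can switch off when the
# logic of the computation already guarantees valid data.
check_spectrum <- function(x, enabled = getOption("spct.check", TRUE)) {
  if (isTRUE(enabled)) {
    stopifnot(is.numeric(x), all(is.finite(x)))
  }
  invisible(x)
}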

In addition, changes in some of the ‘tidyverse’ packages, such as ‘tibble’, ‘dplyr’, ‘vctrs’ and ‘rlang’, also seem to have improved the performance of ‘photobiology’ very significantly. If we take the time needed to run the test cases as an indication of performance, the gain has been massive, with runtime decreasing to nearly one third of what it was a few months ago, in spite of an increase in the number of test cases from about 3900 to 4270. Currently the 4270 test cases run on my laptop in 23.4 s. The updates to ‘rlang’ (0.4.7) and/or ‘tibble’ (3.0.3) that appeared on CRAN this week seem to have reduced runtime by about 30% compared to the previous versions.
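For anyone wanting to make a similar before-and-after comparison, one rough way to time a full test run from a package’s source directory is shown below; it assumes ‘devtools’ and a ‘testthat’ test suite, and the figures quoted above are from my laptop, not reproducible from this snippet.

# Rough timing of a package's test suite, run from the package source directory.
system.time(devtools::test())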

The take-home message is that even though there is a small risk of package updates breaking existing scripts, it usually pays to keep your installed R packages, and R itself, up to date. If some results change after an update, it is important to investigate which ones are correct, as it is possible both that earlier bugs have been fixed and that new ones have been introduced. When needed, it is possible, although slightly more cumbersome, to install superseded versions from the source-package archive at CRAN, which keeps every version of each package previously distributed through CRAN. As for R itself, multiple versions can coexist on the same computer, so it is not necessary to uninstall the version currently in use in order to test another one, either older or newer.
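As an example of that slightly more cumbersome route, a superseded version can be installed from the CRAN archive with the ‘remotes’ package; the package name and version number below are only placeholders.

# Install a specific older version from the CRAN source archive
# (package name and version are placeholders).
remotes::install_version("tibble", version = "3.0.2")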

R 4.0.0

R version 4.0.0 was released today. In the case of complex ggplots there is a noticeable improvement in performance. As is normal for CRAN, packages are checked automatically. Most of the packages I have developed are in CRAN, and you can easily see their check status at any time on the page where they are listed. At the moment there are warnings under OS X affecting the building of vignettes for many packages in addition to several of my own, all with the same message. I suspect this problem is related to ‘knitr’ or ‘roxygen2’.
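The check-status page is the easiest way to browse these results, but the same information can also be queried from within R using tools::CRAN_check_results() (see ?tools::CRAN_check_results); the package name below is just an example.

# Query current CRAN check results and keep the rows for one package.
res <- tools::CRAN_check_results()
subset(res, Package == "photobiology")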