Word cloud figure from LaTeX index entries

I created the word cloud on the cover of “Learn R as a Language” using an R script that takes as input the book’s index file, as generated when building the PDF from the LaTeX source files. This input file contained quite a lot of additional information, such as font changes and page numbers, that had to be stripped out to obtain a clean list of words. Only later did I realize that it would have been easier to produce a cleaner word list to start with. So, I first present the code revised to work with a simpler word list; this revised code has been tested with the book files. If you want to do something similar for your own book, follow the revised code in the first section below. If you want to see the “hacked-up” code I actually used for the cover as included in the book, it is in the second section below.

Performance of package ‘photobiology’

In recent updates I have been trying to remove performance bottlenecks in the package. For plotting spectra with ‘ggspectra’ an obvious performance bottleneck has been the computation of color definitions from wavelengths. The solution to this problem was to use pre-computed color definitions in the most common case, that of human vision. Many functions and operators as well as assignments were repeatedly checking the validity of spectral data. Depending on the logic of the code, several of these checks were redundant. It is now possible to enable and disable checks both internally and in users’ code. This has been used within the package to avoid redundant or unnecessary checks when the logic of the computations ensures that results are valid.
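As a minimal sketch of the idea of switchable validity checks (not the code used in ‘photobiology’), the example below guards a validation function with a global option and provides a helper that disables the check temporarily. The option name mypkg.check.spct and the functions checks_enabled(), validate_spectrum() and without_checks() are hypothetical names invented for this illustration.

```r
# Hypothetical option name; NOT the mechanism used by 'photobiology'.
checks_enabled <- function() {
  isTRUE(getOption("mypkg.check.spct", default = TRUE))
}

# Validate a toy "spectrum": a data frame with strictly increasing wavelengths.
validate_spectrum <- function(x) {
  if (!checks_enabled()) {
    return(invisible(x))  # skip the (possibly costly) checks
  }
  stopifnot(is.data.frame(x), "w.length" %in% names(x))
  if (is.unsorted(x$w.length, strictly = TRUE)) {
    stop("wavelengths must be strictly increasing")
  }
  invisible(x)
}

# Evaluate an expression with checks disabled, restoring the option afterwards.
without_checks <- function(expr) {
  old <- options(mypkg.check.spct = FALSE)
  on.exit(options(old), add = TRUE)
  force(expr)  # lazy evaluation: 'expr' is evaluated here, with checks off
}

good <- data.frame(w.length = c(400, 500, 600), s.e.irrad = c(0.1, 0.2, 0.3))
bad  <- data.frame(w.length = c(500, 400), s.e.irrad = c(0.2, 0.1))

validate_spectrum(good)                 # passes silently
res <- without_checks(validate_spectrum(bad))  # check skipped, no error
```

Code whose logic already guarantees valid results can wrap its internal calls in without_checks(), while users’ code keeps the checks enabled by default.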

In addition, changes in some of the ‘tidyverse’ packages like ‘tibble’, ‘dplyr’, ‘vctrs’ and ‘rlang’ also seem to have improved the performance of ‘photobiology’ very significantly. If we take the time needed to run the test cases as an indication of performance, the gain has been massive, with runtime decreasing to nearly 1/3 of what it was a few months ago. This happened in spite of an increase in the number of test cases from about 3900 to 4270. Currently the 4270 test cases run on my laptop in 23.4 s. The updates to ‘rlang’ (0.4.7) and/or ‘tibble’ (3.0.3) that appeared on CRAN this week seem to have reduced runtime by about 30% compared to the previous versions.

The take-home message is that even though there is a small risk of package updates breaking existing scripts, there is usually an advantage in keeping your installed R packages and R itself up to date. If some results change after an update, it is important to investigate which result is correct, as it is possible either that earlier bugs have been fixed or that new ones have been introduced. When needed it is possible, although slightly more cumbersome, to install superseded versions from the source-package archive at CRAN, which keeps every version of each package that was earlier available through CRAN. With respect to R itself, multiple versions can coexist on the same computer, so it is not necessary to uninstall the version currently in use to test another one, either older or newer.
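As a rough sketch of how a superseded version can be installed, the code below builds the URL of a source tarball in the CRAN archive. The helper cran_archive_url() is a hypothetical name used only for this illustration, and ‘tibble’ 3.0.2 is just an example version. The installation calls are left commented out because they modify the local library and need the build tools required to install packages from source.

```r
# Hypothetical helper: build the URL of an archived source package on CRAN.
cran_archive_url <- function(pkg, version,
                             mirror = "https://cran.r-project.org") {
  sprintf("%s/src/contrib/Archive/%s/%s_%s.tar.gz",
          mirror, pkg, pkg, version)
}

url <- cran_archive_url("tibble", "3.0.2")
print(url)

# Install from the archived source tarball (not run here):
# install.packages(url, repos = NULL, type = "source")

# Alternative, assuming package 'remotes' is installed (not run here):
# remotes::install_version("tibble", version = "3.0.2")
```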

R 4.0.0

R version 4.0.0 has been released today. In the case of complex ggplots there is a noticeable improvement in performance. As is normal on CRAN, packages are automatically tested. Most of the packages I have developed are on CRAN, and you can easily see their check status at any time on the page where they are listed. At the moment there are warnings under OS-X affecting the building of vignettes for many different packages, including several of my own, all with the same message. I suspect this problem is related to ‘knitr’ or ‘roxygen2’.

R 3.6.0 is coming

A new version of R will be released on 2019-04-26. One significant change is the support for HCL colour definitions and palettes. Some defaults for colour palettes are changing (for the better), but small differences may be visible in how plots look compared to earlier R versions. There is a post with a summary at the R blog, and a paper, arXiv:1903.0649, describing why better palettes for data plotting can be defined.
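A short example of the new HCL palettes, using the functions hcl.colors() and hcl.pals() added to package ‘grDevices’ in R 3.6.0. The choice of the "viridis" palette and of five colours is arbitrary.

```r
# Requires R (>= 3.6.0).
# Names of some of the available HCL-based palettes
head(hcl.pals())

# Five colours from the "viridis" palette, as hexadecimal RGB strings
pal <- hcl.colors(5, palette = "viridis")
print(pal)

# The palette can be passed to any plotting function taking a 'col' argument
# barplot(1:5, col = pal)
```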

Another significant change is that serialization format 3 becomes the default. This format can be read only by R (>= 3.5.0). Consequently, when sharing files created with save() or saveRDS(), or when sharing saved workspaces, one needs either to make sure that the recipient is using a recent version of R, or to override this new default and create the files to be shared using the older serialization format 2.
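A minimal example of overriding the default, using the version argument of saveRDS() and save(). The data frame and the temporary file names are arbitrary.

```r
x <- data.frame(a = 1:3, b = c("u", "v", "w"))

# Write using the older serialization format 2, readable by R (< 3.5.0)
rds.file <- tempfile(fileext = ".rds")
saveRDS(x, file = rds.file, version = 2)
y <- readRDS(rds.file)

# The same argument is accepted by save()
rda.file <- tempfile(fileext = ".rda")
save(x, file = rda.file, version = 2)
```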

All the packages I maintain should work correctly under R (>= 3.6.0), but if you discover a problem, please raise an issue at Bitbucket within the repository of the affected package.

Benchmarking function sun_angles() [updated]

As far as I know, there are four R packages on CRAN implementing the computations for the position of the sun and the times of sunrise and sunset: ‘photobiology’, ‘fishmethods’, ‘solartime’ and ‘suncalc’.

The functions sun_angles() and day_night() from package ‘photobiology’ use the Meeus equations as used by the NOAA Solar Calculator https://www.esrl.noaa.gov/gmd/grad/solcalc/. These could be more precise than those in NOAA’s Excel worksheet, which implements a simplified version of the Meeus equations, especially for calculations far into the past or far into the future. The approximations based on the Meeus equations are very good for years between 1800 and 2100, and results should still be sufficiently accurate for the range from -1000 to 3000 as long as the computation of Julian dates is correct. The Excel implementation is only valid for dates between 1901 and 2099 because of how Julian dates are computed in Excel.

Function astrocalc4r() from package ‘fishmethods’ also implements the Meeus equations (the authors work at NOAA). Function computeSunPosition() from package ‘solartime’ uses unspecified equations, and function getSunlightPosition() from package ‘suncalc’ is an R interface to the ‘suncalc.js’ library, part of the ‘SunCalc.net’ project <http://suncalc.net>.

UPDATED on 2019-04-24

I have noticed significant differences in the values returned by equivalent functions from different packages. Up to now the tests on the functions of my own package ‘photobiology’ have revealed only very small mismatches to the NOAA Solar Calculator. These small errors, noticeable for dates far from the present, were due to the use of base R’s julian() function, which is not designed to be precise enough for astronomical calculations. The code now in the repository at Bitbucket has been revised to use Meeus’ algorithm for the calculation of Julian days, removing this source of small discrepancies.
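For reference, the textbook form of Meeus’ algorithm for Julian days can be sketched as below. This is an illustration written for this post, not the code in ‘photobiology’. The function name julian_day_meeus() is hypothetical, and the sketch assumes dates in the Gregorian calendar given in UT, with the time of day expressed as a decimal fraction of the day.

```r
# Hypothetical helper implementing the textbook Meeus formula for
# Gregorian-calendar dates. 'day' may include a decimal fraction (UT).
julian_day_meeus <- function(year, month, day) {
  # January and February are counted as months 13 and 14 of the previous year
  shift <- month <= 2
  y <- ifelse(shift, year - 1, year)
  m <- ifelse(shift, month + 12, month)
  # Gregorian calendar correction for dropped century leap years
  a <- floor(y / 100)
  b <- 2 - a + floor(a / 4)
  floor(365.25 * (y + 4716)) + floor(30.6001 * (m + 1)) + day + b - 1524.5
}

# Noon UT on 1 January 2000 (the J2000.0 epoch)
jd.j2000 <- julian_day_meeus(2000, 1, 1.5)

# 4.81 October 1957 (UT)
jd.1957 <- julian_day_meeus(1957, 10, 4.81)

print(c(jd.j2000, jd.1957), digits = 12)
```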

In contrast, while testing ‘photobiology’ against other packages, I seem to have found a bug in function astrocalc4r() from R package ‘fishmethods’.

A minimal example follows:

When only the hour passed as argument is changed, different times for sunrise and sunset and a different daylength are returned, even though the day is the same. The differences are larger at high than at low latitudes. The maximum difference for the example above is 1/4 h for daylength. Comparison against the NOAA Solar Calculator shows even larger differences.

(The bug has been reported to the maintainer of package ‘fishmethods’.)
