A series of four seminars on Statistics (week 47)

Series title: Statistics: Changes since I was an undergrad

Abstract. I took my first course in Statistics 37 years ago. How we do statistics has changed dramatically since then. The amount of data we produce and analyse has also increased enormously. However, different research communities are making use of these new possibilities to very different extents. Even Biology curricula at different universities differ substantially in their emphasis on methods and mathematical literacy. All branches of Biology are becoming more quantitative. By reflecting on how advances in computers and computing science (and in the methods these advances have made possible) have opened a whole new way of approaching data analysis, I hope to make you rethink your own approach to data analysis.

If you are planning to participate next year in my course 526052 Using R for reproducible data analysis, I recommend that you attend this series of talks as a gentle introduction to the subject. If you attend, you can get credits either through the DDPS or through the regular seminar series in Plant Biology.

Place: Biocenter 3, Room 5405 (5th floor, the room in front of the stairs)

Two hours are reserved for talk plus discussion.

Please let me know by e-mail if you intend to participate.

Part 1: Increased ease of computation

Monday 17 November, 10:15-12:00

This first talk focuses on describing the advances in computing hardware and software and why they are relevant to data analysis. I will also briefly mention the now fashionable “Data Science” and “Big Data” concepts and the currently fuzzy boundary between statistics and programming.

Part 2: Advances in theory and methods

Tuesday 18 November, 13:15-15:00

If your statistical knowledge is limited to the “traditional” methods, I hope to introduce you to the new possibilities brought about by lifting the limitations that used to prevent us from using computation-intensive methods and analysing big data sets. In contrast, if you are a young researcher, well versed in modern methods, you will hopefully still find my talk interesting from a historical perspective: it gives a glimpse of the limitations we had to deal with in the recent past, and of how they influenced, and still influence, the traditional ways of treating biological data. This talk focuses mostly on statistical theory and methods. However, no specialised methods, like those used in molecular biology or vegetation analysis, will be described in this talk.

Part 3: Examples of modern methods using R

Wednesday 19 November, 13:15-15:00

In this talk I will present some examples of types of analyses that have become available to any biologist thanks to the increase in computing capacity and the development of new theory and methods that make use of these new possibilities. The aim is not to teach you how to apply these methods, but instead to give an idea of how broad an array of methods is currently available to anyone with access to a run-of-the-mill personal computer or, failing this, a cheap cloud server.

Part 4: Reproducible research and data analysis

Thursday 20 November, 13:15-15:00

This talk introduces the currently hot topic of research accountability and repeatability. I will discuss why this openness is needed, how it can be achieved in practice, and how modern software and modern combinations of old software make it possible to achieve this goal rather painlessly, even for complex data analyses. I will also reflect on the origins of these ideas in computer programming, in particular the concept of literate programming proposed by Donald Knuth in the early 1980s.

Update: photobiology 0.3.12

Package photobiology is now at version 0.3.12, which has only a very small fix compared to version 0.3.11, which was in the repository for only a few hours. The main change from version 0.3.10 is the addition of three functions: wb2spct(), wb2tagged_spct() and wb2rect_spct(), which are useful for annotating plots. The vignette was also updated with a very brief explanation of their use. In contrast to the last three updates, this time I have created and uploaded to the repository binaries for both R 3.0.x and R 3.1.x. I have also uploaded R 3.0.x binaries for all other packages that were out-of-date.

I will continue building binaries for R 3.0.x until R 3.2.0 is released. However, I do the development and testing under 3.1.1 and will start testing under the pre-release R 3.2.x. A quick check seems to indicate that R 3.0.3 does not trigger any problems.

Update to package photobiologygg (0.1.6)

This update adds significant new functionality and new documentation.

New function:

  • annotate_waveband() is based on ggplot2’s annotate() but has a parameter which accepts a waveband object, simplifying the annotation of spectra plotted with ggplot2.

The User Guide has been updated with several examples of the use of this function and also of the use of the new functions in photobiology ver 0.2.19.

This version of photobiologygg requires the newest version of photobiology.

Please report any problems or suggestions to me.

New package photobiologyAll

The new package photobiologyAll loads and imports all packages in the photobiology suite. It does not add any new functionality, but adds convenience at the cost of increased memory use and loading time. Added on 25.04.2014: I have decided to remove package photobiologygg from the packages loaded and imported by photobiologyAll, because it in turn imports ggplot2, which may not be needed and is quite large. I will most probably change the name of package photobiologygg, as it can also be useful in other contexts.