Series title: Statistics: Changes since I was an undergrad
Abstract. I took my first course in Statistics 37 years ago. How we do statistics has changed dramatically since then. The amount of data we produce and analyse has also increased enormously. However, different research communities are making use of these new possibilities to very different extents. Even Biology curricula at different universities differ substantially in their emphasis on “quantitative” methods and “numerical/mathematical” literacy. All branches of Biology are becoming more quantitative. By reflecting on how advances in computers and computing science (and in the methods these advances have made possible) have opened a whole new way of approaching data analysis, I hope I will make you rethink your approach to data analysis.
If you are planning to participate next year in my course 526052 Using R for reproducible data analysis, I recommend that you attend this series of talks as a gentle introduction to the subject. If you attend, you can get credits, either through the DDPS or as part of the regular seminar series in Plant Biology.
Place: Biocenter 3, Room 5405 (5th floor, the room in front of the stairs)
Two hours are reserved for each talk, including discussion.
Please let me know by e-mail if you intend to participate.
Part 1: Increased ease of computation
Monday 17 November, 10:15-12:00
This first talk focuses on describing the advances in computing hardware and software and why they are relevant to data analysis. I will also briefly mention the now fashionable “Data Science” and “Big Data” concepts and the currently fuzzy boundary between statistics and programming.
Part 2: Advances in theory and methods
Tuesday 18 November, 13:15-15:00
If your statistical knowledge is limited to the “traditional” methods, I hope to introduce you to the new possibilities brought about by lifting the limitations that used to prevent us from using computation-intensive methods and analysing big data sets. In contrast, if you are a young researcher, well versed in modern methods, I hope you will still find my talk interesting from a historical perspective, as a glimpse of the limitations we had to deal with in the recent past, and of how they influenced, and still influence, the traditional ways of treating biological data. This talk focuses mostly on statistical theory and methods. However, no specialised methods like those used in molecular biology or vegetation analysis will be described in this talk.
Part 3: Examples of modern methods using R
Wednesday 19 November, 13:15-15:00
In this talk I will present some examples of types of analyses that have become available to any biologist thanks to the increase in computing capacity and the development of new theory and methods that make use of these new possibilities. The aim is not to teach you how to apply these methods, but instead to give an idea of how broad an array of methods is currently available to anyone with access to a run-of-the-mill personal computer or, failing this, a cheap cloud server.
Part 4: Reproducible research and data analysis
Thursday 20 November, 13:15-15:00
This talk introduces the currently hot topic of research accountability and repeatability. I will discuss why this openness is needed, how it can be achieved in practice, and how modern software and modern combinations of old software make it possible to achieve this goal rather painlessly, even for complex data analyses. I will also reflect on the origins of these ideas in computer programming, in particular the concept of literate programming proposed by Donald Knuth in the early 1980s.