From ResearchGate Q&A

I found this Q&A thread on ResearchGate:

What prevents you from using a p-value other than 0.05 as your statistical significance cut-off?

Even though there were already 84 answers, I added my own answer:

… for me, choosing the critical p-value is not a statistical question. It belongs in the realm of the real-world cost of making the wrong decision. In research, it mainly relates to balancing “false positive” and “false negative” decisions. So, mostly informally, researchers sometimes set the critical value at 0.1 (10%) when replication is low. On the other hand, when we have many replicates, we will find statistically significant differences that are biologically irrelevant. [Added only here: The 5% cut-off tends to work not too badly for the numbers of replicates used by many of us.]

In my opinion, in every scientific publication, whatever critical value we use for discussing and interpreting the results, the actual p-values should always be given. Not doing so just discards valuable information. Of course, one historical reason for not reporting actual values was the laborious calculation involved in obtaining them by interpolation from printed tables.

The situation has far-reaching consequences in studies of compliance with legal regulations, in environmental impact assessment, and in safety testing. I would not want to take a 1 in 20 risk of making the wrong decision about a possibly lethal side-effect of a new medicine, while it might be acceptable to take that risk when comparing the new medicine to a currently used medicine known to be highly effective [but maybe not when comparing against a placebo]. In such cases, rather than balance the risks of false positive and false negative decisions, we would want to minimize one of them. In other words, minimize the probability of the type of mistake that we most need or want to avoid.

I have avoided statistical jargon to make this understandable to more readers. Statisticians call these Type I and Type II errors, and there is plenty of literature on them. In any case, I feel most comfortable with Tukey’s view of hypothesis testing, and his idea that we can NEVER ACCEPT the null hypothesis. We can either get evidence that A > B or that A < B, the remaining alternative being that we do not have enough evidence to decide which one is bigger. Of course, in practice, using power analysis we can determine whether we could have detected a difference large enough to be relevant in practice. However, this is conceptually very different from accepting that there is no difference or no effect.
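For readers who would like to try a power analysis themselves, base R’s stats package includes power.t.test(). Here is a minimal sketch; the effect size, standard deviation and replicate numbers are made-up values, chosen only for illustration:

# With n = 5 replicates per group, an effect size we consider
# relevant in practice (delta = 2 units) and an estimate of the
# standard deviation (sd = 1.5), what is the probability of
# detecting the effect at a critical p-value of 0.05?
power.t.test(n = 5, delta = 2, sd = 1.5, sig.level = 0.05)

# Or solve for n instead: how many replicates would we need to
# reach 90% power for the same effect size?
power.t.test(power = 0.90, delta = 2, sd = 1.5, sig.level = 0.05)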

[I would like to see students, and teachers, comment on this problem and on how it fits with their understanding of the use of statistics in real situations. Please just comment below. I will respond to any comments, and write a follow-up post on the effect of using different numbers of replicates on inferences derived from data.]
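In the meantime, here is a minimal simulation in base R of the point made above about replication: with very many replicates even a trivially small difference yields a tiny p-value, while with few replicates the same difference goes undetected. All numbers are arbitrary, chosen only for illustration.

set.seed(123)
# Two "treatments" differing by a biologically trivial 0.5% of the mean.
n <- 10000                                # very many replicates
a <- rnorm(n, mean = 100,   sd = 10)
b <- rnorm(n, mean = 100.5, sd = 10)
t.test(a, b)$p.value                      # tiny p-value: "significant"

# With n = 5 the same trivial difference is not detected:
t.test(rnorm(5, 100, 10), rnorm(5, 100.5, 10))$p.value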

ggplot2 version 1.0

ggplot2 version 1.0 was released last week. Last March the author of this package declared a “feature freeze”, meaning that no new functionality will be added in the future, although the package will continue to be maintained and kept working with future versions of R. No changes to our suite of packages or its documentation were triggered by this update.

In the future, extensions to ggplot2 will be distributed as separate packages. Two good examples are ggmap and ggtern: ggmap can be used to plot data (using the regular ggplot syntax) on top of a map, and ggtern adds functions for producing ternary plots.
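To give a flavour of how such extensions plug into the familiar grammar, here is a minimal ggtern sketch; the soil-texture data frame and its column names are invented for this example:

library(ggtern)   # loads ggplot2 as a dependency
# A made-up soil-texture data set, purely for illustration.
soils <- data.frame(clay = c(20, 40, 10),
                    silt = c(30, 30, 60),
                    sand = c(50, 30, 30))
# ggtern() takes the place of ggplot(); layers are added with + as usual.
ggtern(data = soils, aes(x = clay, y = silt, z = sand)) +
  geom_point()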

New updates to packages (photobiology 0.3.1 and several dependent packages)

The versions of the packages uploaded yesterday have several enhancements, and a few backwards incompatibilities (hopefully only in functions mostly used internally). The new versions are not yet optimized for speed, so for performance-critical code stick with the earlier versions for a while. The changes are many, and I hope they give a more intuitive user interface. The new, more transparent way of doing calculations takes a toll on performance; consequently, the “old” functions remain available for use in speed-critical code and for backwards compatibility. Some additional changes are planned for the coming weeks.

The main improvement is the use of object-oriented programming: spectra are now “S3” objects. This allows the definition of specific versions of generic functions (e.g. print and range) and operators (e.g. +, -, *, /) for spectra. These are already working for spectral irradiance and spectral transmittance, but may still have some hidden bugs. Data for many additional filters has been added.
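For readers new to S3, here is a generic sketch of how method dispatch works; the class name, fields and method bodies are hypothetical illustrations, not the actual definitions used in the photobiology packages:

# Hypothetical class and contents, for illustration only.
spct <- data.frame(w.length = 400:410, s.irrad = runif(11))
class(spct) <- c("example.spct", class(spct))

# A print() method specific to the class: print() dispatches on class().
print.example.spct <- function(x, ...) {
  cat("spectrum with", nrow(x), "wavelengths,",
      min(x$w.length), "to", max(x$w.length), "nm\n")
  invisible(x)
}

# An operator method: here '+' adds the spectral values of two spectra.
"+.example.spct" <- function(e1, e2) {
  result <- e1
  result$s.irrad <- e1$s.irrad + e2$s.irrad
  result
}

print(spct)        # dispatches to print.example.spct()
spct + spct        # dispatches to "+.example.spct"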

Most of the other packages, except those related to photoreceptors, have also been updated.

Please report any problems and/or oddities to me.