Introduction to Mixed Effects Models

Concepts and examples in R with package ‘nlme’

R
model fitting
Author

Pedro J. Aphalo

Published

2025-01-23

Modified

2025-01-24

Abstract
An introduction to linear- and non-linear mixed effects models (LME and NLME) based on overheads. The concepts of fixed and random effects are presentsed using examples. Step by step model building is illustrated with a data set of the growth of individual leaves of birch sapplings. These same data are use to illustrate both LME and NLME fits.
Keywords

data analysis

Note

The content of this page is only lightly edited from overhead transparencies I used in teaching a course on longitudinal data between 2001 and 2005 at the University of Jyväskylä, Finland. Nowadays, package ‘lme4’ provides an alternative approach to the fitting of LME. Package ‘nlme’ does however support the fitting of both LMEs and NLMEs, and is used in these pages. Package ‘nlme’ is one of the recommended packages included in the R installation.

Package ‘ymp236’ from 2004 will be soon available through R-Universe. In 20 years the expectations about doccumentation have changed, and although already working, it needs a face-lift before publication.

Tip

The code in the chunks can be copied by clicking on the top right corner, where an icon appears when the mouse cursor hovers over the code listing.

1 Introduction

Compared to linear models (LM), linear mixed effects models (LME) are much more flexible but computationally more demanding. They replace with advantages the variations of linear models earlier frequently used to analyse repeated measures data. They provide ways to describe as part of the fitted model not only the grouping structure of the data, but also heterogeneity of variance, different types of cross correlations and autocorrelations. In general, because of the more numerous options to choose from their use is more involved. However, the simplicity of linear models is a valid assumption only for simple data sets. So, the decision is actually done when designing an experiment, even if not always explicitly.

The concept of random and fixed effects is applicable to different types of models, not just to linear models. So models that are non linear in their parameters, can contain either only fixed effects (NLM) or both fixed and random effects (NLME). The same applies to generalised linear models (GLM) vs. generalised linear mixed models (GLMM), and so on.

In the case of linear models (LM) an analytical solution of the fitting problem is available, and consequently the model fitting is computationally easy and only with extremely difficult cases the solution can be numerically unstable. LMEs, NLM, NLME, etc., are fitted by numerical approximation, by iteratively improving an initial guess. Iteration increases the computation effort needed and can rather frequently require the use of different initial values or even switch to a different approximation/optimization algorithm to achieve convergence onto a valid answer. Longitudinal data include time series (no replication at each time point) and repeated measures data (with replication at each time point). Mixed effect models are useful not only for repeated measures but more generally in many other cases in which observations are grouped, such as designs in blocks, nested arrangements of treatments or classification factors. In the case of LME and NLME there is no requirement of equally spaced time points or even that all replicates are measured at each or the same time points. This is in many cases a significant advantage compared to classical approaches to the analysis of repeated measures.

2 Random and fixed effects

2.1 The concept

  • Fixed effects are parameters associated with the entire population or with certain repeatable levels of experimental factors.

  • Random effects are associated with individual experimental units drawn at random from a population.

A model with both fixed effects and random effects is called a mixed-effects model.

3 Grouped data

Mixed-effects models are primarily used to describe relationships between a response variable and some covariates in data that are grouped according to one or more classification factors.

Examples of grouped data are: longitudinal data, repeated measures data, and block designs.

By associating common random effects to observations sharing the same level of a classification factor, mixed-effects models represent the covariance structure induced by grouping of the data.

4 An example of repeated measures

We first load the packages that we will use.

library(ymp236)
library(nlme)
library(lattice)
library(ggplot2)
theme_set(theme_bw())

We start with a simple example of data that do not include a covariate or treatment, but will help explain grouping and random effects. These are data from a stress test of railway rails. Six rails were chosen at random and tested three times each. The data are included in package ‘nlme’ and used also a book that describes mixed effects models and the ‘lnme’ package (Pinheiro and Bates 2000).

The response variable is the time it took a certain ultrasonic wave to travel the length of the rail. The only setting that changes between observations is the rail. The observations are arranged in a one-way classification (according to the rail on which the observation was done).

The quantities of interest were: 1) the average travel time of the wave for a typical rail (The expected travel time); 2) the variation in travel time among rails (the between-rail variability); and 3) the average variation in the observed travel times for a single rail (the within-rail variability).

The data look like this: Travel time is in nanoseconds, after subtracting 36,100 nanoseconds.

plot(Rail)

Data from a one-way classification like the rails example can in principle be analysed either with a fixed-effects model or with a random-effects model.

The choice between the two types of models is done according to whether we wish to make inferences about those particular “levels” of the classification factor that were used in the experiment or to make inferences about the population from which these “levels” were drawn.

Note

The choice of using a fixed model or mixed effects one containing also random effects is done based on the design of the data and the aims of the study. Below we compare both approaches on the same data set so as to illustrate their differences. This is not the approach to follow for a real data analysis.

So, for illustration, we first ignore the grouping structure of the data and assume the simple model \[ y_{ij} = \beta + \epsilon_{ij}, \qquad i=1,\ldots,M, \qquad j=1,\ldots,n_i, \] where \(y_{ij}\) is the observed travel time for observation \(j\) on rail \(i\), and \(\beta\) is the mean travel time across the population of rails being sampled, and the \(\epsilon_{ij}\) are the independent \(\mathcal{N}(0,\sigma^2)\) error terms. The number of rails is \(M\) and the number of observations on rail \(i\) is \(n_i\).

In this case \(M=6\), \(n_1 = n_2 = \cdots = n_6 = 3\). The total number of observations \(N = \sum_{i=1}^M n_i = 18\).

We use lm() to fit a linear model, i.e., a model describing fixed effects and an overall error term from the residuals.

fm1Rail.lm <- lm( travel ~ 1, data = Rail )
summary(fm1Rail.lm)

Call:
lm(formula = travel ~ 1, data = Rail)

Residuals:
   Min     1Q Median     3Q    Max 
-40.50 -16.25   0.00  18.50  33.50 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   66.500      5.573   11.93  1.1e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 23.65 on 17 degrees of freedom

The fitted object contains the parameter estimates \(\hat{\beta}=66.5\) and \(\hat{\sigma} = 23.65\).

Box plots of the residuals show the problem of this approach: there are two distinct sources of variation. A smaller one between repeated measurements on each rail and a larger one between different rails.

combined.df <- cbind(Rail, resid = residuals(fm1Rail.lm))
ggplot(combined.df, aes(resid, Rail)) +
  geom_boxplot()

We can allow the mean of each rail to be represented by one parameter in a fixed-effects model. \[\begin{equation*} y_{ij} = \beta_i + \epsilon_{ij}, \qquad i=1,\ldots,M, \qquad j=1,\ldots,n_i, \end{equation*}\] where the \(\beta_i\) represents the mean travel time of rail \(i\) and the errors \(\epsilon_{ij}\) are assumed to be independently distributed as \(\mathcal{N}(0,\sigma^2)\).

We can use lm() again to fit this new model.

fm2Rail.lm <- lm(travel ~ Rail - 1, data = Rail)
summary(fm2Rail.lm)

Call:
lm(formula = travel ~ Rail - 1, data = Rail)

Residuals:
    Min      1Q  Median      3Q     Max 
-6.6667 -1.0000  0.1667  1.0000  6.3333 

Coefficients:
      Estimate Std. Error t value Pr(>|t|)    
Rail2   31.667      2.321   13.64 1.15e-08 ***
Rail5   50.000      2.321   21.54 5.86e-11 ***
Rail1   54.000      2.321   23.26 2.37e-11 ***
Rail6   82.667      2.321   35.61 1.54e-13 ***
Rail3   84.667      2.321   36.47 1.16e-13 ***
Rail4   96.000      2.321   41.35 2.59e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.021 on 12 degrees of freedom
Multiple R-squared:  0.9978,    Adjusted R-squared:  0.9967 
F-statistic: 916.6 on 6 and 12 DF,  p-value: 2.971e-15

The -1 is used in the model formula to prevent the inclusion of an intercept term in the model.

The residual standard error, \(\hat{\sigma} = 4.021\), is much smaller than for the model with a single mean.

The model has successfully accounted for the rail effects.

Box plots illustrate this.

library(ggplot2)
combined2.df <- cbind(Rail, resid = residuals(fm2Rail.lm))
ggplot(combined2.df, aes(resid, Rail)) +
  geom_boxplot()

Even though this fixed-effects model accounts for the rail effects, it does not provide a useful representation of the rail data.

The main interest is the population of rails, not the specific sample studied.

In particular it does not provide an estimate of the between-rail variability.

Another drawback of a fixed model is that the number of parameters increases linearly with the number of rails “using-up” degrees of freedom.

A random-effects model for the one-way classification used in the rails experiment is written \[ y_{ij} = \beta + b_i + \epsilon_{ij}, \qquad i=1,\ldots,M, \qquad j=1,\ldots,n_i, \] where \(\beta\) is the mean travel time across the population of rails being sampled, \(b_i\) is a random variable representing the deviation from the population mean of the travel time of \(i\)th rail, and \(\epsilon_{ij}\) is a random variable representing the deviation in travel time for observation \(j\) on rail \(i\) from the mean travel time for rail \(i\).

We begin by modelling \(b_i\) and \(\epsilon_{ij}\) as independent, constant variance, normally distributed random variables with mean zero.

The variances are denoted \(\sigma_b^2\) for the \(b_i\) or “between-rail” variability, and \(\sigma^2\) for the \(\epsilon_{ij}\) or “within-rail” variability.

The parameters of the statistical model are \(\beta\), \(\sigma_b^2\), and \(\sigma^2\).

%We do not estimate the random effects \(b_i\) as such, they are %represented by a single parameter.

We can fit this random-effects model with function lme() from package ‘nlme’.

fm1Rail.lme <- lme(travel ~ 1, data = Rail, random = ~ 1 | Rail)
summary(fm1Rail.lme)
Linear mixed-effects model fit by REML
  Data: Rail 
      AIC      BIC   logLik
  128.177 130.6766 -61.0885

Random effects:
 Formula: ~1 | Rail
        (Intercept) Residual
StdDev:    24.80547 4.020779

Fixed effects:  travel ~ 1 
            Value Std.Error DF  t-value p-value
(Intercept)  66.5  10.17104 12 6.538173       0

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-1.61882658 -0.28217671  0.03569328  0.21955784  1.61437744 

Number of Observations: 18
Number of Groups: 6 

We see that the REML estimates for the parameters have been calculated as \[ \hat{\beta}=66.5, \qquad \hat{\sigma}_b=24.805, \qquad \hat{\sigma}=4.0208. \]

This is the information that the people testing the rails wanted.

We can look at the confidence intervals.

intervals(fm1Rail.lme)
Approximate 95% confidence intervals

 Fixed effects:
               lower est.    upper
(Intercept) 44.33921 66.5 88.66079

 Random Effects:
  Level: Rail 
                   lower     est.    upper
sd((Intercept)) 13.27436 24.80547 46.35335

 Within-group standard error:
   lower     est.    upper 
2.695012 4.020779 5.998737 

There is considerable lack of precision in the estimates of all three parameters.

5 An example of repeated measures and treatments

We continue with an example of the analysis of a more complex data set describing the growth of leaves during bud burst in a set of silver birch seedlings (Robson and Aphalo 2012). In an experiment in a greenhouse, seedlings were assigned at random to three groups. The three groups were exposed daily to 2.6, 5.2, and 13.0 kJ m\(^{-2}\) d\(^{-1}\) of biologically effective UV-B or about 1, 2, or 5 times the natural dose for May in Finland.

Date for the area of one leaf in each seedling was measured nine times, but not at perfectly regular intervals. Measurements started when the leaves were very small and continued until their final size was reached. Measurements of width and length of the leaves were done with a ruler and converted to area using a calibration curve constructed from other leaves of which length, width and area were measured.

We use here data from nine seedlings, three from each level of UV-B. These data are only part of the original data collected.

Quantities of interest from the perspective of the study were:

  • The final size of the leaves.
  • The rate of area expansion.
  • The length of area expansion period.
  • The variation between seedlings.

For the first three quantities we want to know if and how they were affected by the dose of UV-B radiation.

Choosing a model to fit to the data is a first step. This can be based either on theory or on the shape of the cloud of observations when plotted.

Above we considered the differences between random and fixed effects. Based on these concepts we can decide that the variation between the randomly chosen seedlings should be described as a random effect. In contrast, time in days is a covariate that being at repeatable and controlled levels we consider a fixed effect. Time is a continuous variable and, thus, not described as a factor with discrete levels.

The dose of UV-B in kJ m\(^{-2}\) d\(^{-1}\), being also at repeatable and controlled levels should also be considered a fixed effect. As with time we considered it a continuous variable. This is the best approach, as comparing pairs of doses of a continuous variable makes interpretation difficult, or even dubious.

The processes of choosing the details of a model, by fitting different models and comparing them is called model building.

We will build a model step by step. We will first fit a linear mixed-effects model using lme() from package ‘nlme’.

For this example we start by creating a grouped data object. We first make available the data as a data.frame as it is part of package ‘ymp236’.

Then we create a groupedData object that is a special kind (an object of a derived class) of data frame with additional information about the grouping of the data.

data(birch.data)
summary(birch.data)
 block       dose          irrad.time       seedling       day    
 1:27   Min.   : 2.600   Min.   :1.000   1      : 9   Min.   : 1  
 2:27   1st Qu.: 2.600   1st Qu.:1.000   4      : 9   1st Qu.: 5  
 3:27   Median : 5.200   Median :2.000   5      : 9   Median : 9  
        Mean   : 6.933   Mean   :2.667   7      : 9   Mean   :10  
        3rd Qu.:13.000   3rd Qu.:5.000   9      : 9   3rd Qu.:14  
        Max.   :13.000   Max.   :5.000   12     : 9   Max.   :24  
                                         (Other):27               
      area             larea      
 Min.   :  13.21   Min.   :2.581  
 1st Qu.: 151.44   1st Qu.:5.020  
 Median : 868.26   Median :6.766  
 Mean   :1282.08   Mean   :6.264  
 3rd Qu.:2385.69   3rd Qu.:7.777  
 Max.   :3699.45   Max.   :8.216  
                                  
birch <- 
  groupedData(area ~ day|seedling, data = birch.data)
birch <- 
  update(birch, outer = ~ irrad.time)
birch <- 
  update(birch,
         labels=list(x = "Time", y = "Area"),
         units=list(x = "(d)", y = "(mm^2)"))
plot(birch, outer = T)

birch.l <- 
  update(birch, formula = larea ~ day | seedling,
                  labels = list(y = "log area"))
plot(birch.l, outer = T)

5.1 A polynomial

Fitting individual curves

The first step in model building is to fit growth curves to the data from each seedling (from a single leaf).

Because we want to fit a linear model and the curve is S-shaped we will use a transformation: the variable larea is the natural logarithm of the leaf area in mm\(^2\).

fm1Birch.lis <-
    lmList(larea ~ (day + I(day^2)) | seedling,
           data=birch.l)
plot(intervals(fm1Birch.lis),layout=c(3,1))

The greatest variation between seedlings is in the intercept, and the least variation is in the quadratic term on day.

A linear mixed effects model

We first fit a model with only random effects for the intercept, and then we update it to include a random effect for the slope on day.

Note how we use parenthesis in the fixed part of the model to indicate the interactions that we want.

fm1Birch.lme <-
  lme(fixed = larea ~ (day + I(day^2)) * dose,
      data = birch.l, random = ~ 1 | seedling)
fm2Birch.lme <-
  update(fm1Birch.lme, random = ~ day | seedling,
         control = lmeControl(opt = "optim")) # default method does not converge
Note

In lme() and nlme() the random effects are assumed to have mean equal to zero, and for this reason the corresponding fixed effects are almost always included also in the model.

We then compare the two models with ANOVA.

We can use the fits done with the default method REML, because the fixed part of the models is the same.

They only differ in the random part of the model.

anova(fm2Birch.lme,fm1Birch.lme)
             Model df      AIC       BIC    logLik   Test  L.Ratio p-value
fm2Birch.lme     1 10 64.88481  88.05969 -22.44240                        
fm1Birch.lme     2  8 96.45865 114.99856 -40.22933 1 vs 2 35.57385  <.0001

The model with both random effects fits the data much better.

We use anova() to obtain the analysis of variance table.

anova(fm2Birch.lme)
              numDF denDF  F-value p-value
(Intercept)       1    68 47410.50  <.0001
day               1    68   578.01  <.0001
I(day^2)          1    68  1255.30  <.0001
dose              1     7     8.27  0.0238
day:dose          1    68     1.84  0.1795
I(day^2):dose     1    68     1.02  0.3158

Ploting the residuals, is an effective way of detecting important and only important lack of fulfilment of assumptions.

plot(fm2Birch.lme)

qqnorm(fm2Birch.lme, abline = c(0, 1))

We plot the predicted values for model fm1Birch.lme with only a random effect for the intercept.

plot(augPred(fm1Birch.lme, primary = ~day, level = 0:1))

We plot the predicted values for model fm2Birch.lme with random effects for the intercept and the slope on day.

plot(augPred(fm2Birch.lme, primary = ~day, level = 0:1))

Confidence intervals for the parameter estimates:

intervals(fm2Birch.lme)
Approximate 95% confidence intervals

 Fixed effects:
                      lower          est.         upper
(Intercept)    2.4600797237  3.111691e+00  3.7633015654
day            0.5242014237  5.746973e-01  0.6251931341
I(day^2)      -0.0170580456 -1.550311e-02 -0.0139481717
dose          -0.1599317692 -6.601737e-02  0.0278970252
day:dose      -0.0057113502  4.302619e-04  0.0065718740
I(day^2):dose -0.0000933486  9.577226e-05  0.0002848931

 Random Effects:
  Level: seedling 
                           lower        est.       upper
sd((Intercept))       0.28804836  0.49956529  0.86640132
sd(day)               0.01401825  0.02534174  0.04581196
cor((Intercept),day) -0.99998028 -0.99730961 -0.68916930

 Within-group standard error:
    lower      est.     upper 
0.1601861 0.1895903 0.2243920 

Did we get what we wanted?

This model fails to follow the shape of the cloud of observations, and does not match our knowlege that the size of leaves grows assymptotically until a maximum size. The polynomial used resulted in a curve with an unsatisfactory shape.

We obtained a measure of seedling to seedling variation.

We used a transformation, and in most cases this alters the interpretation of curve shapes and, crucially of interactions. Whatever approach is used, it is extremelly important when using transformations to take into account how they affect the interpretation of the results of the tests of significance and the value of the parameter estimates.

Warning

Nowadays it is only very seldom necessary to apply transformations to the data to force them to fulfil the assumptions of a statistical method. It is much better, when possible, to choose a method that does not makes the unfulfilled assumption. As transformation modify the interpretation of test results, analyses done on transformed and untransformed data, are not comparable. An obvious example is in the use of a logarithm transformation. With reference to the original data, an interaction term test whether the effects are additive. Using log transformed data the interaction term tests for additive effects in the log scale, in other words, for multiplicative effects in the original scale.

This is not to say that transformations should not be used, as for example, in some cases finding deviation from multiplicative effects may be very interesting. However, in most cases the use of data transformations should decided based on the nature of the problem under study, and how they can facilitate the interpretation of the results.

In the case of growth, the situation is not clear cut. Calculus tells us that natural logarithms have a special property in relation to derivatives. This means that a \(\log_e\) transformation of the data makes it possible to estimate the instantaneous relative growth rate: \[\mathrm{RGR} = \frac{\delta A}{A}\,\mathrm{d}\,t\] where \(A\) is leaf area. The \(\mathrm{RGR}\) is estimated as the slope of the fitted curve of \(\log_e\) on \(t\).

5.2 The logistic function

If we consider the untrasformed data, the shape of the curves suggests that a logistic function could fit the untransformed \(A\) vs. \(t\) data well. There are other functions that could be possibly used, that are either already ready available in R, or for which a function can be easily defined in our script.

We follow the same steps as above for the LME model, but we use the leaf area data in the original scale, rather than log-transformed.

We write the simple logistic model as \[ y(x) = \frac{\phi_1}{1 + \exp[(\phi_2 - x)/\phi_3]}, \] where \(\phi_1\) is the asymptotic maximum value of \(y\), \(\phi_2\) is the value of \(x\) at which the response \(y\) is \(\phi_1 / 2\), and \(\phi_3\) gives the steepness of the slope (in \(x\) units). We will call in what follows: \(\phi_1\) "Asym", \(\phi_2\) "xmid", and \(\phi_3\) "scal".

Fitting individual curves

We fit the logistic curve to the data from each seedling (one leaf) separately using nlsList(). This the same as doing separate fits for each seedling.

We use the self-starting function SSlogis(), so we do not need to provide starting values for the parameters to be used in the iteration.

fm1Birch.nlis <-
 nlsList(area ~ SSlogis(day, Asym, xmid, scal)|seedling, data = birch)
fm1Birch.nlis
Call:
  Model: area ~ SSlogis(day, Asym, xmid, scal) | seedling 
   Data: birch 

Coefficients:
       Asym      xmid     scal
1  2350.716 10.406782 2.163075
5  2865.977  9.857122 2.263556
12 2966.981 13.133205 2.159634
4  2976.083  9.800398 2.211724
18 3024.826  9.076645 1.973524
7  3113.719 10.465630 2.199875
9  3317.836 11.957336 2.326819
14 3618.106 10.767337 1.991283
15 3752.550 13.425223 2.433724

Degrees of freedom: 81 total; 54 residual
Residual standard error: 69.29162

With fixef() we get the overall fixed effects, that we will use as starting values for the non-linear mixed-effects model fit. We can then plot the 95% confidence intervals for the estimates of the different model parameters for each seedling.

plot(intervals(fm1Birch.nlis), layout=c(3, 1))

From the plot we can make an initial assessment of which random effects may have to be includes in the NLME models that we will fit next. A key thing to check is if the confidence intervals overlap. The asymptote (Asym) varies most between seedlings, and the scale parameter (scal) varies the least and the estimates do not differ significantly among seedlings (all confidence intervals overlap).

An NLME model

We fit a model with a fixed effect of dose for all three parameters of the logistic.

We include random effects for the intercept of the three parameters. For the intercepts, we use the starting values obtained from the nlsList fits. For the slopes for the dose of UV-B we use zero (no effect) as starting value.

fm1Birch.nlme <-
  nlme(area ~ SSlogis(day, Asym, xmid, scal),
       data = birch,
       fixed = Asym + xmid + scal ~ dose,
       random = Asym + xmid + scal ~ 1|seedling,
       start = c(3100, 0, 11, 0, 2.19, 0)
  )
summary(fm1Birch.nlme)
Nonlinear mixed-effects model fit by maximum likelihood
  Model: area ~ SSlogis(day, Asym, xmid, scal) 
  Data: birch 
       AIC      BIC    logLik
  1009.658 1040.786 -491.8292

Random effects:
 Formula: list(Asym ~ 1, xmid ~ 1, scal ~ 1)
 Level: seedling
 Structure: General positive-definite, Log-Cholesky parametrization
                 StdDev       Corr         
Asym.(Intercept) 377.94394036 As.(I) xm.(I)
xmid.(Intercept)   1.16794472 0.728        
scal.(Intercept)   0.08102489 0.321  0.877 
Residual          68.22934485              

Fixed effects:  Asym + xmid + scal ~ dose 
                    Value Std.Error DF   t-value p-value
Asym.(Intercept) 3231.587 246.79514 67 13.094208  0.0000
Asym.dose         -18.004  30.07572 67 -0.598626  0.5514
xmid.(Intercept)    9.690   0.76077 67 12.737492  0.0000
xmid.dose           0.186   0.09272 67  2.005940  0.0489
scal.(Intercept)    2.097   0.09707 67 21.603241  0.0000
scal.dose           0.014   0.01224 67  1.106935  0.2723
 Correlation: 
                 As.(I) Asym.d xm.(I) xmd.ds sc.(I)
Asym.dose        -0.843                            
xmid.(Intercept)  0.727 -0.613                     
xmid.dose        -0.613  0.727 -0.843              
scal.(Intercept)  0.245 -0.207  0.523 -0.442       
scal.dose        -0.200  0.239 -0.427  0.509 -0.836

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-3.29301154 -0.27180822 -0.04482507  0.35817690  2.55915058 

Number of Observations: 81
Number of Groups: 9 
plot(fm1Birch.nlme)

qqnorm(fm1Birch.nlme, abline = c(0, 1))

A second NLME model

The random effect for the scale parameter is very small compared to the estimate for the value of the parameter.

We use update() to fit a model without this random effect. We only need only an argument for random as all other arguments except start are taken from the fitted model being updated. The starting values are taken from the fitted model being updated as well.

fm2Birch.nlme <- 
  update(fm1Birch.nlme, random = Asym + xmid ~ 1 | seedling)
summary(fm2Birch.nlme)
Nonlinear mixed-effects model fit by maximum likelihood
  Model: area ~ SSlogis(day, Asym, xmid, scal) 
  Data: birch 
       AIC      BIC    logLik
  1006.788 1030.733 -493.3941

Random effects:
 Formula: list(Asym ~ 1, xmid ~ 1)
 Level: seedling
 Structure: General positive-definite, Log-Cholesky parametrization
                 StdDev     Corr  
Asym.(Intercept) 371.977391 As.(I)
xmid.(Intercept)   1.122543 0.704 
Residual          70.051164       

Fixed effects:  Asym + xmid + scal ~ dose 
                    Value Std.Error DF   t-value p-value
Asym.(Intercept) 3229.827 243.16031 67 13.282705  0.0000
Asym.dose         -17.715  29.63526 67 -0.597760  0.5520
xmid.(Intercept)    9.682   0.73223 67 13.222957  0.0000
xmid.dose           0.188   0.08925 67  2.102275  0.0393
scal.(Intercept)    2.105   0.08366 67 25.162992  0.0000
scal.dose           0.013   0.01062 67  1.252370  0.2148
 Correlation: 
                 As.(I) Asym.d xm.(I) xmd.ds sc.(I)
Asym.dose        -0.843                            
xmid.(Intercept)  0.704 -0.594                     
xmid.dose        -0.594  0.704 -0.843              
scal.(Intercept)  0.092 -0.078  0.070 -0.060       
scal.dose        -0.075  0.091 -0.058  0.072 -0.833

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-2.90659207 -0.34372525 -0.09053984  0.44161781  2.60079408 

Number of Observations: 81
Number of Groups: 9 

We use anova() to compare the two models.

anova(fm1Birch.nlme, fm2Birch.nlme)
              Model df      AIC      BIC    logLik   Test L.Ratio p-value
fm1Birch.nlme     1 13 1009.659 1040.786 -491.8292                       
fm2Birch.nlme     2 10 1006.788 1030.733 -493.3941 1 vs 2  3.1297  0.3721

The two models do not differ significantly, so we prefer the simpler model, fm2Birch.nlme, with no random effect for the scale parameter.

plot(fm2Birch.nlme)

Variance of the residuals increases with the fitted values.

qqnorm(fm2Birch.nlme, abline = c(0, 1))

The residuals do not seem to be normally distributed.

A third NLME model

To describe the heterogeneity of variance of the residuals in the model we include a power of variance function (see earlier lectures). We use the default variance covariate, the fitted values. In other words we have a model that can describe changes in variance in relation to changes in the `height’ of the fitted line.

We use update() to update our second non-linear mixed-effects model.

fm3Birch.nlme <- 
  update(fm2Birch.nlme, weights = varPower())
summary(fm3Birch.nlme)
Nonlinear mixed-effects model fit by maximum likelihood
  Model: area ~ SSlogis(day, Asym, xmid, scal) 
  Data: birch 
       AIC      BIC    logLik
  959.0612 985.4002 -468.5306

Random effects:
 Formula: list(Asym ~ 1, xmid ~ 1)
 Level: seedling
 Structure: General positive-definite, Log-Cholesky parametrization
                 StdDev      Corr  
Asym.(Intercept) 304.4439864 As.(I)
xmid.(Intercept)   0.9719637 0.644 
Residual           0.9038925       

Variance function:
 Structure: Power of variance covariate
 Formula: ~fitted(.) 
 Parameter estimates:
    power 
0.6632527 
Fixed effects:  Asym + xmid + scal ~ dose 
                    Value Std.Error DF   t-value p-value
Asym.(Intercept) 3181.819 215.12204 67 14.790764  0.0000
Asym.dose         -20.165  26.50675 67 -0.760736  0.4495
xmid.(Intercept)    9.523   0.65426 67 14.555408  0.0000
xmid.dose           0.187   0.08022 67  2.327867  0.0229
scal.(Intercept)    1.946   0.06639 67 29.314788  0.0000
scal.dose           0.019   0.00829 67  2.284112  0.0255
 Correlation: 
                 As.(I) Asym.d xm.(I) xmd.ds sc.(I)
Asym.dose        -0.842                            
xmid.(Intercept)  0.650 -0.547                     
xmid.dose        -0.550  0.654 -0.842              
scal.(Intercept)  0.192 -0.168  0.223 -0.196       
scal.dose        -0.166  0.209 -0.193  0.243 -0.835

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-2.86795252 -0.47546757 -0.02401053  0.41195644  2.63882697 

Number of Observations: 81
Number of Groups: 9 

As above we use plots to diagnose deviations from assumptions.

plot(fm3Birch.nlme)

Now the variation of the standardized residuals seems to be more homogeneous. Still a slight lack of fit is present at low fitted values.

qqnorm(fm3Birch.nlme, abline = c(0, 1))

Now it is plausible that the residuals follow a normal distribution.

We compare the last two models, fm2Birch.nlme and fm3Birch.nlme:

anova(fm2Birch.nlme, fm3Birch.nlme)
              Model df       AIC       BIC    logLik   Test  L.Ratio p-value
fm2Birch.nlme     1 10 1006.7882 1030.7327 -493.3941                        
fm3Birch.nlme     2 11  959.0612  985.4002 -468.5306 1 vs 2 49.72693  <.0001

Model fm3Birch.nlme, which includes a variance covariate that describes the heterogeneity of variance, gives a significantly better fit to the observations.

intervals(fm3Birch.nlme)
Approximate 95% confidence intervals

 Fixed effects:
                         lower         est.        upper
Asym.(Intercept)  2.768643e+03 3181.8192237 3594.9954992
Asym.dose        -7.107509e+01  -20.1646433   30.7457999
xmid.(Intercept)  8.266396e+00    9.5230052   10.7796141
xmid.dose         3.266705e-02    0.1867466    0.3408261
scal.(Intercept)  1.818743e+00    1.9462589    2.0737748
scal.dose         3.012703e-03    0.0189333    0.0348539

 Random Effects:
  Level: seedling 
                                              lower        est.       upper
sd(Asym.(Intercept))                   173.57222182 304.4439864 533.9917866
sd(xmid.(Intercept))                     0.60474502   0.9719637   1.5621682
cor(Asym.(Intercept),xmid.(Intercept))   0.03730914   0.6437893   0.9036757

 Variance function:
          lower      est.     upper
power 0.4924155 0.6632527 0.8340898

 Within-group standard error:
    lower      est.     upper 
0.3197432 0.9038925 2.5552435 

A fourth NLME model

We test the assumption that the random effects of Asym and xmid are independent.

We use a diagonal matrix pdDiag to describe the independence of the random effects, and then test the models with anova().

fm4Birch.nlme <- 
  update(fm3Birch.nlme, random = pdDiag(Asym + xmid ~ 1),
         groups = ~ seedling)
anova(fm3Birch.nlme, fm4Birch.nlme)
              Model df      AIC      BIC    logLik   Test  L.Ratio p-value
fm3Birch.nlme     1 11 959.0612 985.4002 -468.5306                        
fm4Birch.nlme     2 10 961.0087 984.9532 -470.5044 1 vs 2 3.947493  0.0469

The third model with dependence between the random effects for the two parameters gives a slightly better fit, so we choose it. We compute predictions for this model. In this call level = 0:1 indicates that we want predictions for the population and for the individual seedlings.

plot(augPred(fm3Birch.nlme, level = 0:1))

Did we get what we wanted?

Yes. We did not use a transformation, so interpretation is easy. We used a curve, the logistic that approaches asymptotically a maximum. We obtained standard deviation estimates for the variation between seedlings in the values of the parameters.

We can generate useful predictions, below, for UV-B doses of 2.6 and 13

new.birch1.data <- 
  data.frame(day = rep(1:40, 2),
             seedling = rep(NA, 80),
             dose=c(rep(2.6, 40), rep(13, 40)),
             code=c(rep(1, 40), rep(16, 40)))

new.birch1.data$pred <- 
  predict(fm3Birch.nlme,
          new.birch1.data, level = 0)

plot(pred ~ day, new.birch1.data, pch = code)

6 Conclusion

The data I used in the birch example are a small subset of data that included measurements on several leaves on each seedling and seedlings of two different birch species. The approach to the data analysis in the published paper (Robson and Aphalo 2012) was not any of those described here. Because of our interest in \(\mathrm{RGR}\) we decided to use \(\log_e A\) but instead of a polynomial a function, non-linear in its parameters, that had a more suitable shape. At the time we found it very difficult to jointly fit the NLME model based on this function to the data from all seedlings. Consequently, the NLME models were fitted individually to the data from each seedling. The parameter estimates (for each seedling) were used as input to ANOVA to test for significance of the efefcts of treatments on the parameter estimates.

References

Pinheiro, J. C., and D. M. Bates. 2000. Mixed-Effects Models in s and s-Plus. New York: Springer.
Robson, Thomas Matthew, and Pedro José Aphalo. 2012. “Species-Specific Effect of UV-B Radiation on the Temporal Pattern of Leaf Growth.” Physiologia Plantarum 144 (2): 146–60. https://doi.org/10.1111/j.1399-3054.2011.01546.x.

Reuse