5 Repeated measures

The analysis methods we have studied so far assume that the observations are independent. This assumption is often wrong, and it is intentionally violated in some experimental designs to increase the sensitivity of the tests. In this section we will practice a well-known procedure, repeated-measures ANOVA, for analyzing (experimental) data where the same subjects are measured more than once (hence, the observations are not independent).

We will use two (again hypothetical) new data sets. In the first data set we assume that we investigate whether newborns distinguish their mother’s native language from another language. We recruit 30 newborns, and when we find them awake during their first day of life, we let them listen to two comparable short stories, one in their mother’s language and another in a foreign language. While they are listening to these stories, they are equipped with a pacifier through which we can measure their sucking rate. Crucially, each infant is tested in both conditions (the order of the languages is randomized). Since our hypothetical newborns never fall asleep, start crying or spit out the pacifier in the middle of the story, we have 60 measurements from 30 infants. You can load the data set using:

> print(load( 
  url("http://coltekin.net/cagri/R/data/newborn.rda") 
  ))

Notice that this time the file extension is different, and we used the load() function. This data file is in R’s native binary format. The advantage is that you retain all the information in the original data (including the variable name). The disadvantage is portability: the CSV files we used earlier can be read in virtually any environment by many applications, while .rda files can only be read by R (although you do not need to worry about platform or operating system incompatibilities). We wrapped load() in print() because load() silently loads the variables in the data file. If you print() the return value, you will know which variables are loaded. Otherwise, you need to inspect your R environment to figure out the changes introduced by load(). If all went fine, you should have a new data frame named newborn. Note again that the data is fake, but the method described here without many details, the high-amplitude sucking paradigm, is a well-known technique for studying infants. For a real study of the sort described here see Nazzi et al. [1998].
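The behaviour of load() described above can be tried without network access. The following is only a small sketch: the data frame toy and its values are made up for illustration and have nothing to do with the newborn data.

```r
# Toy data frame standing in for a real data set (values are made up).
toy <- data.frame(id = 1:3, score = c(2.5, 3.1, 4.0))

# Save it in R's native binary format to a temporary file.
path <- tempfile(fileext = ".rda")
save(toy, file = path)

# Remove the object, then restore it from the file.
rm(toy)
loaded <- load(path)  # load() returns the names of the restored objects
print(loaded)         # the same information print(load(...)) displays
```

Note that the object comes back under its original name (toy), not under a name you choose when loading.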

The second data set we will use comes from another hypothetical language acquisition study. This time we are interested in children who are raised bilingually, where one of the languages they speak is ‘home only’ and the other language is also used in their school. The data set can be found at http://coltekin.net/cagri/R/data/bilingual.txt as a ‘tab-separated file’.

Exercise 5.1.  Read the tab-separated file http://coltekin.net/cagri/R/data/bilingual.txt into a new data frame with name bilingual (we will refer to this data set with this name throughout this section, but you can use a shorter name if you prefer).

You can use read.delim() for reading this file. Note that the more general function for reading ‘tabular files’ is read.table().
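To see how the two functions relate, here is a minimal sketch on a made-up tab-separated file written to a temporary location (the column names and values are hypothetical, not those of the bilingual data).

```r
# Write a small tab-separated file to a temporary location (made-up values).
path <- tempfile(fileext = ".txt")
writeLines(c("subj\tage\tscore",
             "s1\t4\t2.5",
             "s2\t5\t3.1"), path)

# read.delim() assumes a header line and tab-separated fields.
d1 <- read.delim(path)

# read.table() needs these choices spelled out explicitly.
d2 <- read.table(path, header = TRUE, sep = "\t")

# For a simple file like this one, both calls yield the same data frame.
print(all.equal(d1, d2))
```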

Exercise 5.2. Inspect the data frame bilingual, and make sure that all variables in the data frame have the correct data type. In particular, we would like to make sure that all categorical variables are identified as factors.

Optionally you can repeat Exercise 5.1, but supply the read.delim() (or read.table()) command with a colClasses option (see help on these functions for detailed use of this option).
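As an illustration of the two routes (converting after reading, or declaring types with colClasses), here is a sketch on a made-up file. The columns subj, group and score are hypothetical. How the columns are typed by default depends on your R version, so inspect the str() output rather than relying on it.

```r
# A made-up tab-separated file with a numeric subject code and a group label.
path <- tempfile(fileext = ".txt")
writeLines(c("subj\tgroup\tscore",
             "1\ta\t2.5",
             "2\tb\t3.1",
             "3\ta\t4.0"), path)

# Default reading: subj comes in as a number, which is wrong for an identifier.
d <- read.delim(path)
str(d)

# Option 1: convert the categorical columns after reading.
d$subj  <- factor(d$subj)
d$group <- factor(d$group)

# Option 2: declare the column types while reading.
d2 <- read.delim(path, colClasses = c("factor", "factor", "numeric"))
str(d2)
```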

5.1 Paired t-test

As we did with independent-measures ANOVA, we will start with the simplest case: when we have only two conditions. If we only have two conditions measured repeatedly over a sample of individuals (people, animals, countries, hospitals, …) then we typically use a paired t test. Our data set newborn provides a textbook application of the paired t test. But for the sake of comparison, we will first take a detour, and revisit the independent-samples t test.

Exercise 5.3.  Assuming the newborn data came from two independent groups of babies (no baby is tested twice), test whether the babies respond differently to their native language and the foreign language.

Check whether t-test assumptions (except independence) are violated or not.

In R, the paired t test can be performed using the same function, t.test(). All you need to do is pass the argument paired=TRUE.
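The following sketch shows the two calls side by side on simulated data. The vectors a and b are made up (ten hypothetical subjects measured twice), and the vector interface of t.test() is used here as an assumption about how your data is laid out.

```r
set.seed(1)
# Hypothetical paired data: 10 subjects measured under two conditions.
a <- rnorm(10, mean = 50, sd = 10)
b <- a + rnorm(10, mean = 3, sd = 2)

indep  <- t.test(a, b)                 # treats a and b as independent samples
paired <- t.test(a, b, paired = TRUE)  # treats a[i] and b[i] as a pair

# The paired test has n - 1 degrees of freedom, n being the number of pairs.
print(paired$parameter)
print(c(independent = indep$p.value, paired = paired$p.value))
```

Note that paired=TRUE relies on the order of the observations: the i-th element of a must belong to the same subject as the i-th element of b.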

Exercise 5.4.  Perform a paired t test. Compare your results to the independent-samples t test you have just performed.

Exercise 5.5. Check whether the normality assumption of paired t tests holds for the newborn data with a normal Q-Q plot. What quantity has to be distributed normally?

Do you also need to check whether variances across the groups are approximately equal?

Exercise 5.6. Plot side-by-side box plots of sucking rates for the native and the foreign language. Is the difference we are interested in clearly observable in the box plots?

5.2 Repeated-measures ANOVA

Repeated-measures ANOVA can be performed in R in a few different ways. In this tutorial, we will practice with the function aov() that comes with the base R installation (‘stats’ package). aov() can handle only standard cases (no violation of the assumptions, no missing data) and displays only minimal information (no effect sizes). For more complex designs, one can use utilities found in additional packages or libraries, such as Anova() (note the capital ‘A’) from the car package and ezANOVA() (read ‘easy ANOVA’) from the package ez. We will conclude the section with an example run of ezANOVA(). However, we note that if your design is not ideal for repeated-measures ANOVA, you should probably use the ‘multi-level’ or ‘mixed-effect’ linear regression that we will see later in this tutorial.

As before, we will start with the simplest case. Remember that when we have two groups, the independent-measures ANOVA is equivalent to the independent-samples t test. Similarly, when we have only two conditions, the repeated-measures ANOVA gives you the same results as the paired t test. Here is how we do a repeated-measures ANOVA using aov() on the data set newborn.

 
> m <- aov(rate ~ language + Error(participant/language), 
           data=newborn) 
> summary(m) 
Error: participant 
          Df Sum Sq Mean Sq F value Pr(>F) 
Residuals 29   5792   199.7 
 
Error: participant:language 
          Df Sum Sq Mean Sq F value   Pr(>F) 
language   1  306.9  306.95   28.24 1.06e-05 ***
Residuals 29  315.2   10.87  

Not surprisingly, the p-value matches the p-value found in Exercise 5.4. As in earlier ANOVA models, we specified a model predicting the rate from the language using the formula notation. The crucial difference is in the specification of the Error() term. This term specifies the replication in the experimental design. In this case, we tell aov() that the response corresponding to every level of language is measured for each participant. The specification of the error term can be confusing at first sight. Within the Error() term, the part before the slash ‘/’ specifies the ‘case’ or ‘subject’ variable. The part after the slash specifies the ‘within-subject’ variable(s). Note that we used summary() instead of summary.aov(), since the default summary method for an aov() object is an ANOVA-like summary.
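The equivalence with the paired t test can also be checked numerically. The sketch below uses simulated data, not the newborn data: the data frame toy and its columns id, cond and y are made up for illustration. With two conditions, the F value of the within-subject effect equals the square of the paired t statistic.

```r
set.seed(2)
n <- 12
# Hypothetical long-format data: n subjects, each measured in two conditions.
subj_effect <- rnorm(n, sd = 5)
toy <- data.frame(
  id   = factor(rep(1:n, times = 2)),
  cond = factor(rep(c("a", "b"), each = n)),
  y    = c(20 + subj_effect + rnorm(n),
           22 + subj_effect + rnorm(n))
)

# Repeated-measures ANOVA: cond is measured within each id.
m <- aov(y ~ cond + Error(id/cond), data = toy)

# The second stratum of the summary (id:cond) holds the test of cond.
tab <- summary(m)[[2]][[1]]
F_value <- tab[["F value"]][1]

# Paired t test on the same data (rows are ordered by id within each condition).
tt <- t.test(toy$y[toy$cond == "a"], toy$y[toy$cond == "b"], paired = TRUE)

print(all.equal(F_value, unname(tt$statistic)^2))   # F equals t squared
print(all.equal(tab[["Pr(>F)"]][1], tt$p.value))    # and the p-values agree
```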

Without the Error() term, the function aov() is equivalent to the lm() function we used for independent measures ANOVA.

Exercise 5.7. Perform an independent measures ANOVA using aov() on the newborn data set. Compare your results with the repeated measures ANOVA results presented in the listing above.

Exercise 5.8.  Perform a repeated measures ANOVA that tests the effect of age on mlu in the bilingual data set.

Extending the repeated-measures ANOVA in Exercise 5.8 to include more predictors is easy. We just add all predictors to the formula notation as we do for the factorial ANOVA. You only need to be careful to include all the ‘within-subject’ variables in the error term. For example, for a two-way within-subject factorial ANOVA with an interaction term where age and language are the predictors, the error term becomes Error(subj/(language*age)).
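To show what such an error term produces, here is a sketch on a simulated, fully balanced design. The data frame toy is made up; it only borrows the variable names used in the text, and its values bear no relation to the bilingual data.

```r
set.seed(3)
# Hypothetical balanced design: 8 subjects x 2 languages x 3 ages,
# one observation per cell.
toy <- expand.grid(subj     = factor(1:8),
                   language = factor(c("home", "school")),
                   age      = factor(c(2, 3, 4)))
toy$mlu <- 2 + 0.5 * as.numeric(toy$age) + rnorm(nrow(toy), sd = 0.3)

# Both within-subject factors (and their interaction) go into Error().
m <- aov(mlu ~ language * age + Error(subj/(language * age)), data = toy)

# Each effect is tested in its own error stratum.
print(names(m))
summary(m)
```

The summary has one table per error stratum; each within-subject effect is tested against the residuals of the stratum in which it appears.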

Exercise 5.9. Perform a repeated measures ANOVA that tests the effect of age and language on mlu in the bilingual data set. Also include the interaction term in your analysis.

Exercise 5.10. Use interaction.plot() to visualize the interaction between the variables age and language in the bilingual data.

Often, we want to include predictors that are not or cannot be replicated in a repeated-measures design. Such a variable in our bilingual data set is gender, which is a good example of a variable that can hardly be measured within subjects. In such cases we use a so-called mixed-design ANOVA. There is nothing special about specifying a mixed-design ANOVA in R. We just add the between-subject variable(s) to the model formula, but exclude them from the error term.
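A minimal sketch of such a specification, again on simulated data: the data frame toy, the between-subject factor group and the response y are hypothetical. Note that group appears in the model formula but not inside Error().

```r
set.seed(4)
# Hypothetical design: 10 subjects, each measured at 3 ages.
toy <- expand.grid(subj = factor(1:10), age = factor(c(2, 3, 4)))
# group is a between-subject factor: it is constant within each subject.
toy$group <- factor(ifelse(as.integer(toy$subj) <= 5, "g1", "g2"))
toy$y <- rnorm(nrow(toy), mean = 3)

# group is in the model formula, but only age is in the error term.
m <- aov(y ~ group * age + Error(subj/age), data = toy)
summary(m)

# Degrees of freedom in the subject stratum: group and its residuals.
subj_df <- summary(m)[[1]][[1]][["Df"]]
print(subj_df)
```

The between-subject effect is tested in the subj stratum, while age and its interaction with group are tested in the subj:age stratum.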

Exercise 5.11.  Perform a mixed ANOVA with age and language as within-subject predictors, and gender as a between-subjects predictor.

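The above exercises exemplify a variety of ANOVA designs that can be fit using aov(). However, as noted at the beginning of this section, aov() can handle only standard cases: it offers no remedy when the assumptions are violated or when data are missing, and it does not report effect sizes.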

In such cases a few packages in R provide solutions that are similar to other statistical software. We will only present an example using ezANOVA() from the package ez. However, when things are not perfectly balanced and neat, the repeated-measures ANOVA becomes difficult to interpret. One reasonable course of action when the ANOVA design becomes too complicated, or assumptions are violated at some level, is to switch to so-called mixed-effect models, which also offer some other benefits. We will discuss mixed-effect linear models later in this tutorial.

Listing 8 repeats Exercise 5.11 using ezANOVA(). The listing is slightly edited for clarity (you still need to exercise your skills in reading scientific notation).


Listing 8: An example with ezANOVA().
 1  > library(ez)
 2  > ezANOVA(data=bilingual,
 3            dv=mlu,
 4            wid=.(subj),
 5            within=.(language, age),
 6            between=gender)
 7  $ANOVA
 8                 Effect DFn DFd        F        p      ges
 9  2              gender   1  18 2.92e-04 9.87e-01 1.02e-05
10  3            language   1  18 6.78e+00 1.80e-02 3.38e-02
11  5                 age   2  36 1.74e+01 5.07e-06 1.47e-01
12  4     gender:language   1  18 1.98e-01 6.62e-01 1.02e-03
13  6          gender:age   2  36 6.48e-01 5.29e-01 6.36e-03
14  7        language:age   2  36 3.07e+00 5.87e-02 1.66e-02
15  8 gender:language:age   2  36 1.42e+00 2.54e-01 7.76e-03
16
17  $`Mauchly's Test for Sphericity`
18                 Effect         W         p
19  5                 age 0.9937147 0.9478173
20  6          gender:age 0.9937147 0.9478173
21  7        language:age 0.9905139 0.9221786
22  8 gender:language:age 0.9905139 0.9221786
23
24  $`Sphericity Corrections`
25                 Effect  GGe    p[GG]  HFe    p[HF]
26  5                 age 0.99 5.38e-06 1.12 5.07e-06
27  6          gender:age 0.99 5.28e-01 1.12 5.29e-01
28  7        language:age 0.99 5.93e-02 1.11 5.87e-02
29  8 gender:language:age 0.99 2.54e-01 1.11 2.54e-01

The first thing to note in Listing 8 is the command library() on line 1. This loads the functions defined in the library ez. There are a large number of (mostly free/open-source) libraries for R for various (statistical) tasks. In this tutorial, we try to stick to the bare bones, but you should check existing libraries for the tasks that are impossible or difficult to do with the basic packages of R. The central repository for all R packages is at http://cran.r-project.org/. If the package you need is not installed on your computer, you can install any package from CRAN with the command install.packages().

Returning to the ANOVA results in Listing 8, the main findings should be the same as what you found with aov() in Exercise 5.11. The additional information presented here includes the effect size in the column labeled ges (generalized η²); the results of Mauchly’s test; and, in case we cannot maintain the sphericity assumption, two common corrections used in the literature, Greenhouse-Geisser ε (GGe) and Huynh-Feldt ε (HFe), with their corresponding p-values. Furthermore, ezANOVA() includes options for cases when the data is not balanced (so-called type I, type II and type III sums of squares; aov() only calculates type I sums of squares). However, you should read about and understand these options before using them in a real analysis.

We will stop the discussion of repeated measures ANOVA here, but we will revisit some of the concepts and exercises when we discuss the mixed-effect models.