A hands-on tutorial on using R for (mostly) linguistics research


01-12-2015, 23:00

Note: This version of the exercises is being reorganized and rewritten. The older but more complete version can be found at http://coltekin.net/cagri/R.old/.

This is a hands-on tutorial on R, a powerful software environment for statistical analysis. The tutorial was prepared for the course Seminar in Methodology and Statistics, taught by John Nerbonne at the University of Groningen.

The aim of the exercises is to provide a hands-on tutorial on some statistical analysis procedures that are common in various branches of linguistics. This tutorial assumes that you are familiar with basic statistical concepts. However, no initial knowledge of R is assumed.

Any suggestions and/or corrections are welcome.

The HTML version of this tutorial makes use of MathML. To see the mathematical formulas correctly, you should use a MathML-capable browser. Recent versions of Firefox work out of the box; for other browsers you may need additional plugins. Alternatively, you can download and use the PDF version of the complete exercise set. The PDF version is also useful if you prefer to have a printed version of the exercises.

1 Starting R and finding your way around
 1.1 Getting help
 1.2 Doing simple calculations with R
 1.3 Variables
 1.4 Vectors in R
2 Basic data exploration and inference
 2.1 Summarizing and visualizing one-dimensional data
 2.2 Summarizing and visualizing two-dimensional data
 2.3 Simple inference
3 Linear regression: a first introduction
 3.1 Some preliminaries
 3.2 Some model diagnostics
4 Linear models with categorical predictors
 4.1 Comparing two means
 4.2 Checking assumptions of t test and ANOVA
 4.3 Single ANOVA
 4.4 Factorial ANOVA
 4.5 If ANOVA assumptions are not met
 4.6 T-test as a linear model
5 Repeated measures
 5.1 Paired t-test
 5.2 Repeated-measures ANOVA
6 Graphics
 6.1 Basic graphics
 6.2 Labels, axes, legends …
 6.3 More than one graph on the same canvas
 6.4 Writing your graphs to external files
 6.5 Additional exercises
7 Regression again
 7.1 Correlation
 7.2 Least-squares linear regression
 7.3 Model diagnostics
 7.4 An example transformation
 7.5 Predictions of a linear model
8 Multiple regression
 8.1 Revisiting single regression (for the last time)
 8.2 Multiple regression
 8.3 Multicollinearity
 8.4 r² and adjusted r²
 8.5 Model selection
9 General Linear Models
 9.1 Categorical variables in regression
 9.2 Other ways of coding categorical variables: contrasts
 9.3 Mixing categorical and numeric predictors
10 Probability distributions
11 Logistic Regression
 11.1 Regression and binomial response variables
 11.2 Binomial data and generalized linear models
 11.3 Binary data
 11.4 Further exercises in model selection
 11.5 More on logistic regression and GLMs
12 Multilevel / mixed-effect models
 12.1 Background
 12.2 Random intercepts
 12.3 Random slopes
 12.4 Random intercepts or random slopes
 12.5 Where are my p-values?
 12.6 Multiple fixed and random effects
 12.7 Crossed random effects
 12.8 Where to go from here?
A Answers
 A.1 Starting R and finding your way around
 A.2 Basic data exploration and inference
 A.3 Linear regression: a first introduction
 A.4 Linear models with categorical predictors
 A.5 Repeated measures
 A.6 Graphics
 A.7 Regression again
 A.8 Multiple Regression
 A.9 Probability distributions
 A.10 Logistic Regression
 A.11 Multilevel / mixed-effect models
B Model formulas