It has been argued that, in learning morphologically regular and irregular forms, children show a pattern known as u-shaped learning. A typical example is the past tense of irregular English verbs such as ‘go’. Children seem to first use the correct past form of irregular verbs, e.g., ‘went’. At some point in development, they treat the irregular verbs as if they were regular, e.g., they say ‘goed’ (and occasionally ‘wented’) instead of ‘went’. In time, they correct this and return to using the correct forms.
In this exercise we will investigate whether there is evidence for a u-shaped pattern for learning irregular past tense forms in English.
For this purpose, we counted all occurrences of correct and overregularized uses of English past tense verbs in corpora from CHILDES for a large number of children of varying ages. The data has two columns and 4327 rows. The variable ‘correct’ is 1 when the instance of past tense use is correct, and 0 when it is overregularized. A small fragment of the data is as follows:
Age (months) | correct |
56 | 1 |
56 | 1 |
56 | 1 |
33 | 0 |
33 | 1 |
33 | 0 |
10 | 1 |
14 | 0 |
14 | 0 |
14 | 1 |
⋮ | ⋮ |
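To make the data layout concrete, here is a minimal sketch that turns rows like these into a rate of correct forms per age. It uses only the ten rows shown in the fragment above; the names `fragment` and `proportion_correct_by_age` are made up for this illustration and are not part of the exercise materials.

```python
from collections import defaultdict

# The ten rows shown in the fragment above, as (age_in_months, correct) pairs.
# This is NOT the full data set of 4327 rows.
fragment = [
    (56, 1), (56, 1), (56, 1),
    (33, 0), (33, 1), (33, 0),
    (10, 1),
    (14, 0), (14, 0), (14, 1),
]


def proportion_correct_by_age(rows):
    """Return {age: share of correct past tense forms at that age}."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for age, is_correct in rows:
        totals[age] += 1
        correct[age] += is_correct
    # Sort by age so the result reads as a developmental sequence.
    return {age: correct[age] / totals[age] for age in sorted(totals)}


rates = proportion_correct_by_age(fragment)
for age, rate in rates.items():
    print(f"{age:>3} months: {rate:.2f} correct")
```

With so few rows per age these rates are very noisy; the same aggregation on the full data set would give one point per age for plotting the learning curve.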
Again, the problem is real, but the data is partially fake: the counts of past tense forms are from CHILDES, but the error rates have no empirical basis. You should not take the conclusions of this analysis seriously.
You can get the full data set here.
If there is a u-shaped learning curve, we would expect a change in the error rate similar to the right panel of the graph below. If, on the other hand, there were a linear learning trend, we would expect a linear reduction in errors, as in the left panel of the graph.
The u-shaped curve corresponds to a quadratic relationship, which can be represented by a model predicting the rate of correct forms from the square of age (age²). The linear trend corresponds to a linear model (the one we fit in the previous exercise using age as the predictor). Our intuition is that if learning follows a u-shaped pattern, the quadratic model (predictor age²) will fit the data better.
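The comparison described above can be sketched as follows. This is only an illustration under stated assumptions: it fits both models by ordinary least squares to per-age rates of correct forms and compares their residual sums of squares, whereas the exercise may intend a different fitting method (for example, logistic regression on the individual 0/1 observations). The helper names `fit_line` and `residual_ss` are hypothetical, and the four data points are the per-age rates from the ten-row fragment only, far too few to support any conclusion.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residual_ss(xs, ys):
    """Residual sum of squares of the least-squares line through (xs, ys)."""
    a, b = fit_line(xs, ys)
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Assumption: per-age share of correct forms from the ten-row fragment,
# used here only as placeholder input in place of the full data set.
ages = [10, 14, 33, 56]
rates = [1.0, 1 / 3, 1 / 3, 1.0]

rss_linear = residual_ss(ages, rates)                       # predictor: age
rss_quadratic = residual_ss([a ** 2 for a in ages], rates)  # predictor: age squared

print(f"RSS with age as predictor:   {rss_linear:.4f}")
print(f"RSS with age^2 as predictor: {rss_quadratic:.4f}")
```

Both models have one predictor and an intercept, so the one with the smaller residual sum of squares fits the given points more closely.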