5 Logistic Regression

In learning morphologically regular and irregular forms, children are argued to show a pattern known as u-shaped learning. A typical example is the past tense of irregular English verbs, such as ‘go’. Children seem to use the correct past form of irregular verbs first, e.g., ‘went’. At some point in development, they treat irregular verbs as if they were regular, e.g., they say ‘goed’ (and occasionally ‘wented’) instead of ‘went’. In time, they correct this again and return to the correct forms.

In this exercise we will investigate whether there is evidence for a u-shaped pattern for learning irregular past tense forms in English.

For this purpose, we counted all occurrences of correct and overregularized uses of English past tense verbs in corpora from CHILDES for a large number of children of varying ages. The data has two columns and 4327 rows. The variable ‘correct’ is 1 when the instance of past tense use is correct, and 0 when it is overregularized. A small fragment of the data is as follows:

Age (months)   correct
56             1
56             1
56             1
33             0
33             1
33             0
10             1
14             0
14             0
14             1

Again, the problem is real, but the data is partly fake: the counts of past tense forms are from CHILDES, but the error rates have no empirical basis. You should not take the conclusions of this analysis seriously.

You can get the full data set here.

Exercise 5.1. Fit an ordinary regression model to predict the correct responses from the variable ‘age’. In the regression dialog, make sure to choose the ‘residuals vs. fitted values’ graph and a P-P plot for checking regression assumptions.
  1. Are residuals normally distributed?
  2. Do you see any pattern in the ‘residuals vs. fitted’ graph?
  3. Briefly state whether the model is adequate for the task or not.
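To get a feel for what the residual plot is probing, it can help to fit a straight line to a 0/1 outcome by hand. The following is a minimal sketch in Python rather than SPSS. The toy values and the helper `fit_ols` are made up for illustration and are not the exercise data or part of any SPSS procedure.

```python
# Toy data, made up for illustration (NOT the exercise data set):
# age in months, and 1 = correct form, 0 = overregularized form.
ages = [14, 14, 14, 33, 33, 33, 56, 56, 56, 56]
correct = [0, 0, 1, 0, 1, 0, 1, 1, 1, 1]


def fit_ols(x, y):
    """Least-squares intercept and slope for a single predictor (hypothetical helper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_ols(ages, correct)
fitted = [intercept + slope * a for a in ages]
residuals = [y - f for y, f in zip(correct, fitted)]

# With a 0/1 outcome, each fitted value allows only two residuals:
# (0 - fitted) and (1 - fitted).
for a in sorted(set(ages)):
    f = intercept + slope * a
    print(f"age {a}: fitted {f:.3f}, possible residuals {0 - f:.3f} or {1 - f:.3f}")

# Nothing constrains a straight line to stay within [0, 1].
print("prediction at age 120:", round(intercept + slope * 120, 3))
```

The two printed checks are the things to look for in the SPSS plots as well: whether residuals fall on two bands, and whether fitted values can leave the 0 to 1 range.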
Exercise 5.2. Fit a logistic regression model predicting the correct past tense form from age.
  1. Write down the fitted model equation.
  2. What are the predicted error rates for a 6-month-old and a 24-month-old?
  3. Was the maximum likelihood estimation successful?
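A fitted logistic regression with one predictor has the form logit(p) = b0 + b1 · age, where p is the probability of the outcome coded 1. The sketch below shows how coefficients are turned into a predicted error rate, assuming the model predicts ‘correct’ = 1. The function names are hypothetical and the coefficient values are placeholders, not estimates from the exercise data.

```python
import math


def prob_correct(age, b0, b1):
    """Inverse logit: probability of a correct form under logit(p) = b0 + b1 * age."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * age)))


def error_rate(age, b0, b1):
    """Predicted probability of an overregularized form, i.e. 1 - p."""
    return 1.0 - prob_correct(age, b0, b1)


# Placeholder coefficients for illustration only. Replace them with the
# intercept and slope from your own fitted model.
b0, b1 = -1.0, 0.05

for age in (6, 24):
    print(f"age {age} months: predicted error rate {error_rate(age, b0, b1):.3f}")
```

If the software models the probability of the category coded 0 instead, the roles of `prob_correct` and `error_rate` swap, so it is worth checking the coding table in the output first.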
Exercise 5.3. To test our initial question, whether there is evidence for a u-shaped learning curve or not, we will fit another model. Our strategy is as follows:

If there is a u-shaped learning curve, we would expect a change in the error rate similar to the right panel of the graph below. On the other hand, if there was a linear learning trend, one would expect a linear error reduction as in the left panel of the graph.

[Figure: two panels contrasting a linear and a u-shaped change in error rate]

The u-shaped curve corresponds to a quadratic relationship, which can be represented by a model predicting the rate of correct forms from the square of age (age²). The linear trend corresponds to a linear model (the one we fit in the previous exercise using age as predictor). Our intuition is that if learning follows a u-shaped pattern, the quadratic model (predictor age²) fits the data better.

  1. Fit a logistic regression model that predicts the correct responses from the square of ‘age’. Write down the logistic regression equation for this model.
  2. Which model fits the data better? Which part of the SPSS output allows you to make this judgment?
  3. Which model would you prefer? Explain your answer briefly.
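
The comparison in this exercise can also be sketched outside SPSS. The code below fits both models by Newton-Raphson on made-up toy counts (not the exercise data) and reports the −2 log-likelihood of each. `fit_logistic` is a hypothetical helper written for this illustration. Since both models have the same number of parameters, a lower −2 log-likelihood indicates a closer fit to the data they were fitted on.

```python
import math

# Made-up grouped counts for illustration (NOT the exercise data):
# (age in months, number correct, number overregularized)
toy_counts = [(20, 6, 4), (30, 7, 3), (40, 8, 2), (50, 9, 1)]

# Expand to one row per past tense instance, like the exercise data.
ages, correct = [], []
for age, n_correct, n_error in toy_counts:
    ages += [age] * (n_correct + n_error)
    correct += [1] * n_correct + [0] * n_error


def fit_logistic(x, y, iterations=25):
    """Newton-Raphson fit of logit(p) = b0 + b1 * x.

    Returns (b0, b1, -2 * log-likelihood). Hypothetical helper for this sketch.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p              # gradient with respect to b0
            g1 += (yi - p) * xi       # gradient with respect to b1
            h00 += w                  # entries of X'WX
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < 1e-10:
            break
    log_lik = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        log_lik += math.log(p) if yi == 1 else math.log(1.0 - p)
    return b0, b1, -2.0 * log_lik


linear = fit_logistic(ages, correct)
quadratic = fit_logistic([a ** 2 for a in ages], correct)

print(f"predictor age  : b0={linear[0]:.4f}, b1={linear[1]:.5f}, -2LL={linear[2]:.3f}")
print(f"predictor age^2: b0={quadratic[0]:.4f}, b1={quadratic[1]:.6f}, -2LL={quadratic[2]:.3f}")
```

In SPSS, the corresponding value appears as ‘-2 Log likelihood’ in the Model Summary table of the logistic regression output.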