Language Learning

Introduction

The seminar will survey work on computational models of various aspects of language learning. We will relate these lines of work to linguistic and psychological issues such as the dispute concerning the poverty of the stimulus argument (Pullum and Scholz), connectionists' views (Elman et al. "Rethinking Innateness"), and work on children's sensitivity to distributional tendencies (work by Newport, Mintz, and others).

Purpose: To explore the increasingly important literature on computational models of human language learning. The focus is on simulations of human language acquisition, where the aim is not simply accuracy but similarity to human performance, including errors, the order of acquisition, etc. Work that aspires to neurological plausibility is especially interesting. The course explores this work by reading and discussing published papers. It therefore does not primarily examine applications of machine learning in natural language processing, i.e., studies whose main aim is to improve, e.g., parsing accuracy.

Example: Tim Dorscheidt, Nicola Valchev, and Terence van Zoelen, Minimal Generalization of Dutch Diminutives, project report from the 2006-2007 course.

Prerequisites: The course assumes familiarity with basic concepts of machine learning, but there will be time to review unfamiliar concepts as they arise. The course is aimed at students in research master's programs and therefore assumes serious motivation and scientific maturity.

Note: This course is referred to as Computermodellen voor taalverwerving in the Dutch course catalogue.

Instructor	prof.dr.ir. J. Nerbonne
Literature	See literature section below
Organization	Seminars with student presentations and discussion, and (perhaps) presentations by guest researchers. All students are expected to read each paper, participate in the discussions, and present at least one paper. The instructor will give the early presentations and lead the discussions for those sessions, but will not lecture after the first meetings.
Time	Fall, 2008
Place	Tues. 15:30-17, A-weg 30, Rm 104
Credit	Credit based on: leading discussion on (1-2) papers; contributions to discussions; an implementation of a psycholinguistic hypothesis with report of results.
Level	Research Master's
Information	J.Nerbonne at RuG dot nl

Program

Date	Subject	Reading	Presentation
Nov 11	Introduction	None	J. Nerbonne [slides]
Nov 18	Categorial Grammars	None	C. Coltekin [slides]
Nov 25	Poverty of Stimulus	Pullum & Scholz (2002)	J. Ma [slides]
Nov 25	Learning English Auxiliary Inversion	Clark & Eyraud (2006)	S. Duarte
Dec 2	Machine Learning & Universal Grammar	Lappin & Shieber (2007)	X. Yao
Dec 2	Project Discussion	None
Dec 9	Project Discussion	None
Jan 6
Jan 13
Jan 20

Literature

The following is an extended reading list for the course. We will read a selection of these papers. The same list, organized by subject, is also available as a PDF file.

Albright, A. and Hayes, B. (2002). Modeling English past tense intuitions with minimal generalization. In SIGPHON 6: Proceedings of the Sixth Meeting of the ACL Special Interest Group in Computational Phonology, pages 58-69. [ bib | .pdf ]

Berwick, R. C. and Pilato, S. F. (1987). Learning syntax by automata induction. Machine Learning, 2(1):9-38. [ bib | .pdf ]

Brent, M. R. and Cartwright, T. A. (1996). Distributional regularity and phonotactic constraints are useful for segmentation. Cognition, 61:93-125. [ bib | .pdf ]

Cartwright, T. A. and Brent, M. R. (1994). Segmenting speech without a lexicon: The roles of phonotactics and speech source. [ bib | .pdf ]

Chater, N. and Vitányi, P. (2007). 'Ideal learning' of natural language: Positive results about learning from positive evidence. Journal of Mathematical Psychology, 51:135-163. [ bib | .pdf ]

Clark, A. and Eyraud, R. (2006). Learning auxiliary fronting with grammatical inference. In Proceedings of CoNLL, pages 125-132, New York. [ bib | .pdf ]

Clark, A., Eyraud, R., and Habrard, A. (2008). A polynomial algorithm for the inference of context free languages. In Proceedings of the International Colloquium on Grammatical Inference. Springer. [ bib | .pdf ]

Crystal, D. (1997). The Cambridge Encyclopedia of Language. Cambridge University Press. [ bib ]

Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14:179-211. [ bib | .pdf ]

Gold, E. M. (1967). Language identification in the limit. Information and Control, 10(5):447-474. [ bib | .pdf ]

Goldsmith, J. (2001). Unsupervised learning of the morphology of a natural language. Computational Linguistics, 27(2):153-198. [ bib | .pdf ]

Hauser, M. D., Chomsky, N., and Fitch, W. T. (2002). The faculty of language: what is it, who has it, and how did it evolve? Science, 298(5598):1569-1579. [ bib | .pdf ]

Klein, D. and Manning, C. (2002). A generative constituent-context model for improved grammar induction. In Proceedings of the Association for Computational Linguistics (ACL). [ bib | .pdf ]

Klein, D. and Manning, C. (2004). Corpus-based induction of syntactic structure: Models of dependency and constituency. In Proceedings of the Association for Computational Linguistics (ACL). [ bib | .pdf ]

Lappin, S. and Shieber, S. M. (2007). Machine learning theory and practice as a source of insight into universal grammar. Journal of Linguistics, 43(2):393-427. [ bib | .pdf ]

Osborne, M. and Briscoe, T. (1997). Learning stochastic categorial grammars. In CoNLL97, pages 80-87. ACL. [ bib | .pdf ]

Pullum, G. K. and Scholz, B. C. (2002). Empirical assessment of stimulus poverty arguments. The Linguistic Review, 19:9-50. [ bib | .pdf ]

Saffran, J. R., Aslin, R. N., and Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science, 274(5294):1926-1928. [ bib ]

Siskind, J. M. (1996). A computational study of cross-situational techniques for learning word-to-meaning mappings. Cognition, 61(1-2):1-38. [ bib | .pdf ]

Solan, Z., Horn, D., Ruppin, E., and Edelman, S. (2005). Unsupervised learning of natural languages. Proceedings of the National Academy of Sciences, 102:11629-11634. [ bib | .pdf ]

Thompson, S. P. and Newport, E. L. (2007). Statistical learning of syntax: The role of transitional probability. Language Learning and Development, 3:1-42. [ bib ]

Tomasello, M. (2006). Acquiring linguistic constructions. In Kuhn, D. and Siegler, R. S., editors, Handbook of Child Psychology, pages 255-298. Wiley, New York. [ bib ]

Valiant, L. G. (1984). A theory of the learnable. Communications of the ACM, 27(11):1134-1142. [ bib | .pdf ]

Xu, F. and Tenenbaum, J. B. (2007). Word learning as Bayesian inference. Psychological Review, 114(2):245-272. [ bib | .pdf ]

Yang, C. D. A formal theory of language development. [ bib | .pdf ]

Yang, C. D. (1999). A selectionist theory of language development. In Proceedings of the 37th Meeting of the Association for Computational Linguistics, pages 429-435. Association for Computational Linguistics. [ bib | .pdf ]

Zettlemoyer, L. S. and Collins, M. (2005). Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars. In Proceedings of the Twenty First Conference on Uncertainty in Artificial Intelligence (UAI-05). [ bib | .pdf ]

Project

This year's project is on learning Categorial Grammars. The aim of the project is to develop an unsupervised Categorial Grammar learner. The details and scope of the project will be defined during the class discussions.