Unsupervised Learning in Computational Linguistics

Unsupervised machine learning is a collection of methods for inferring hidden structure from 'unlabeled' data. Because creating labeled data is labor-intensive and time-consuming, while unlabeled data is abundant, unsupervised methods are attractive in many fields, including computational linguistics (CL) and natural language processing (NLP). Beyond these practical motivations, unsupervised learning is also instrumental in investigating many problems in linguistics and the cognitive sciences.

In this course we will study unsupervised methods for solving some typical NLP tasks, such as tokenization, part-of-speech tagging, morphological analysis and parsing. We will also review some research-oriented applications of unsupervised methods in linguistics, for example modeling human language processing and acquisition, and investigating linguistic variation.

The course will take a practical approach. In addition to reading and discussing important and recent research, we will build practical models and applications during the course.

Course material

