Unsupervised Learning in Computational Linguistics

Unsupervised machine learning is a collection of methods for inferring (hidden) structure from 'unlabeled' data. Because creating labeled data is labor-intensive and time-consuming while unlabeled data is abundant, unsupervised methods are attractive in many fields, including computational linguistics (CL) and natural language processing (NLP). Beyond these practical motivations, unsupervised learning is also instrumental in investigating many problems in linguistics and the cognitive sciences.

In this course we will study unsupervised methods for typical NLP tasks such as tokenization, part-of-speech tagging, morphological analysis, and parsing. We will also review research-oriented applications of unsupervised methods in linguistics, for example their use in modeling human language processing and acquisition and in investigating linguistic variation.

The course will take a practical approach. In addition to reading and discussing important and recent research, we will build practical models and applications during the course.

Please see the course syllabus for information on requirements and evaluation.

During this class we will try out GitHub Classroom. Please read the relevant information on GitHub after becoming a member of the class.

Course material

Introductory handout for the first course session. The rest of the material and the course schedule can be found in the course information on GitHub.


  • Instructor: Çağrı Çöltekin <ccoltekin@sfs.uni-tuebingen.de>, Willemstr. 19, room 1.09
    Office hours: Monday 12:00 - 14:00