Statistical natural language processing

This course is an introduction to basic methods and applications in (statistical) natural language processing. It covers a wide range of topics in natural language processing, along with related techniques from machine learning and neighboring fields.

This page contains up-to-date information on the course schedule and materials. Please also subscribe to and follow the Moodle page of the course.

Evaluation is based on three assignments and a final exam. The course is worth 9 ECTS credits.

Announcements

  • 2017-07-21: Assignment 3 is available.
  • 2017-07-10: Assignment 2 is available.
  • 2017-06-02: Assignment 1 is available. Deadline: June 30, 12:00.
  • 2017-05-12: Example solutions of the exercises can be found here.
  • 2017-04-19: Website is up.

Reading material

  • Daniel Jurafsky and James H. Martin (2009) Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Pearson Prentice Hall, second edition (JM)
    chapters from 3rd edition draft (JM3)
  • Trevor Hastie, Robert Tibshirani, and Jerome Friedman (2009), The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag, second edition. (HTF)
    available online

Course outline (tentative!)

Week 01
  • Mon Apr 17: No class
  • Wed Apr 19: Introduction / organization. Reading: JM Ch. 1 (slides, handout (8up))
  • Fri Apr 21: Python tutorial (1) (exercises)
Week 02
  • Mon Apr 24: Mathematical preliminaries (slides, handout)
  • Wed Apr 26: Probability theory (slides, handout)
  • Fri Apr 28: Python tutorial (2)
Week 03
  • Mon May 01: No class
  • Wed May 03: Information theory (slides, handout)
  • Fri May 05: Exercises
Week 04
  • Mon May 08: Statistical models. Reading: HTF Ch. 1 (slides, handout)
  • Wed May 10: N-gram language models (1). Reading: JM Ch. 4 (slides, handout)
  • Fri May 12: Exercises (data)
Week 05
  • Mon May 15: Machine learning intro (1). Reading: HTF Ch. 1, 3.2 & 3.4 (slides, handout)
  • Wed May 17: N-gram language models (2)
  • Fri May 19: Exercises
Week 06
  • Mon May 22: Exercises (cont.)
  • Wed May 24: Machine learning intro (2). Reading: JM 6.6 (JM3 Ch. 7), HTF 4.4 (slides, handout)
  • Fri May 26: N-gram language models (3)
Week 07
  • Mon May 29: Tokenization, normalization, segmentation (slides, handout)
  • Wed May 31: Machine learning intro (3) (slides, handout)
  • Fri Jun 02: Assignment 1 (data)
Jun 05 - Jun 09: no class
Week 08
  • Mon Jun 12: POS tagging. Reading: JM Ch. 5 (JM3 Ch. 10) (slides, handout)
  • Wed Jun 14: Sequence learning. Reading: JM Ch. 6 (JM3 Ch. 9) (slides, handout)
  • Fri Jun 16: Exercises (data)
Week 09
  • Mon Jun 19: Neural networks (1) (slides, handout)
  • Wed Jun 21: Neural networks (2)
  • Fri Jun 23: Exercises (cont.)
Week 10
  • Mon Jun 26: Parsing: introduction. Reading: JM Ch. 13 (JM3 Ch. 12) (slides, handout)
  • Wed Jun 28: Statistical constituency parsing. Reading: JM Ch. 14 (JM3 Ch. 13) (slides, handout)
  • Fri Jun 30: Exercises (cont.)
Week 11
  • Mon Jul 03: Statistical dependency parsing. Reading: JM3 Ch. 14 (slides, handout)
  • Wed Jul 05: Unsupervised learning (slides, handout)
  • Fri Jul 07: Exercises
Week 12
  • Mon Jul 10: Distributed representations. Reading: JM3 Ch. 15 & 16 (slides, handout)
  • Wed Jul 12: Distributed representations (cont.)
  • Fri Jul 14: Exercises
Week 13
  • Mon Jul 17: Text classification (slides, handout)
  • Wed Jul 19: Summary
  • Fri Jul 21: Exercises
Week 14
  • Mon Jul 24: Summary
  • Wed Jul 26: Exam
  • Fri Jul 28: Exam discussion & exercises (data)
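
As a small taste of one of the central topics above, here is a minimal sketch of maximum-likelihood bigram language model estimation (JM Ch. 4) in Python. The toy corpus, the boundary markers <s>/</s>, and all names are illustrative assumptions, not course material:

```python
from collections import Counter

# Toy two-sentence corpus; <s> and </s> are sentence-boundary markers.
corpus = [["<s>", "the", "cat", "sat", "</s>"],
          ["<s>", "the", "dog", "sat", "</s>"]]

# Count unigrams and adjacent word pairs within each sentence.
unigram_counts = Counter(w for sent in corpus for w in sent)
bigram_counts = Counter(
    pair for sent in corpus for pair in zip(sent, sent[1:]))

def bigram_prob(word, prev):
    """Maximum-likelihood estimate P(word | prev) = C(prev, word) / C(prev)."""
    return bigram_counts[(prev, word)] / unigram_counts[prev]

# "the" occurs twice and is followed by "cat" once, so P(cat | the) = 0.5.
print(bigram_prob("cat", "the"))
```

Real models add smoothing to handle unseen bigrams, one of the refinements discussed in the n-gram lectures.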

Contact

  • Instructor: Çağrı Çöltekin <ccoltekin@sfs.uni-tuebingen.de>, Willemstr. 19, room 1.09
    Office hours: Wednesday 10:00 - 12:00
  • Tutor: Kuan Yu <kuan.yu@student.uni-tuebingen.de>