Linguistics/CSE 256: Statistical Natural Language Processing

Course information

Lecture Times WF 12:45-2pm (first meeting Wed 9 Jan 12:30pm)
Lecture Location AP&M 4301
Class webpage http://grammar.ucsd.edu/courses/lign256/

Instructor information

Instructor Roger Levy (rlevy@ling.ucsd.edu)
Instructor's office AP&M 4220
Instructor's office hours WF 2-3pm

Course Description

The goal of this course is to train you, the students, to do research in natural language processing — work that can potentially be published in the leading conferences and journals of the field. In addition to helping you succeed academically in this field (and related fields including AI, machine learning, and psycholinguistics), this is also great training if you are interested in doing NLP work in industry, either in a research lab (Google, Microsoft, Powerset, Yahoo, etc.) or in a startup.

Intended Audience

Graduate students in linguistics, computer science, engineering, cognitive science, psychology, and any other discipline who are interested in how to process natural language by computer. Highly motivated undergraduates are also welcome, but please talk to the instructor before enrolling.

Reading material

We're going to be using the two premier textbooks in the field for this course:

  1. Jurafsky and Martin 2008: Speech and Language Processing (2nd Edition). Prentice Hall. We should be getting beta copies of the second edition!
  2. Manning and Schütze 1999: Foundations of Statistical Natural Language Processing. MIT Press. Here's the online version.

There will also be at least one reading from the following new textbook:

  1. Manning, Raghavan, and Schütze 2008: Introduction to Information Retrieval. Available here.

We may occasionally read recent papers from the literature as well.

Skills required & suggested background

Working in natural language processing requires putting several different types of skills together:

You may not have all of these skills yet, but hopefully you have a substantial subset of them. It may require a bit of extra work for you to strengthen your background in any area where you're deficient — the focus within the class will be on how to put them together.

Mailing List

The mailing list for the class is ligncse256@ling.ucsd.edu. Sign up for the mailing list here.

Syllabus (subject to change!)

| Week | Day | Topic | Readings | Materials | Homework Assignments |
| Week 1 | 9 Jan | Class Introduction | M&S 1, 2 | Lecture 1 [PDF] | |
| | 11 Jan | Language Modeling I | M&S Chapter 6, J&M Chapter 4 | Lecture 2 [PDF] | |
| Week 2 | 16 Jan | Language Modeling II | Chen and Goodman 1998 (an absolute classic) | Lecture 3 [PDF]; Kneser-Ney mini-example | Programming Assignment 1 (due 1 Feb) |
| | 18 Jan | Text Categorization | MRS 2008, Chapter 13 | Lecture 4 [PDF] | |
| Week 3 | 23 Jan | Word-sense Disambiguation | M&S Chapter 7, J&M Chapter 20 | Lecture 5 [PDF] | |
| | 25 Jan | Part-of-speech Tagging I | M&S Chapter 9, J&M Chapter 6 | Lecture 6 [PDF]; HMM Viterbi inference mini-example | |
| Week 4 | 30 Jan | Part-of-speech Tagging II | M&S Chapter 10 | | Programming Assignment 2 (due 15 Feb) |
| | 1 Feb | Syntax | M&S Chapter 10, J&M Chapter 12 | Lecture 8 | |
| Week 5 | 6 Feb | Roger out of town: no class | | | |
| | 8 Feb | Syntactic disambiguation | none (catch up!) | Lecture 9 [PDF] | |
| Week 6 | 13 Feb | Parsing I | M&S Chapter 11 | Lecture 10 [PDF] | Final project guidelines go out |
| | 15 Feb | Parsing II | M&S Chapter 12 | Lecture 11 [PDF] | Programming Assignment 3 (due 29 Feb) |
| Week 7 | 20 Feb | Frame Semantics/Semantic Roles | J&M Chapter 19 | Lecture 12 [PDF] | |
| | 22 Feb | Compositional Semantics | J&M Chapter 18, handout [PDF] | | You should show me a draft final project proposal by this point |
| Week 8 | 27 Feb | Class cancelled (Roger ill) | | | |
| | 29 Feb | Discourse Processing | J&M Chapter 21 | Lecture 14 [PDF] | Final Project Guidelines; Programming Assignment 3 (either-or!) |
| Week 9 | 5 Mar | Computational Psycholinguistics | Hale 2001 | Lecture 15 [PDF] | |
| | 7 Mar | Unsupervised Learning I: Word segmentation, POS clustering | Goldwater et al. 2006, Clark 2000 | Lecture 16 [PDF] | |
| Week 10 | 12 Mar | Unsupervised Learning II: Syntactic acquisition | Klein & Manning 2002, 2004 | | |
| | 14 Mar | No class (Roger out of town): work on final projects! | | | |
| Finals | 20 Mar | Final Projects Due | | | |

Requirements & grading

Your grade will be based on the following criteria:

  1. Several written homework assignments, to be distributed at various times during the class;
  2. Several programming assignments, to be distributed at various times during the class;
  3. A final project.

Collaboration is encouraged for homework assignments and final projects, but you must be explicit about who you collaborated with and what the division of labor was.

Leading Conferences and Journals in the field

Computational linguistics/NLP is a very conference-oriented field; many of the classic articles in the literature never wind up getting published in journals. The top conferences include:

There are also some excellent workshops and conferences run regularly by "special interest groups", including

and others. Finally, excellent work in computational linguistics/NLP also appears in machine learning, artificial intelligence, and other conferences, notably the Conference on Neural Information Processing Systems (NIPS).

The flagship and leading journal of the field is Computational Linguistics. Other excellent journals in the field include: