Linguistics 274: Computational Psycholinguistics (Winter 2010)

Instructor info

Instructor: Roger Levy (rlevy@ling.ucsd.edu)
Office: Applied Physics & Math (AP&M) 4220
Office hours: W 2-3:50pm
Class time: MW 12:00-1:50pm (in general, Wednesdays 1-1:50pm will be practicum times)
Class location: AP&M 4218
Class webpage: http://grammar.ucsd.edu/courses/lign274/

Course Description

This course is about computational approaches to problems in psycholinguistics, focusing on probabilistic approaches to language knowledge, acquisition, and use. Today, research in this area requires skill with probability and statistics, familiarity with formalisms from computational linguistics, the ability to use and develop computational tools, and comfort in handling complex datasets. This course involves hands-on skill building across several important topics in the area. We'll start out with maximum-entropy models and hierarchical regression models, then move on to latent-variable models, including Latent Dirichlet Allocation ("topics" models) and the Dirichlet Process, and finally to weighted grammar formalisms, including probabilistic finite-state automata and probabilistic context-free grammars. We'll apply these techniques to both data analysis and modeling on a variety of problems and datasets. Both maximum-likelihood and Bayesian approaches will be covered.
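
For a concrete, if toy, sense of that last contrast, here is a minimal R sketch (mine, not part of the course materials; the counts are invented) estimating a binomial success probability both by maximum likelihood and Bayesianly under a uniform prior:

```r
## Toy data (invented for illustration): 7 successes in 10 Bernoulli trials.
successes <- 7
trials    <- 10

## Maximum-likelihood estimate: the relative frequency.
p_mle <- successes / trials                    # 0.7

## Bayesian estimate under a uniform Beta(1,1) prior: the posterior is
## Beta(1 + successes, 1 + failures), whose mean is shrunk slightly toward 0.5.
p_bayes <- (1 + successes) / (2 + trials)      # ~0.667

## A 95% credible interval from the posterior's quantile function.
qbeta(c(0.025, 0.975), 1 + successes, 1 + (trials - successes))
```

With lots of data the two estimates converge; with little data the prior matters more, which is part of why the contrast is worth keeping in view throughout the course.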

We'll be using a variety of computational tools, including the open-source R programming language; the Bayesian graphical-modeling toolkit JAGS; R packages implementing Latent Dirichlet Allocation and other hierarchical models; OpenFST for weighted finite-state machines; and my own implementations of an incremental parser for probabilistic context-free grammars and of general weighted finite-state automaton/context-free grammar intersection. Comfort programming in a high-level language such as Python may also prove useful during the course.
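
As a small taste of the hierarchical-regression portion of the course, here is a minimal sketch using the lme4 package for R (one standard choice for such models, though not necessarily the exact toolkit we'll use; the dataset below is simulated by me, not a course dataset): a mixed-effects regression of reading times with crossed random intercepts for subjects and items, in the spirit of the Baayen et al. (2008) reading on the syllabus.

```r
library(lme4)

## Simulate a crossed subjects-by-items reading-time dataset (all values invented).
set.seed(274)
n_subj <- 20
n_item <- 30
d <- expand.grid(subject = factor(1:n_subj), item = factor(1:n_item))
d$freq <- rnorm(nrow(d))                   # hypothetical standardized log frequency
d$rt <- 400 - 25 * d$freq +                # higher frequency -> faster reading
  rnorm(n_subj, sd = 30)[d$subject] +      # by-subject intercept deviations
  rnorm(n_item, sd = 20)[d$item] +         # by-item intercept deviations
  rnorm(nrow(d), sd = 50)                  # trial-level noise

## Fit a mixed-effects model with crossed random intercepts for subject and item.
m <- lmer(rt ~ freq + (1 | subject) + (1 | item), data = d)
summary(m)
```

The crossed random effects are the "hierarchical" part: the model simultaneously estimates the population-level frequency effect and how much individual subjects and items deviate from it.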

Target audience

Students should be interested in, and have some background in, studying language using quantitative and modeling techniques. You should also have some background in probability theory and/or statistics, and you should know how to program. Anyone who has taken my course Linguistics 251 (Probabilistic Methods in Linguistics) meets all of these prerequisites; if you haven't taken Linguistics 251 but are interested in taking this course, just talk to me.

Reading material

The main reading material will be draft chapters of a textbook-in-progress, Probabilistic Models in the Study of Language, that I am writing. These draft chapters can be found here. A number of other reference texts may also be of use in the course.

We may also supplement these with additional readings, both from statistics texts and from pertinent linguistics articles.

Syllabus

| Week | Day | Topic | Textbook reading | Other reading | Assignments |
|---|---|---|---|---|---|
| Week 1 | 4 Jan | Brief review of probability theory & statistics | PMSL Chapters 2-5 | | |
| | 6 Jan | No class (Roger out of town for the LSA Annual Meeting) | | | |
| Week 2 | 11 Jan | Review of probability & statistics, concluded | PMSL Chapter 5 | | |
| | 13 Jan | Maximum Entropy models I | PMSL Chapter 6 | Berger et al., 1996 | |
| Week 3 | 18 Jan | No class (Martin Luther King Day) | | | |
| | 20 Jan | Maximum Entropy models II | PMSL Chapter 6 | Hayes & Wilson, 2008 | Homework 1 |
| Week 4 | 25 Jan | Hierarchical regression models I | PMSL Chapter 8 | Baayen et al., 2008 | |
| | 27 Jan | Hierarchical regression models II | PMSL Chapter 8 | | |
| Week 5 | 1 Feb | Hierarchical regression models III | PMSL Chapter 8 | | |
| | 3 Feb | Latent-variable models I: mixtures of Gaussians | PMSL Chapter 9 | Vallabha et al., 2007 | |
| Week 6 | 8 Feb | Latent-variable models II: latent Dirichlet allocation | PMSL Chapter 9 | Blei et al., 2003; Griffiths & Steyvers, 2004 | Homework 2 |
| | 10 Feb | Latent-variable models III | PMSL Chapter 9 | | Final project guidelines |
| Week 7 | 15 Feb | No class (Presidents' Day) | | | |
| | 17 Feb | Nonparametric models I: the Dirichlet Process | PMSL Chapter 10 | | |
| Week 8 | 22 Feb | Nonparametric models II: the Dirichlet Process, cont'd | PMSL Chapter 10 | Teh et al., 2006 | |
| | 24 Feb | Nonparametric models III: Hierarchical Dirichlet Processes | PMSL Chapter 10 | Goldwater et al., 2009 | |
| Week 9 | 1 Mar | Probabilistic grammar formalisms I: Probabilistic Finite-State Machines | PMSL Chapter 11 | | |
| | 3 Mar | Probabilistic grammar formalisms II: Probabilistic Finite-State Machines, cont'd | PMSL Chapter 11 | | |
| Week 10 | 8 Mar | Probabilistic grammar formalisms III: Probabilistic context-free grammars | PMSL Chapter 11 | Charniak, 1997 | |
| | 10 Mar | Probabilistic grammar formalisms IV: applications | PMSL Chapter 9; Probabilistic Earley Algorithm slides | Levy, 2008 | |
| Finals | 19 Mar | Final projects due! | | | |

Requirements

If you are taking the course for credit, there are four things expected of you:

1. Regular attendance in class.

2. Doing the assigned readings and coming ready to discuss them in class.

3. Doing several homework assignments, to be assigned throughout the quarter. Email submission of homework is encouraged, but please send your submissions to lign274-homework@ling.ucsd.edu rather than to me directly; if you send them to me directly, I may lose track of them.

You can find some guidelines on writing good homework assignments here. The source file for this PDF is here.

4. A final project which will involve computational modeling and/or data analysis in some area relevant to the course.

Mailing List

There is a mailing list for this class, lign274@ling.ucsd.edu. Please subscribe to the mailing list by filling out the form at http://pidgin.ucsd.edu/mailman/listinfo/lign274! We'll use it to communicate with each other.

Programming help

For this class I'll be maintaining a programming FAQ, which you can read here.

I also run the R-lang mailing list. I suggest that you subscribe to it; it's a low-traffic list and is a good clearinghouse for technical and conceptual issues that arise in the statistical analysis of language data.

In addition, the searchable R mailing lists are likely to be useful.