# Probabilistic Models in the Study of Language (ESSLLI 2012 Introductory Course in Language and Computation)

## 1 Course information

| | |
|---|---|
| Lecture Dates | August 6 through August 10, daily |
| Lecture Times | 9:00am-10:30am |
| Lecture Location | Jan II Dobry |
| Class webpage | http://ling.ucsd.edu/~rlevy/teaching/esslli2012/ |

## 2 Instructor information

| | |
|---|---|
| Instructor | Roger Levy (rlevy@ucsd.edu) |
| Instructor Title | Associate Professor, Department of Linguistics, University of California at San Diego |

## 3 Course Description

Probabilistic models have thoroughly reshaped computational linguistics and continue to profoundly change other areas of the scientific study of language, ranging from psycholinguistics to syntax and phonology and even pragmatics and sociolinguistics. This change has included (a) qualitative improvements in our ability to analyze complex linguistic datasets and (b) new conceptualizations of language knowledge, acquisition, and use. For the most part these two developments have occurred in parallel, but the same theoretical toolkit underlies both. This course gives a concise introduction to that toolkit, covering the fundamentals of contemporary probabilistic models in the study of language. Examples are drawn both from data analysis and from state-of-the-art probabilistic modeling of linguistic cognition, with key conceptual connections repeatedly drawn between the two. I also give pointers to publicly available software implementations, and students will see simple usage examples that will allow them to replicate the case studies covered in class.

The course will for the most part be taught from a textbook-in-progress I am writing, *Probabilistic Models in the Study of Language* (PMSL). You can always access the latest version here.

## 4 Course organization

Each lecture of the 5-day course will involve a combination of boardwork and slides. I strongly encourage question-asking and discussion in my lectures; please raise your hand and I'll call on you.

## 5 Intended Audience

Researchers, postgraduate students, and highly motivated undergraduate students interested in probabilistic approaches to language. No prior exposure to probability theory or statistics is assumed, but we'll be using some high school calculus. For some parts of some lectures, participants will find basic familiarity with syntactic theory (e.g., context-free grammars) useful.

## 6 Syllabus (subject to modification)

| Day | Topic | Readings | Other materials |
|---|---|---|---|
| Mon 6 Aug | Introductory probability theory & simple applications | PMSL Chapters 2-3 | |
| Tue 7 Aug | Parameter estimation, confidence intervals, hypothesis testing | PMSL Chapters 4-5 | |
| Wed 8 Aug | Single- and multi-level (hierarchical) generalized linear models | PMSL Chapters 6 & 8 | Intro to Bayes Nets; Intro to Bayesian Parameter Estimation; Introduction to Hierarchical (mixed-effects, multi-level) models |
| Thu 9 Aug | Latent-variable models | PMSL Chapter 9 | Mixture of Gaussians |
| Fri 10 Aug | Probabilistic Grammars | PMSL Chapter 10 | Topic Models (LDA); Probabilistic grammars and the probabilistic Earley algorithm; Application of probabilistic grammars and surprisal to garden-path disambiguation |

### 6.1 Detailed list of topics

**Day 1**: Introduction to probability theory. Sample spaces, events, probability spaces, random variables, conditional probabilities, the binomial and normal distributions. Bayesian inference, with an example from simple phonetic category identification: classic cases of phonetic category identification are treated with phonetic category as a binomially distributed random variable and acoustic properties given category as normally distributed, and the classic phoneme categorization curves are derived. Estimating probability densities, with an example from simple models of probabilistic phonotactics.

**Day 2**: Parameter estimation, confidence intervals, and hypothesis testing. Maximum likelihood estimation and Bayesian parameter estimation/prediction; marginalization over parameter values. Examples from learning syntactic alternation frequencies and identifying contextual dependencies in linguistic sequences. The examples are worked out mathematically, giving familiarity with the concepts of statistical consistency, bias, and variance of estimators; modeling overdispersion through uncertainty in parameter estimates; and beta functions and the beta-binomial model.

**Day 3**: Single-level and multi-level generalized linear models, with examples from analyzing experimental and corpus data. Linear regression and logistic regression; hierarchical (a.k.a. mixed-effects, multi-level) models when all cluster identities are known. Brief familiarization with R's lme4 software.

**Day 4**: Latent-variable models. Generalizing the previous day's multi-level models by making cluster identities unknown. Examples from unsupervised learning of topics from text and of phonetic categories from acoustic input. Brief familiarization with R's lda package for topic models, and with JAGS for Bayesian graphical models.

**Day 5**: Probabilistic grammars, focusing on probabilistic context-free grammars: mathematical properties and conceptual intuitions. Examples from syntactic grammar induction and modeling human online syntactic processing. For acquisition, coverage of probabilistic models for grammar induction from strings; for processing, brief coverage of surprisal theory (Hale, 2001; Levy, 2008) and its application to well-known cases of incremental probabilistic disambiguation of syntactic structure in sentence processing.
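The Day 1 phonetic category identification setup can be sketched in a few lines of code. This is a minimal illustration, not course material: the category names (/b/ vs. /p/), the voice onset time (VOT) cue, and all numeric parameter values are invented for the example; the function names are likewise hypothetical.

```python
import math

# Hypothetical parameters, chosen only for illustration: two phonetic
# categories (/b/ vs. /p/) distinguished by VOT in milliseconds. Category
# is a binomially (Bernoulli) distributed random variable; VOT given
# category is normally distributed with a shared standard deviation.
P_B = 0.5                  # prior probability of /b/
MU_B, MU_P = 0.0, 50.0     # mean VOT for each category (ms)
SIGMA = 12.0               # shared standard deviation (ms)

def normal_pdf(x, mu, sd):
    """Density of the normal distribution N(mu, sd^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def p_b_given_vot(x):
    """Posterior probability of category /b/ given observed VOT, by Bayes' rule."""
    joint_b = normal_pdf(x, MU_B, SIGMA) * P_B
    joint_p = normal_pdf(x, MU_P, SIGMA) * (1 - P_B)
    return joint_b / (joint_b + joint_p)

# Sweeping VOT across the continuum traces out the classic sigmoidal
# categorization curve; with equal variances it is exactly logistic in x.
for vot in range(-20, 80, 10):
    print(f"VOT = {vot:4d} ms   P(/b/ | VOT) = {p_b_given_vot(vot):.3f}")
```

With equal priors and equal variances, the curve crosses 0.5 exactly at the midpoint between the two category means, which is where the derived phoneme categorization boundary falls.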
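The surprisal computation from Day 5 can also be sketched concretely. Surprisal is the negative log conditional probability of a word given its preceding context. The sketch below sidesteps the Earley algorithm by using a toy string distribution (as a small PCFG would induce over complete sentences); the sentences and their probabilities are invented for illustration, and the function names are hypothetical.

```python
import math

# Toy string distribution standing in for the distribution a small PCFG
# defines over complete sentences. Probabilities are invented for
# illustration and sum to 1.
SENTENCES = {
    ("the", "dog", "barked"): 0.4,
    ("the", "dog", "bit", "the", "man"): 0.3,
    ("the", "man", "slept"): 0.3,
}

def prefix_prob(prefix):
    """Total probability of all sentences beginning with `prefix`."""
    prefix = tuple(prefix)
    return sum(p for s, p in SENTENCES.items() if s[:len(prefix)] == prefix)

def surprisal(prefix, word):
    """Surprisal of `word` in bits given the preceding words:
    -log2 P(word | prefix), computed as a ratio of prefix probabilities."""
    return -math.log2(prefix_prob(list(prefix) + [word]) / prefix_prob(prefix))
```

For instance, after the prefix "the dog", the word "bit" has probability 0.3/0.7, so its surprisal is -log2(3/7), roughly 1.22 bits; words continuing a garden-path analysis get high surprisal in exactly this way, since most of the prefix's probability mass lies on the competing analysis.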