# Advanced Probabilistic Modeling in R (2015 LSA Summer Institute second-session course)

## 1 Course information

| | |
|---|---|
| Lecture Dates | July 20, 23, 27, and 30 (Mondays and Thursdays) |
| Lecture Times | 10:30am-12:20pm |
| Lecture Location | Harper 140 |
| Office Hours | Tuesday 21 July 3:15-4:45pm; Friday 24 July 10:30am-noon; Tuesday 28 July 3:15-4:45pm (subject to change) |
| Office Hours Location | Plein Air Cafe |
| Class Webpage | http://ling.ucsd.edu/~rlevy/teaching/lsa2015probmodels/ |

## 2 Instructor information

| | |
|---|---|
| Instructor | Roger Levy (rlevy@ucsd.edu) |
| Instructor Title | Associate Professor, Department of Linguistics, University of California at San Diego |

## 3 Course Description

Probabilistic models have thoroughly reshaped computational linguistics and continue to profoundly change other areas of the scientific study of language, ranging from psycholinguistics to syntax, phonology, and even pragmatics and sociolinguistics. These changes have included (a) qualitative improvements in our ability to analyze complex linguistic datasets and (b) new conceptualizations of language knowledge, acquisition, and use. For the most part, these two developments have occurred in parallel, but the same theoretical toolkit underlies both. This course gives a concise introduction to that toolkit, covering the fundamentals of contemporary probabilistic models in the study of language. Examples come from both data analysis and state-of-the-art probabilistic modeling of linguistic cognition, with key conceptual connections between the two drawn repeatedly. I also give pointers to publicly available software implementations, and students will see simple usage examples that will allow them to replicate the case studies covered in class.

For the most part, the course will be taught from a textbook-in-progress I am writing, *Probabilistic Models in the Study of Language* (PMSL). You can always access the latest version here.

## 4 Course organization

Each lecture of the 4-day course will involve a combination of slides and boardwork. I strongly encourage question-asking and discussion in my lectures; please raise your hand and I'll call on you.

I use a mailing list to communicate with class participants; please sign up for it.

## 5 Intended Audience

Researchers, postgraduate students, and highly motivated undergraduate students interested in probabilistic approaches to language. No prior exposure to probability theory or statistics is assumed, but we'll be using some high school calculus. Basic familiarity with syntactic theory (e.g., context-free grammars) will be useful for parts of some lectures.

## 6 Syllabus (subject to modification)

I'd appreciate it if you would fill out the beginning-of-class survey, so that I can learn more about the backgrounds of class participants.

| Day | Topic | Slides | Readings | Homework |
|---|---|---|---|---|
| Mon 20 July | Essentials: Bayes nets, parameter estimation, hypothesis testing, confidence intervals. | Lecture 1 | PMSL Chapter 4; PMSL Chapter 5 | |
| Thu 23 July | Brief review of linear regression. Repeated-measures ANOVA. Introduction to mixed-effects models. | Lecture 2 (with builds) | PMSL Chapter 8 | Homework 1 (solutions) |
| Mon 27 July | Mixed-effects models practicum I. How to keep it maximal. | Lecture 3 (with builds) | Barr et al., 2013 | Homework 2 (solutions) |
| Thu 30 July | Mixed-effects models practicum II. R formula arcana. Beta-binomial regression. | | Levy, 2014; Morgan & Levy, 2015 | |