Prof. Andrew Kehler
Department of Linguistics
University of California, San Diego
Wednesdays and Fridays, 3:30-5pm, McGill 3133
Office Hours: Tuesdays 2:30-4:00, or by appointment
Office: McGill 5236, x4-6239
Overview and Requirements
This is a graduate seminar course on computational models of discourse interpretation. The focus will be on methods for interpreting referential expressions (esp. pronouns), models for establishing the structure and coherence of discourse, and interactions between these.
I will present introductory lectures during the first three classes. The other readings will be presented by the class participants on a rotating schedule. Students are expected to contribute to the discussion of the papers during each class.
Presenters should meet with me beforehand to discuss the presentation of their papers. In other words, don't freak out if you don't fully understand what you have to present - I will give advice on what to focus on, help explain any confusing parts, and fill in necessary background knowledge.
Complete references for the papers appear on the attached list. The "Additional Readings" are required for the presenters and should be discussed in the presentation; they are optional for the rest of the class. There is no textbook for the class, and all readings will be made available for copying. (I will bring copies of required readings to class. Additional readings are available for copying in the language lab.)
In addition to class presentations, students will be expected to complete a problem set or exercise early in the term and a final term project by the end of the course. The project can either be a long research paper (10-15 single-spaced pages) or an operational computer program accompanied by a shorter system description (3-5 pages). Students should consult with me in determining their term project. A list of ideas for projects will be distributed later in the term.
The final grade will be based on: (i) in-class presentations, (ii) class participation, (iii) the problem set, and (iv) the final term paper/project. Class attendance is necessary.
Programming experience is not necessary. A familiarity with basic linguistic concepts, and an ability to analyze language from an interpretation/computational perspective, are both necessary.
I will start out by briefly working through a textbook chapter on discourse interpretation, to introduce basic concepts and ideas with respect to reference resolution, discourse coherence, and discourse structure.
Required reading: [Kehler2000b]
Having done this, we'll delve into certain original works in greater depth. We begin by taking a closer look at algorithms for resolving reference.
Discourse interpretation algorithms require that a model of the operative discourse state be maintained, containing representations of the entities that have been referred to thus far and the relationships in which they participate. Webber's [Webber1983] paper introduces the notion of a discourse model, and describes aspects of the representation of referents in such a model. This work in part inspired [Sag and Hankamer1984] to revise their previous distinction between 'deep' and 'surface' anaphora.
Required reading: [Webber1983]
Additional reading: [Sag and Hankamer1984]
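As a rough illustration of what such a discourse model might hold, here is a minimal sketch in Python. It is not the representation proposed in [Webber1983]; the class names, fields, and the recency ordering are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Entity:
    """A discourse referent (hypothetical representation)."""
    ident: str
    description: str
    mentions: List[int] = field(default_factory=list)  # utterance indices


@dataclass
class DiscourseModel:
    """Entities referred to so far, plus the relationships among them."""
    entities: Dict[str, Entity] = field(default_factory=dict)
    relations: List[Tuple[str, Tuple[str, ...]]] = field(default_factory=list)

    def mention(self, ident: str, description: str, utterance: int) -> Entity:
        # Create the entity on first mention, then record where it was mentioned.
        entity = self.entities.setdefault(ident, Entity(ident, description))
        entity.mentions.append(utterance)
        return entity

    def relate(self, predicate: str, *idents: str) -> None:
        # Only entities already in the model may participate in a relationship.
        missing = [i for i in idents if i not in self.entities]
        if missing:
            raise KeyError(f"unknown entities: {missing}")
        self.relations.append((predicate, idents))

    def candidates(self) -> List[Entity]:
        # Assumption for this sketch: order candidates by most recent mention.
        return sorted(self.entities.values(),
                      key=lambda e: e.mentions[-1], reverse=True)
```

For "John bought a car. It was red.", one might call mention("e1", "John", 0), mention("e2", "a car", 0), relate("buy", "e1", "e2"), and then mention("e2", "it", 1) once the pronoun has been resolved.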
Webber's paper was concerned with the alternatives that an utterance makes available for subsequent reference, and the characteristics a representation must have to support the computation of these alternatives. The remainder of the papers in this part of the course are concerned with identifying the correct referent for a given referring expression. An early algorithm based on a syntactic search process, due to [Hobbs1978], was surprisingly accurate when manually applied to certain types of naturally-occurring data. The results reported by [Matthews and Chodorow1988] provided psycholinguistic support for such search processes.
Required reading: [Hobbs1978]
Additional reading: [Matthews and Chodorow1988]
The focusing of one's attention on salient entities is a component of several aspects of human cognition. Focus-based approaches to pronoun resolution posit that certain entities are in focus in the discourse model at any given point, and that pronouns are used to refer to these. Sidner's [Sidner1983] algorithm is an early method which relies on the notion of immediate foci, or those entities that are most central to particular utterances. In contrast, Grosz's [Grosz1977] paper addresses global foci, or those entities that are relevant to an entire discourse.
Required reading: [Sidner1983]
Additional reading: [Grosz1977]
The centering approach to pronoun resolution grew out of a synthesis of Sidner's work and early attempts at using a center-based logical representation to constrain discourse-level inferencing [Joshi and Kuhn1979, Joshi and Weinstein1981]. [Grosz, Joshi, and Weinstein1995] was published only after a manuscript version had circulated widely from 1986 onward. An algorithm for pronoun resolution derived from the manuscript was presented in [Brennan, Friedman, and Pollard1987].
Since the presentation of the original theory, centering has been applied to, and adapted for, several languages other than English. These works include [Kameyama1986] (covered later) and [Walker, Iida, and Cote1994] for Japanese, [Di Eugenio1996] for Italian, and [Strube and Hahn1996] for German.
Required reading: [Walker, Iida, and Cote1994]
I published a criticism of centering algorithms, in which I focused primarily on two flaws. Strube's [Strube1998] paper presents a modification to centering that addresses the first of these.
Required reading: [Kehler1997]
Additional reading: [Strube1998]
None of the aforementioned algorithms address well-known parallelism effects in pronoun interpretation. [Kameyama1986] presents an early centering-based account of Japanese pronoun resolution that incorporates preferences for grammatical role parallelism. The [Smyth1994] paper is part of an ongoing debate about subject position versus parallelism preferences in the psycholinguistics literature.
Required reading: [Kameyama1986]
Additional reading: [Smyth1994]
At this point, we have discussed several algorithms that incorporate one or more pronoun resolution preferences, most of which have not been applied to real data using an actual implemented system. The [Lappin and Leass1994] paper describes an actual implementation, in which a variety of different preferences are computed and assigned weights that together yield an overall measure of salience. The [Kennedy and Boguraev1996] paper describes a derivative approach that relies less heavily on robust syntactic analyses as input. [Tetreault1999] briefly compares the results of implementations of four algorithms in the literature.
Required reading: [Lappin and Leass1994]
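The idea of combining weighted preferences into a single salience score can be sketched as follows. This is only an illustration of the general scheme, not the implementation described in [Lappin and Leass1994]: the factor names and the weights are placeholder assumptions chosen for this sketch.

```python
from typing import Dict, Iterable

# Placeholder weights, assumed for illustration only.
WEIGHTS: Dict[str, int] = {
    "recent_sentence": 30,
    "subject": 20,
    "direct_object": 15,
    "indirect_object": 10,
}


def salience(factors: Iterable[str]) -> int:
    """Sum the weights of the distinct factors that apply to a candidate."""
    return sum(WEIGHTS[f] for f in set(factors))


def resolve(candidates: Dict[str, Iterable[str]]) -> str:
    """Return the candidate antecedent with the highest salience score."""
    if not candidates:
        raise ValueError("no candidate antecedents")
    return max(candidates, key=lambda name: salience(candidates[name]))
```

Given a subject and a direct object from the same recent sentence, the subject receives the higher score under these weights and is chosen as the antecedent.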
We now move on to the other major topic for the class: the determination of discourse coherence and structure. We haven't seen the last of pronoun interpretation, however.
Another approach to resolving reference does not treat it as a separate process at all, viewing it instead as a by-product of coherence resolution. [Hobbs1979] introduces us to the notion of identifying coherence relations that hold between utterances in a discourse, and shows how referents of pronouns can be identified during coherence establishment. Hovy's [Hovy1990] paper discusses the question of which and how many such coherence relations might exist.
Required reading: [Hobbs1979]
Additional reading: [Hovy1990]
The process of identifying coherence, among other language interpretation processes, requires a method for performing inference. [Hobbs et al.1993] tell us how this can be accomplished using the unsound inference process of abduction.
Required reading: [Hobbs et al.1993]
Additional readings: none
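To make the idea of abduction concrete, here is a toy sketch of cost-based abductive inference: given observations, find the cheapest set of assumptions from which they would follow. This is a generic simplification, not the scheme of [Hobbs et al.1993]; the rules, literals, and costs are all hypothetical, and the rule set is assumed to be acyclic.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical knowledge base: consequent -> alternative lists of antecedents.
RULES: Dict[str, List[List[str]]] = {
    "wet_grass": [["rain"], ["sprinkler_on"]],
    "wet_street": [["rain"]],
}

# Hypothetical cost of simply assuming a literal without further explanation.
ASSUMPTION_COST: Dict[str, float] = {"rain": 2.0, "sprinkler_on": 1.0}
DEFAULT_COST = 5.0


def cost(assumptions: FrozenSet[str]) -> float:
    """Total cost of a set of assumptions; each literal is paid for once."""
    return sum(ASSUMPTION_COST.get(a, DEFAULT_COST) for a in assumptions)


def explain(goals: Tuple[str, ...],
            assumed: FrozenSet[str] = frozenset()) -> FrozenSet[str]:
    """Return a cheapest set of assumptions from which all goals follow."""
    if not goals:
        return assumed
    first, rest = goals[0], goals[1:]
    if first in assumed:
        # Already assumed, so it costs nothing more to explain it again.
        return explain(rest, assumed)
    # Option 1: assume the goal outright.
    options = [explain(rest, assumed | {first})]
    # Option 2: back-chain through each rule that concludes the goal.
    for antecedents in RULES.get(first, []):
        options.append(explain(tuple(antecedents) + rest, assumed))
    return min(options, key=cost)
```

With these numbers, explain(("wet_grass",)) selects the sprinkler, while explain(("wet_grass", "wet_street")) selects rain, since a single assumption that accounts for both observations is cheaper than two separate ones. The conclusion is not guaranteed to be true, which is the sense in which abduction is unsound.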
We've now seen several types of approach to pronoun resolution, each incorporating different types of preferences: some based on tracking foci or centers, some based on a notion of parallelism, and some based on semantically-based inference processes. At this point, I'll chime in with a second look at coherence and coreference which integrates all of these. I'll contrast this approach with a short psycholinguistics paper [Stevenson, Nelson, and Stenning1993] that represents approaches that integrate these preferences without factoring out the effect of coherence establishment processes.
Required reading: [Kehler2000a, Chapter 6]
Additional reading: [Stevenson, Nelson, and Stenning1993]
Another well-known approach to coherence relations is Rhetorical Structure Theory (RST) [Mann and Thompson1987]. Although RST is descriptive in nature, it has been used in the natural language generation community to help generation systems produce coherent discourses. [Hovy1993] provides an overview of some of this work.
Required reading: [Mann and Thompson1987] (excerpts)
Additional reading: [Hovy1993]
Another approach to discourse coherence and structure is the Linguistic Discourse Model [Polanyi1988, Scha and Polanyi1988]. This approach emphasizes discourse syntax, building a hierarchical structure on a clause-by-clause basis, much as sentence structures are built on a constituent-by-constituent basis.
Required reading: [Polanyi1988]
Additional reading: [Scha and Polanyi1988]
Earlier in the course, we went into depth on ways to resolve pronouns, but left the more complex process of resolving demonstratives alone. [Webber1991] posits constraints on 'discourse deictic' reference based on the current structure of the discourse. [Cristea and Webber1997] discuss how this approach can be extended to encode discourse expectations about what is to come in the subsequent discourse.
Required reading: [Webber1991]
Additional reading: [Cristea and Webber1997]
Grosz and Sidner's [Grosz and Sidner1986] influential work posits a tripartite structure to discourse, integrating linguistic structure, focus of attention, and the intentions behind the speaker's uttering the discourse. This approach, which has been applied primarily to dialogues (versus texts), contrasts strongly with the previous approaches we have considered.
Required reading: [Grosz and Sidner1986]
Additional readings: none
[Moore and Pollack1992] call Hobbs's approach to coherence the informational view, and Grosz and Sidner's approach the intentional view. They argue that both perspectives are in fact necessary, and that because RST intermixes both types of relation, it cannot adequately handle certain types of passages. [Hobbs1997] then chimes in with his take on the distinction, seeing the problem of determining informational coherence as a subcomponent of determining intentional coherence.
Required reading: [Moore and Pollack1992]
Additional reading: [Hobbs1997]
It's now time to wrap up with a brief description of what else has occurred in the world of discourse processing relating to these topics, and where we appear to be headed.