Current Research

My dissertation work investigates the complex discourse structures that emerge among participants in online forum comment threads on controversial or polarizing topics. My research question asks whether it is possible to automatically detect the natural subgroups of commenters within a particular thread based on the users' stance toward the topic under discussion, as manifested in the ways people express their ideological alignment or disagreement with other discourse participants. I am exploring a number of machine learning techniques for classification and clustering, drawing on current research in sentiment analysis, topic modeling, latent network analysis, and sarcasm detection. I am also examining how local coherence is established between a comment and the responses it generates, developing a typology of inter-comment discourse coherence relations, and investigating how these relations can be leveraged in the subgroup detection task.

Previous Research

In natural language, the semantic relationship between neighboring sentences in a text can be marked explicitly by a discourse connective, such as a coordinating or subordinating conjunction (e.g. "but", "because"), an adverb (e.g. "consequently"), or a prepositional phrase (e.g. "as a result"). As often as not, however, the local coherence between sentences is not explicitly signalled by a connective; instead, the producer of the text leaves the coherence of the local discourse to be inferred by the reader. I explored whether it is possible to predict, given a particular coherence relation holding between two adjacent sentences, whether it is better to signal that relation with a discourse connective or to leave it implicit. This information could be used to make the output of natural language generation and summarization systems more natural. I studied this question using a combination of computational modeling and experimental work, developing a logistic regression classifier trained on data from the Penn Discourse Treebank corpus and collecting human judgments of discourse fragments with and without connectives.

Linguistic Fieldwork

I worked with Dr. Troi Carleton at San Francisco State University on a collaborative community-based research project involving the documentation and preservation of the Zapotec dialect of Teotitlán del Valle, in Oaxaca, Mexico, and the creation of an archive of the oral tradition, including personal histories, traditional practices, and local legends and myths. During my time in the field, I collected, transcribed, and translated texts, and carried out phonological and morphosyntactic analysis, as well as basic lexicography. I was also responsible for standardizing the transcription and morphemic glossing of the sizeable corpus of texts collected over the five-year history of the project.