
My main research interest is computational psycholinguistics: using computers to model what goes on in someone's head when they use language. This raises an obvious question: what can a computer, which works differently from a human brain, tell us about human language use? Surprisingly much. Computer models let us analyze actual language usage closely enough to pick up on subtle patterns that (hopefully) reveal deep insights into how humans use language. My current work addresses this on three fronts.

Artificial Language Learning

How do children manage to learn a language? It's a tall order, a mysterious process that few adults are able to handle, yet one that almost all children manage. Because of its mystery and some seemingly insurmountable obstacles to language learning (such as poverty of the stimulus), various innateness hypotheses have been advanced to explain language learning, most of which can be viewed as variants of Universal Grammar.

However, the rise of computational models has shown that these obstacles may be surmountable after all. One of the triumphs of the connectionist program was the ability of a neural network to learn to conjugate certain irregular verbs. More recently, Bayesian models have shown that individual aspects of simple grammars can be learned by cognitively plausible models; see Frank et al. (2008) and Piantadosi et al. (2008). The next step is to develop Bayesian models that can effectively and simultaneously learn multiple aspects of a grammar, and we are currently working on such a model. (Joint work with Roger Levy.)
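To make the Bayesian idea a little more concrete, here is a minimal sketch of how a learner might weigh candidate grammars with Bayes' rule. It is not the model from this work or from the cited papers. The two toy grammars, the function names `likelihood` and `posterior`, and the assumption that each observed string is sampled uniformly from a grammar's language are all hypothetical simplifications for illustration.

```python
from fractions import Fraction

# Two hypothetical toy grammars, each represented only by the finite set of
# strings it licenses. Real grammar-learning models use far richer hypotheses.
GRAMMARS = {
    "ab-only": {"ab", "aabb"},
    "ab-or-ba": {"ab", "aabb", "ba", "bbaa"},
}


def likelihood(data, language):
    """P(data | grammar), assuming each string is drawn uniformly from the language."""
    p = Fraction(1)
    for s in data:
        if s not in language:
            return Fraction(0)  # the grammar cannot generate this string
        p *= Fraction(1, len(language))
    return p


def posterior(data, grammars, prior=None):
    """Bayes' rule: P(G | data) is proportional to P(data | G) * P(G)."""
    if prior is None:
        prior = {name: Fraction(1, len(grammars)) for name in grammars}
    unnorm = {name: likelihood(data, lang) * prior[name] for name, lang in grammars.items()}
    total = sum(unnorm.values())
    if total == 0:
        raise ValueError("no candidate grammar can generate the data")
    return {name: w / total for name, w in unnorm.items()}


# Three strings that both grammars can generate: the smaller grammar wins.
print(posterior(["ab", "aabb", "ab"], GRAMMARS))
# {'ab-only': Fraction(8, 9), 'ab-or-ba': Fraction(1, 9)}
```

Under the uniform-sampling assumption, a grammar that licenses fewer strings assigns each one higher probability, so consistent data gradually favor the tighter hypothesis. A single observed "ba" would instead rule out the smaller grammar entirely, since its likelihood drops to zero.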

Bursty Topic Models

Topic models are an important class of language models. A topic model seeks out hidden topics in text data, based on word co-occurrence data. (Non-text data can be used as well, although the meaning of topics may be less clear.) The model supposes that a document is a mixture of words from different topics, and attempts to learn which topics generate which words. For example, if a topic model were trained on articles from Science, it might find topics that we humans would think of as "fluid dynamics", "evolutionary biology", or "inorganic chemistry". Topic models suffer from one glaring defect, though: they do not account for burstiness, the tendency of a word to keep reappearing within a document once it has appeared. We have revised the basic topic model framework to account for this, and shown that the new framework can be incorporated into more advanced topic models. (Joint work with Charles Elkan.)
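To illustrate what "accounting for burstiness" can mean, the sketch below contrasts a plain multinomial word distribution with a Dirichlet compound multinomial (equivalently, a Pólya urn), one standard way of modeling bursty word counts. This is an assumption made for illustration, not a description of the paper's model. It also looks at a single word distribution rather than a full topic model, and the two-word vocabulary ("fluid", "gene") and the parameter values are hypothetical.

```python
from fractions import Fraction


def multinomial_seq_prob(seq, theta):
    """Probability of a word sequence when every token is drawn independently from fixed theta."""
    p = Fraction(1)
    for w in seq:
        p *= theta[w]
    return p


def polya_seq_prob(seq, alpha):
    """Probability of a word sequence under a Dirichlet compound multinomial (Polya urn).

    Each time a word is drawn, its pseudo-count grows by one, so a word that has
    already appeared becomes more likely to appear again (burstiness).
    """
    counts = {w: Fraction(a) for w, a in alpha.items()}
    total = sum(counts.values())
    p = Fraction(1)
    for w in seq:
        p *= counts[w] / total
        counts[w] += 1
        total += 1
    return p


# Hypothetical parameters: both models give each word a per-token probability of 1/2
# for the first token in a document.
theta = {"fluid": Fraction(1, 2), "gene": Fraction(1, 2)}
alpha = {"fluid": Fraction(1, 2), "gene": Fraction(1, 2)}

bursty = ["fluid"] * 4
mixed = ["fluid", "gene", "fluid", "gene"]
print(multinomial_seq_prob(bursty, theta), polya_seq_prob(bursty, alpha))  # 1/16 35/128
print(multinomial_seq_prob(mixed, theta), polya_seq_prob(mixed, alpha))    # 1/16 3/128
```

The multinomial cannot tell the two documents apart, while the urn-style model rates the document that repeats one word as much more probable than the one that alternates. This is the repetition effect a standard topic model misses.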

Speaker Choice and Mixed Categories

As I see it, the big question in psycholinguistics is the same as the big question in the Watergate hearings: what do people know about language, and when do they know it? A range of psycholinguistic experiments have hinted at the depth of language users' knowledge of their language. For instance, listeners and readers are adept at anticipating upcoming words in a sentence, and can immediately tell whether a sentence they have never heard before fits their grammar. Even in cases where people have trouble with a sentence (such as "The coach smiled at the player tossed the frisbee"), the difficulty often seems to arise from the sentence butting heads with our expectations of how a sentence should work. In cases like this, it almost seems that language users know too much about language.

So what does it mean to know a language? It's more than just knowing the definitions of words and the rules of the grammar. You also know idiosyncratic things, such as that "The couch needs cleaning" sounds fine but "An idea needs forming" sounds bad, or that "stupider" sounds okay while "torpider" sounds awful. How do you gain this knowledge? Is it set early on, or are adults still learning such things? And how much does this knowledge shape the language you use day in and day out? It's a fascinatingly unclear picture.

My primary research goal is to clear up this picture. I'm particularly interested in syntactic alternations as a window into the decision-making process that underlies sentence construction. Returning to "The couch needs cleaning", what factors lead a person to say that instead of the equally well-formed "The couch needs to be cleaned"? A regression model trained on corpus examples shows that people appear to take various properties of the sentence's subject into account when making this choice. From this we learn, for example, that a concrete subject like couch is more likely to take the needs -ing option than an abstract subject like idea is.
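The regression idea can be sketched as a logistic regression that predicts which variant a speaker chooses from a property of the subject. This is a minimal illustration under stated assumptions, not the actual model or data from this work. The real analysis used corpus examples and several subject properties, whereas the data below are invented and encode only one hypothetical binary predictor (concrete vs. abstract subject). The function `fit_logistic` is a plain gradient-ascent fitter written for this sketch.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.5, steps=5000):
    """Fit P(y=1 | x) = sigmoid(b0 + b1*x) by batch gradient ascent on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)  # observed minus predicted
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


# INVENTED toy data, not corpus counts:
# x = 1 if the subject is concrete (like "couch"), 0 if abstract (like "idea");
# y = 1 if the speaker chose "needs -ing", 0 if "needs to be -ed".
concrete = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
needs_ing = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]

b0, b1 = fit_logistic(concrete, needs_ing)
# Approximately 0.8 and 0.2: with one binary predictor, the fitted probabilities
# match the toy data's proportions in each group.
print(round(sigmoid(b0 + b1), 2), round(sigmoid(b0), 2))
```

A positive coefficient `b1` corresponds to the kind of finding described above: concreteness raises the odds of the needs -ing variant. A real analysis would add more predictors and use a statistics library rather than a hand-written fitter.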

Presentations/Publications

Submitted. Gabriel Doyle & Charles Elkan. Accounting for Burstiness in Topic Models.

2008. Gabriel Doyle & Roger Levy. Environment prototypicality effects on syntactic alternation. Oral presentation at the 2008 meeting of the Berkeley Linguistics Society, February 8-10, 2008.

2008. Gabriel Doyle & Roger Levy. Mixed categories and gradient grammatical constraints. Poster presentation at the 2008 Annual Meeting of the Linguistic Society of America, January 3-6, 2008.

2007. J. Grant Loomis & Gabriel Doyle. Durational Differences of /s/ at Prosodic Boundaries. Oral presentation at the Western Conference on Linguistics (WECOL), November 30 - December 2, 2007.

2005. Gabriel Doyle. Calculating the Knot Floer Homology of (1,1) Knots. Undergraduate thesis. Princeton U. Mathematics Department. Advised by Jacob Rasmussen.

gabe doyle | uc san diego | dept of linguistics | ap&m 3321 | gdoyle at ling ucsd edu