Linky linky

UCSD Linguistics
My new home
AndyLab
My Posse
Center for Research in Language
Interdisciplinary language research
SDSU Linguistics
My former home department
Me@SDSU
My old webpage at State
Discursus
My mostly serious blog
Interests

All kinds of phenomena interest me, but I'm more interested in experimental science than theoretical, and more interested in pure science than applied. Theory is important, but I don't have much patience for drifting away from testability. Applications are important, but it's the big questions that get me excited.

I'm interested in computational modeling of psycholinguistic processes, applying quantitative corpus methods to cognitive linguistics questions, and developing linguistic and computational models of discourse structure. I think Maximum Entropy (MaxEnt) grammars are a great tool for talking about variation and gradience, with fewer weaknesses than the alternatives. I would also like to bounce around ideas about parallel changes in language and the rest of culture, Chinese abbreviations or clipped compounds (缩写), grammaticalization, and emergence as the resolution of half the arguments philosophers ever had.
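For readers unfamiliar with MaxEnt grammars: in the standard formulation, each output candidate's harmony is the weighted sum of its constraint violations, and its probability is proportional to exp(-harmony), normalized over the candidate set. Gradient variation then falls out directly, since two candidates with similar harmony both receive substantial probability. The Python sketch below is a minimal illustration of that formula; the constraint names, candidates, and weights in the usage lines are hypothetical and not taken from real data.

```python
import math


def maxent_probs(candidates, weights):
    """Probability of each candidate under a MaxEnt grammar.

    candidates: dict mapping candidate -> dict of constraint -> violation count
    weights:    dict mapping constraint -> non-negative weight
    """
    # Harmony is a weighted penalty: more (or heavier) violations, lower probability.
    harmony = {
        cand: sum(weights[c] * v for c, v in viols.items())
        for cand, viols in candidates.items()
    }
    # Exponentiate the negative harmony and normalize over all candidates.
    scores = {cand: math.exp(-h) for cand, h in harmony.items()}
    z = sum(scores.values())
    return {cand: s / z for cand, s in scores.items()}


# Hypothetical example: two outputs, each violating one constraint once.
cands = {"sandhi": {"Faith": 1}, "no_sandhi": {"Markedness": 1}}
probs = maxent_probs(cands, {"Faith": 1.0, "Markedness": 2.0})
# probs["sandhi"] is about 0.731, probs["no_sandhi"] about 0.269
```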

Current Projects

My main current project is documenting generational and regional variation in the tone sandhi of the dialects around Jinhua in Zhejiang, China. There is an amazing amount of regional variation in this area, and a fair number of generational differences too. A good deal of dialectology data exists, but not the detailed kind needed to model synchronic variation.

The other big semi-current project is my master's thesis for the comp-ling program at SDSU. I have basically finished the research and written about half of the thesis, but it has seen only fitful progress in the last few years.

My thesis research developed a method for evaluating hierarchical discourse segmentation (i.e., shallow discourse parsing or unlabeled outlining). Such segmentations are difficult to evaluate because section breaks differ in importance and their exact locations are intrinsically imprecise. The research involved recruiting several dozen students to annotate passages via a web form interface, developing a method for deriving a gold standard from conflicting annotations, adapting two segmentation programs to produce hierarchical segmentations, and proposing a statistical measure suited to the peculiarities of hierarchical discourse segmentation.
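To make "a gold standard from conflicting annotations" concrete, here is a toy majority-vote baseline in Python. It is not the method developed in the thesis. It assumes a hypothetical annotation format in which each annotator maps sentence-gap indices to a break depth (1 for the most important breaks, larger numbers for finer subdivisions), and it ignores the imprecision of break locations by counting only exact matches.

```python
from statistics import median_low


def majority_gold(annotations, n_gaps, min_share=0.5):
    """Toy gold standard from conflicting annotations (hypothetical format).

    annotations: one dict per annotator, mapping gap index -> break depth
                 (1 = most important break, larger = finer subdivision)
    n_gaps:      number of gaps between adjacent sentences
    min_share:   a gap is kept only if MORE than this share of annotators marked it
    """
    n = len(annotations)
    gold = {}
    for gap in range(n_gaps):
        depths = [a[gap] for a in annotations if gap in a]
        if len(depths) > min_share * n:
            # median_low always returns one of the observed depths
            gold[gap] = median_low(depths)
    return gold


# Three hypothetical annotators disagreeing about a short passage.
annotations = [{2: 1, 5: 2}, {2: 1, 6: 2}, {2: 2, 5: 2}]
gold = majority_gold(annotations, n_gaps=8)
# gold == {2: 1, 5: 2}; gap 6 was marked by only one annotator
```

Taking the low median keeps each gold depth equal to a depth some annotator actually chose, rather than an in-between value.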

Lucien Carroll. forthcoming. Evaluation of Hierarchical Discourse Segmentation of Expository Speech. Unpublished thesis, carried out under the supervision of Rob Malouf and Eniko Csomay. Presented at the 29th Linguistics Students Association Colloquium at SDSU, April 8, 2006. slides

Abstract: There is a large body of literature describing work in linear discourse segmentation, especially of news data, and some work describing algorithms for hierarchical discourse segmentation. However, little work has been done on segmenting more conversational genres, and even less on evaluating hierarchical segmentation. I describe a method for compiling a gold standard for tree segmentation of expository monolog, and I propose an error metric. I then evaluate two hierarchical segmentation algorithms with that metric. The segmentation algorithms both perform quite poorly on this language variety, but one of the two is shown to be significantly better than baseline segmentations.
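The thesis's own error metric is not reproduced here. As a point of comparison, the Python sketch below implements WindowDiff (Pevzner and Hearst 2002), a standard metric for linear segmentation, along with a hypothetical `flatten` helper that applies it to a hierarchical segmentation one depth level at a time. Segmentations are assumed to be 0/1 lists with one entry per gap between adjacent units.

```python
def window_diff(ref, hyp, k):
    """WindowDiff for linear segmentations.

    ref, hyp: 0/1 lists, one entry per gap between adjacent units
    k:        window width in units (conventionally about half the mean
              reference segment length)
    Returns the fraction of windows whose boundary counts disagree (0 = perfect).
    """
    if len(ref) != len(hyp):
        raise ValueError("segmentations must cover the same units")
    n_windows = len(ref) + 1 - k
    if n_windows <= 0:
        raise ValueError("window is wider than the text")
    errors = 0
    for i in range(n_windows):
        # The window from unit i to unit i + k spans gaps i .. i + k - 1.
        if sum(ref[i:i + k]) != sum(hyp[i:i + k]):
            errors += 1
    return errors / n_windows


def flatten(boundaries, n_gaps, max_depth):
    """Hypothetical helper: turn {gap: depth} into a 0/1 list that keeps
    only breaks at depth <= max_depth."""
    return [1 if boundaries.get(g, max_depth + 1) <= max_depth else 0
            for g in range(n_gaps)]


ref = flatten({2: 1, 5: 2}, 6, 1)   # [0, 0, 1, 0, 0, 0]
hyp = flatten({3: 1}, 6, 1)         # [0, 0, 0, 1, 0, 0]
score = window_diff(ref, hyp, 2)    # 2 of 5 windows disagree: 0.4
```

Flattening at depth 1 scores only the major breaks, and larger depths fold in progressively finer ones. Because each level is scored separately and a near miss still counts as a window error, this baseline handles neither the differing importance of breaks nor their imprecise locations in any principled way.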

In the coming year I hope to start cool stuff based on stochastic optimality theory or information-theoretic models of sentence processing, and continue a collaboration dealing with Chinese discourse structure.

Past Projects

At an internship with Beth Sundheim at SPAWAR, I worked on several projects related to multilingual text processing and named entity recognition. These included a study derived from the TDT 2002 link detection task, which compared the relative value of general lexical features, temporal expressions, and named entities for identifying event-based topics, and which compared the published temporal expression vector spaces with some I developed.
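In the TDT link detection task, a system decides whether two stories discuss the same topic. A common baseline represents each story as a bag of features and thresholds the cosine similarity of the two vectors; running it separately on each feature type is one simple way to compare their value. The Python sketch below illustrates that baseline, not the study's actual system. It assumes the features (words, normalized time expressions, named entities) have already been extracted, the story contents are made up, and the 0.2 threshold is an arbitrary placeholder rather than a tuned value.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two feature-count vectors (Counters)."""
    # Counter returns 0 for missing keys, so absent features add nothing.
    dot = sum(count * b[feat] for feat, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def linked(story_a, story_b, feature_type, threshold=0.2):
    """Decide whether two stories share a topic, using one feature type.

    story_a, story_b: dicts mapping a feature type ('lexical', 'time',
    'entity') to the features extracted from the story (hypothetical format).
    """
    sim = cosine(Counter(story_a.get(feature_type, [])),
                 Counter(story_b.get(feature_type, [])))
    return sim >= threshold


# Made-up stories for illustration.
a = {"lexical": ["flood", "river", "rain"], "time": ["2002-06-03"], "entity": ["Jakarta"]}
b = {"lexical": ["flood", "rescue"], "time": ["2002-06-03"], "entity": ["Manila"]}
```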

Lucien Carroll. 2005. Topic Detection Using Time Expressions. Work carried out under the supervision of Beth Sundheim, as part of an internship with the SDSU Research Foundation.

In collaboration with Erin Stevenson and Rebecca Colavin, I worked on developing a system to distinguish degrees of bias in politically oriented websites, approaching it from two directions: as a language classification problem, like distinguishing subjective language from objective language; and as a network partitioning problem, using position within the hyperlink network to identify affiliation. We harvested the test corpus from the internet and hand-annotated the target classes. For the linguistic approach we used standard machine learning methods with linguistically informed features, and for the network approach we used mathematical methods from social network analysis.
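The paragraph above does not name the social network analysis methods used, so the Python sketch below swaps in a simple, well-known one, label propagation: unlabeled sites repeatedly adopt the majority label of their hyperlink neighbors, while hand-annotated seed sites keep their labels. Treating links as undirected and capping the number of iterations are assumptions of this sketch, and the site names and labels are hypothetical.

```python
from collections import Counter


def propagate_labels(links, seeds, iterations=10):
    """Label propagation over a hyperlink graph.

    links: dict mapping site -> set of sites it links to (hypothetical crawl output)
    seeds: dict mapping site -> hand-annotated class label (never changed)
    Returns a dict of labels for every site the propagation reached.
    """
    # Treat hyperlinks as undirected so inbound links also count as evidence.
    neighbors = {}
    for src, dsts in links.items():
        for dst in dsts:
            neighbors.setdefault(src, set()).add(dst)
            neighbors.setdefault(dst, set()).add(src)

    labels = dict(seeds)
    for _ in range(iterations):
        updated = dict(labels)
        for site in neighbors:
            if site in seeds:
                continue
            # Vote using labels from the previous round (synchronous update).
            votes = Counter(labels[n] for n in neighbors[site] if n in labels)
            if votes:
                updated[site] = votes.most_common(1)[0][0]
        if updated == labels:  # converged
            break
        labels = updated
    return labels


# Two hypothetical link clusters, each with one annotated seed site.
links = {"a": {"b"}, "b": {"c"}, "x": {"y"}, "y": {"z"}}
result = propagate_labels(links, {"a": "left", "x": "right"})
```

Sites that link mostly within one cluster end up with that cluster's label; sites reached by no labeled neighbor are simply left out of the result.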

Lucien Carroll, Erin Stevenson, and Rebecca Colavin. 2005. Website Bias Estimation with Combined Language Modeling and Network Analysis. Work carried out under the supervision of Rob Malouf. Presented by Erin at the 28th Linguistics Students Association Colloquium at SDSU, April 16, 2005. slides other slides

Hannah Rohde, Rebecca Colavin, Lara Taylor, and I worked on the problem of semantic role labeling, as proposed in the CoNLL 2005 Shared Task. We implemented a system that derived a wide variety of rule-based syntactic and semantic features for the sentences in the CoNLL 2005 corpus and used them to train a conditional random fields model of the target sequence of semantic role labels.
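A linear-chain conditional random field scores a label sequence as the sum of per-token feature scores and label-to-label transition scores, and decoding picks the highest-scoring sequence with the Viterbi algorithm. The Python sketch below shows only that decoding step and takes the scores as given rather than computing them from learned feature weights, so it is not the system described above. A BIO encoding of the role labels is assumed, the label names and score values are hypothetical, and a large negative transition score discourages I-A0 directly after O.

```python
def viterbi(emissions, transitions, labels):
    """Return the highest-scoring label sequence.

    emissions:   list with one dict per token, mapping label -> score
    transitions: dict mapping (previous_label, label) -> score (missing pairs score 0)
    labels:      the label inventory
    """
    # best[i][y] = score of the best path that ends in label y at token i
    best = [{y: emissions[0][y] for y in labels}]
    back = [{}]  # back[i][y] = previous label on that best path
    for i in range(1, len(emissions)):
        best.append({})
        back.append({})
        for y in labels:
            prev, score = max(
                ((p, best[i - 1][p] + transitions.get((p, y), 0.0)) for p in labels),
                key=lambda pair: pair[1],
            )
            best[i][y] = score + emissions[i][y]
            back[i][y] = prev
    # Start from the best final label and follow the back-pointers.
    path = [max(labels, key=lambda y: best[-1][y])]
    for i in range(len(emissions) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


labels = ["O", "B-A0", "I-A0"]
transitions = {("O", "I-A0"): -10.0}
emissions = [
    {"O": 0.0, "B-A0": 2.0, "I-A0": 0.5},
    {"O": 0.5, "B-A0": 0.0, "I-A0": 1.0},
    {"O": 1.0, "B-A0": 0.0, "I-A0": 0.8},
]
decoded = viterbi(emissions, transitions, labels)  # ['B-A0', 'I-A0', 'O']
```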

Hannah Rohde, Lucien Carroll, Rebecca Colavin, and Lara Taylor. 2005. Head-Noun Proto-Properties for Semantic Role Labeling. Part of an attempted entry in the CoNLL 2005 Shared Task, under the supervision of Rob Malouf. Presented by Hannah at the 28th Linguistics Students Association Colloquium at SDSU, April 16, 2005. slides