Productivity, Reuse, and Competition between Generalizations

Timothy J. O'Donnell

MIT

A much-celebrated aspect of language is that it allows us to express and comprehend an unbounded number of thoughts. This is possible because language consists of several combinatorial systems which can productively build novel forms using a large inventory of stored, reusable parts: the lexicon. For any given language, however, there are many more potentially storable units of structure than are actually used in practice, each giving rise to many ways of forming novel expressions. For example, English contains highly productive and generalizable suffixes (e.g., -ness; "Lady-Gagaesqueness," "pine-scentedness") and suffixes that cannot be generalized (e.g., -ity; "scarcity," "normality"). In some cases, multiple generalizations compete to form novel words. Adults have gradient judgments about the well-formedness of output forms: "remortibility" appears to be slightly preferred to "remortibleness," while "depulsiveness" is preferred to "depulsivity." How are such subtle differences in generalizability and reusability represented? What are the basic, stored building blocks at each level of linguistic structure? When is productive computation licensed and when not? What principles are used by adults to resolve competition between generalizations? How can the child acquire these systems of knowledge?

I will discuss a mathematical framework designed to address these questions and examine how it can be applied to derive specific computational models at different levels of linguistic structure. The framework is based on the idea that lexicon learning can be understood as balancing productivity (computation) and reuse (storage). The problem is a tradeoff between the storage of frequently reused (sub)structures and the need to productively generalize to novel expressions. This tradeoff is a special case of a general and well-known principle that balances the simplicity/generality of hypotheses against the degree to which they explain input data, a principle used in a variety of learning frameworks such as Bayesian inference and Minimum Description Length. Nevertheless, it has surprisingly deep and far-ranging consequences for language learning, especially for the question of how to resolve competition between generalizations.
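
As a rough illustration of the general principle mentioned above, the tradeoff can be written in its standard Bayesian and Minimum Description Length forms. These are textbook formulations, not the specific model presented in the talk. The symbols are notation assumed for this sketch: $L$ is a hypothesized lexicon of stored units, $D$ is the observed input, and $\ell(\cdot)$ is a code length in bits.

```latex
\begin{align*}
P(L \mid D) &\propto P(D \mid L)\, P(L) \\
\hat{L}_{\mathrm{MDL}} &= \arg\min_{L} \bigl[\, \ell(L) + \ell(D \mid L) \,\bigr]
\end{align*}
```

In both forms the first quantity to be balanced, $P(L)$ or $\ell(L)$, favors simple and general lexicons, while the second, $P(D \mid L)$ or $\ell(D \mid L)$, favors lexicons that account well for the input. Taking $\ell = -\log_2 P$ makes the two criteria select the same lexicon.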

Competition Workshop
2015 Linguistic Summer Institute
Sunday, July 12, 2015