Abstract

Linguistic communication involves two perspectives: that of the speaker (or, more generally, the producer) and that of the listener (or, more generally, the receiver). Despite this, many linguistic theories are articulated around production: they are formulated as generative devices, dedicated either to deriving surface utterances from more abstract representations or to quantifying and characterising the basis for productive inferences. A telling sign of this general focus may be that the word ‘speaker’ has come to refer to humans as linguistic agents in general. This talk takes the opposite perspective and shows how attending to the information available for comprehension, rather than production, changes our view of linguistic theory. I focus on inflectional morphology and ask: Given an inflected wordform, what in its shape allows speakers to infer its morpho-syntactic properties? I describe a new theory of morphological exponence, grounded in set theory and fully implemented, which describes the distribution of discriminative sub-strings in inflectional paradigms. Along the way, we will take two detours through orthogonal but interlocking questions, both illustrated with inflectional morphology but with a wider scope across linguistic (sub)fields. First, how can we scale up typological investigation using computational methods? I describe a data-driven approach which aims to automatically construct comparable linguistic analyses, with implemented pipelines leading all the way from minimally analysed data to measurements and comparisons. Second, I dive into matters of data management and standardisation: how, concretely, can we create long-lasting, reusable, high-quality open datasets?