Towards automatic morphological analysis: aligning inflected forms


Several hypotheses hold that defectivity and overabundance can arise from specific properties of the implicative relations which hold between paradigm forms. This presentation addresses the fundamental question of how we can automatically obtain good characterizations of these relations, starting from raw, unsegmented inflected forms.

Most inflectional analyses rely at their core on some alignment of inflected forms. Be it implicit or explicit, this alignment makes it possible to identify repeated and variant material across forms, which in turn is necessary to formulate alternation patterns, to segment markers and exponents, or to identify stems.
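
As a rough illustration of how an alignment separates repeated from variant material, here is a minimal Python sketch. It uses the standard library's difflib.SequenceMatcher as a stand-in for a proper alignment procedure (its matching is heuristic, not a guaranteed minimal-cost alignment). The helper name shared_and_variant and the English example forms are assumptions made for illustration, not part of the systems presented in this talk.

```python
from difflib import SequenceMatcher


def shared_and_variant(form_a, form_b):
    """Split two forms into shared substrings and variant substring pairs.

    Hypothetical helper: relies on difflib's longest-matching-block
    heuristic rather than a linguistically informed alignment.
    """
    matcher = SequenceMatcher(None, form_a, form_b, autojunk=False)
    shared, variant = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Material repeated in both forms (candidate stem material).
            shared.append(form_a[i1:i2])
        else:
            # Material that differs (candidate alternation or exponent).
            variant.append((form_a[i1:i2], form_b[j1:j2]))
    return shared, variant


if __name__ == "__main__":
    print(shared_and_variant("sing", "sang"))
    print(shared_and_variant("walk", "walked"))
```

The variant pairs are the raw ingredients of an alternation pattern (for instance, a pair opposing "i" and "a"), while the shared pieces approximate what the two forms have in common.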

While several algorithms find optimal alignments for pairs of forms, aligning multiple forms together is considerably harder: exact methods become computationally intractable as the number of forms grows. This task, well known in other fields such as biology (where multiple genome sequences are aligned together) and historical linguistics (where words are aligned together to find cognates), is therefore solved in practice by heuristic algorithms. In this talk, we present the specific problems which arise when aligning inflected forms for morphological analysis. We describe two systems which we created for this task, and compare them using a series of evaluation metrics. While this is still work in progress, we hope that providing measures of alignment quality will make it possible to track incremental improvements in this fundamental step for quantitative morphology.
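
For the pairwise case, an optimal alignment can be computed by dynamic programming. The sketch below is a generic Needleman-Wunsch-style global alignment with unit edit costs, given only to make the pairwise setting concrete. It is not either of the two systems described in this talk, and the function name align_pair, the cost values, and the gap symbol are assumptions chosen for illustration.

```python
def align_pair(a, b, gap="-", mismatch=1, indel=1):
    """Globally align two strings with minimal edit cost.

    Returns the two gapped rows and the total cost. Costs are
    illustrative assumptions (matches cost 0, everything else 1).
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cost of aligning a[:i] with b[:j].
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * indel
    for j in range(1, m + 1):
        cost[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if a[i - 1] == b[j - 1] else mismatch
            cost[i][j] = min(
                cost[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                cost[i - 1][j] + indel,    # a[i-1] faces a gap
                cost[i][j - 1] + indel,    # b[j-1] faces a gap
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = 0 if a[i - 1] == b[j - 1] else mismatch
            if cost[i][j] == cost[i - 1][j - 1] + sub:
                row_a.append(a[i - 1])
                row_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + indel:
            row_a.append(a[i - 1])
            row_b.append(gap)
            i -= 1
        else:
            row_a.append(gap)
            row_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(row_a)), "".join(reversed(row_b)), cost[n][m]


if __name__ == "__main__":
    for row in align_pair("walk", "walked")[:2]:
        print(row)
```

The table has one dimension per form, so the same exact approach applied to k forms of length n needs on the order of n^k cells, which is why heuristics take over once whole paradigms are aligned together.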