Projects

Cross-formalism MDL

This project is developed together with John Goldsmith and builds upon ideas outlined in Goldsmith (2015). We propose a three-term version of minimum description length (MDL) that incorporates the cost of encoding the (formalized) linguistic theory itself, in addition to the cost of individual grammars and of compressed descriptions of corpora. Using phonological and morphological patterns as a basic example, we demonstrate how alternative linguistic theories can be evaluated quantitatively with this metric.
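
As a rough sketch of what "three-term" means here, the familiar two-term MDL objective (grammar length plus the length of the corpus encoded with that grammar) gains one more summand for the theory. The notation below is an assumption made for illustration and is not taken from Goldsmith (2015): T is a formalized theory, G a grammar stated in T, D a corpus, and |X|_Y the length in bits of X when encoded using Y. The `\operatorname*` command assumes the amsmath package.

```latex
\[
  \mathrm{DL}(T, G, D) \;=\;
    \underbrace{|T|}_{\text{theory}}
    \;+\; \underbrace{|G|_{T}}_{\text{grammar given theory}}
    \;+\; \underbrace{|D|_{G}}_{\text{corpus given grammar}},
  \qquad
  (\hat{T}, \hat{G}) \;=\; \operatorname*{arg\,min}_{T,\,G}\; \mathrm{DL}(T, G, D)
\]
```

Under this reading, two theories are compared by the total description length each achieves with its best grammar, so a theory that makes grammars cheaper to state has to pay for that machinery in the first term.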

Minimalist grammar optimization

Syntactic literature tends towards a big-picture outlook, abstracting away from algorithmic-level details such as full specifications of lexical items or the syntactic features checked by each application of a structure-building operation. At the same time, differences between competing analyses of the same phenomenon seem to belong to a relatively low level of description. Assuming a sufficiently rich formalism compatible with the Minimalist framework, how can we choose between competing analyses on quantitative grounds? Framing this question as a learning problem, I have developed an algorithm capable of transforming a naive minimalist grammar over unsegmented words into a linguistically motivated one over morphemes. The project primarily focuses on learning morphological structure within complex words, extracting linguistically motivated generalizations and instantiating them as new lexical items.

Minimalist grammars and agreement

Stabler’s minimalist grammars provide a useful tool for modeling natural language syntax by defining grammar fragments in a very precise way. As a formalization of Chomsky’s Minimalist Program, they can accommodate linguistic analyses from the field of generative syntax. However, they have no machinery for encoding agreement: while morphology can be simulated by multiplying lexical items, there is no systematic way to state generalizations and implement actual proposals. My goal is to extend minimalist grammars with morphological features and operations on them.
A JavaScript implementation of MGs with agreement can be found on this page.
A more recent (and less cumbersome) iteration of this approach, developed together with Gregory Kobele, is Agreement over Channels. Under this perspective, agreement transfers purely morphological information from head to head along channels established via syntactic feature checking.
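
To make the limitation of plain MGs concrete, here is a minimal Python sketch of the workaround mentioned above: simulating agreement by multiplying lexical items. It is a deliberately simplified illustration, not the implementation linked on this page and not the Agreement over Channels system. The names `Item` and `merge`, the toy lexicon, and the number-split categories `n_sg` and `n_pl` are all assumptions made for the example, and only the simplest case of merge (no movement, complement linearized to the right) is covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """A toy MG expression: a string plus its unchecked feature sequence."""
    phon: str
    features: tuple


def merge(selector: Item, selectee: Item) -> Item:
    """Check a selector feature '=x' against a category feature 'x'.

    Simplification: the selectee must carry exactly one feature (its
    category), so movement never arises.
    """
    if not selector.features or len(selectee.features) != 1:
        raise ValueError("unsupported configuration in this sketch")
    sel, cat = selector.features[0], selectee.features[0]
    if sel != "=" + cat:
        raise ValueError(f"feature clash: {sel} cannot select {cat}")
    # Both features are checked and deleted; the selector projects.
    return Item(f"{selector.phon} {selectee.phon}", selector.features[1:])


# Number concord encoded by splitting category n into n_sg and n_pl.
# Every determiner and every noun needs one lexical item per number value.
THIS = Item("this", ("=n_sg", "d"))
THESE = Item("these", ("=n_pl", "d"))
DOG = Item("dog", ("n_sg",))
DOGS = Item("dogs", ("n_pl",))

print(merge(THESE, DOGS))  # well-formed: the number values match
```

The concord generalization is never stated anywhere in this grammar. It exists only as a pattern across duplicated entries, and the duplication grows with every additional agreeing feature, which is the gap that dedicated morphological features and operations are meant to close.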

  • Agreeing minimalist grammars
    (with Gregory Kobele)
    LingBuzz, 2023
    [preprint]

  • Agree as information transmission over dependencies
    (with Gregory Kobele)
    Syntax, Vol. 25 Issue 4, 2022
    [paper]

  • Morphological agreement in minimalist grammars
    22nd Conference on Formal Grammar, July 22–23, 2017. Toulouse, France
    [slides] [paper] [demo]

Formalizing Distributed Morphology

Together with Daniel Edmiston, I worked on a mathematically rigorous formalization of the Distributed Morphology (DM) framework. We were interested in adapting DM to work over strings. Distributed Morphology is typically depicted as operating on (binary) trees, meaning its strong-generative capacity is above regular. By constraining it to operate on strings, we restricted the strong-generative capacity of the morphological module to that of regular languages, providing an immediate explanation for the regularity of morphological phenomena in natural language.

Automated processing of agglutinative morphology

The majority of existing tools that deal with complex morphology rely on either hand-written rules or large text corpora. I am interested in a third option: extracting (agglutinative) morphology from a small sample of fully analyzed word forms. The main challenge is to reconstruct allomorphs and morphotactic sequences missing from the sample. Hand-glossed texts are a natural output of linguistic fieldwork, readily available even for under-studied languages. The goal of this project is to facilitate tasks such as morphological parsing for agglutinative languages, with a focus on good performance even with very limited language-specific resources.
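
The starting point of this setup can be sketched in a few lines of Python. This is a toy baseline under stated assumptions, not the method developed in the project: the three-word Turkish sample, its hyphen-aligned glossing format, and the function names `build_lexicon` and `parse` are all hypothetical. The sketch only reuses morphs that occur in the sample and does not attempt the hard part described above, reconstructing missing allomorphs and morphotactic sequences.

```python
# Hypothetical hand-glossed sample: segmented form and gloss, aligned by hyphens.
SAMPLE = [
    ("ev-ler-de", "house-PL-LOC"),
    ("kitap-lar", "book-PL"),
    ("okul-da", "school-LOC"),
]


def build_lexicon(sample):
    """Collect stems and suffix allomorphs together with their glosses."""
    stems, suffixes = {}, {}
    for form, gloss in sample:
        morphs, tags = form.split("-"), gloss.split("-")
        if len(morphs) != len(tags):
            raise ValueError(f"misaligned gloss: {form} / {gloss}")
        stems[morphs[0]] = tags[0]
        for morph, tag in zip(morphs[1:], tags[1:]):
            suffixes[morph] = tag
    return stems, suffixes


def parse(word, stems, suffixes):
    """Return every analysis of `word` as a known stem plus known suffixes."""

    def suffix_sequences(rest):
        # All ways of exhausting `rest` with suffixes seen in the sample.
        if not rest:
            return [[]]
        found = []
        for morph in suffixes:
            if rest.startswith(morph):
                tails = suffix_sequences(rest[len(morph):])
                found.extend([morph] + tail for tail in tails)
        return found

    analyses = []
    for stem, meaning in stems.items():
        if word.startswith(stem):
            for seq in suffix_sequences(word[len(stem):]):
                form = "-".join([stem] + seq)
                gloss = "-".join([meaning] + [suffixes[m] for m in seq])
                analyses.append((form, gloss))
    return analyses


stems, suffixes = build_lexicon(SAMPLE)
print(parse("kitaplarda", stems, suffixes))  # an unseen combination of seen morphs
print(parse("evden", stems, suffixes))       # contains a suffix absent from the sample
```

The word kitaplarda never occurs in the sample, yet it is analyzed as kitap-lar-da (book-PL-LOC) because each of its morphs was seen somewhere. The word evden gets no analysis, since the ablative suffix is not in the sample. The baseline also overgenerates: nothing in it enforces vowel harmony or suffix order.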

One application of this and related work is Diretra, a tool for computer-aided translation which I worked on in collaboration with Alëna Aksënova. Diretra was designed for and tested on Turkic languages; its primary goal was to provide a word-for-word translation of a given text, reflecting the morphological phenomena of the source language as precisely as possible.

Turkic converbs

In Turkic languages, converbs — a type of non-finite verb form — are a regular means of constructing complex predications. The -p converb, present in the majority of Turkic languages, exhibits a number of interesting syntactic properties. In particular, -p converbs can correspond to both adjunct and coordinate syntactic structures.