Projects

Cross-formalism MDL

This project is developed together with John Goldsmith and builds upon ideas outlined in (Goldsmith 2015). We propose a three-term version of minimum description length (MDL) that incorporates the cost of encoding the (formalized) linguistic theory itself, in addition to individual grammars and compressed descriptions of corpora. Using phonological and morphological patterns as a basic example, we demonstrate how alternative linguistic theories could be evaluated quantitatively with the help of this metric.
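
As a rough illustration of the three-term idea, the quantity being minimized can be written as a sum of code lengths. The notation below (T for the formalized theory, G for a grammar stated in that theory, D for the corpus, L for code length in bits) is an assumption made for this sketch and is not taken from the preprint:

```latex
\[
  \mathrm{DL}(T, G, D) \;=\;
  \underbrace{L(T)}_{\text{theory}}
  \;+\; \underbrace{L(G \mid T)}_{\text{grammar}}
  \;+\; \underbrace{L(D \mid G)}_{\text{corpus}}
\]
```

Standard two-part MDL keeps only the last two terms. Adding the first is what allows proposals stated in different formalisms to be compared on the same scale.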

Cross-formalism MDL
LingBuzz, 2023
[preprint] [code]

Minimalist grammar optimization

The syntactic literature tends towards a big-picture outlook, abstracting away from algorithmic-level details such as the full specification of lexical items or the syntactic features checked by each application of a structure-building operation. At the same time, differences between competing analyses of the same phenomenon seem to belong to a relatively low level of description. Assuming a sufficiently rich formalism compatible with the Minimalist framework, how can we choose between competing analyses on quantitative grounds? Framing this question as a learning problem, I have developed an algorithm capable of transforming a naive minimalist grammar over unsegmented words into a linguistically motivated one over morphemes. The project primarily focuses on learning morphological structure within complex words, extracting linguistically motivated generalizations and instantiating them as new lexical items.

Evaluating syntactic proposals using minimalist grammars and minimum description length
Journal of Language Modelling, Vol. 11 No. 1, 2023
[paper]
Towards a discovery procedure for minimalist grammars
LingBuzz, 2022
[preprint]
Learning syntax via decomposition
University of Chicago, 2021. Chicago, IL
[dissertation] [code]
Deconstructing syntactic generalizations with minimalist grammars
25th Conference on Computational Natural Language Learning, November 10-11, 2021. Punta Cana, Dominican Republic
[paper] [poster]
Minimalist grammar induction over morphemes
3rd meeting of the Society for Computation in Linguistics, January 2-5, 2020. New Orleans, LA
[extended abstract] [poster]

Minimalist grammars and agreement

Stabler’s minimalist grammars provide a useful tool for modeling natural language syntax by defining grammar fragments in a very precise way. As a formalization of Chomsky’s Minimalist Program, they can accommodate linguistic analyses from the field of generative syntax. However, they have no machinery for encoding agreement: while morphology can be simulated by multiplying lexical items, there is no systematic way to state generalizations and implement actual proposals. My goal is to extend minimalist grammars with morphological features and operations on them.
A JavaScript implementation of MGs with agreement can be found on this page.
A more recent (and less cumbersome) iteration of this approach, developed together with Gregory Kobele, is Agreement over Channels. Under this perspective, agreement transfers purely morphological information from head to head along channels established via syntactic feature checking.
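
To make the lexical-item multiplication problem concrete, here is a minimal Python sketch of feature checking by Merge in a plain minimalist grammar. It is a toy written for this page, not the JavaScript implementation or the Agreement over Channels system: the names `Item` and `merge`, the category labels `n_sg` and `n_pl`, and the example lexicon are all hypothetical. Movement is omitted and the selected item is always linearized to the right.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """A toy expression: a string plus its still-unchecked features."""
    phon: str
    features: tuple  # e.g. ("=n_pl", "d"): selects an n_pl, then is a d


def merge(head: Item, arg: Item) -> Item:
    """Check the head's selector =x against the argument's category x."""
    if not head.features or not arg.features:
        raise ValueError("both items need at least one unchecked feature")
    selector, category = head.features[0], arg.features[0]
    if not selector.startswith("=") or selector[1:] != category:
        raise ValueError(f"cannot merge: {selector} does not match {category}")
    if len(arg.features) > 1:
        raise ValueError("leftover features would require movement (not modeled)")
    # Simplification: the selected item is always placed to the right.
    return Item(f"{head.phon} {arg.phon}", head.features[1:])


# Without agreement machinery, number has to be baked into category names,
# so every determiner and noun needs one entry per number value.
THIS = Item("this", ("=n_sg", "d"))
THESE = Item("these", ("=n_pl", "d"))
DOG = Item("dog", ("n_sg",))
DOGS = Item("dogs", ("n_pl",))

dp = merge(THESE, DOGS)  # Item(phon="these dogs", features=("d",))
```

In this sketch `merge(THIS, DOGS)` raises a `ValueError`, so the mismatch is ruled out, but only because number was duplicated across category labels. That duplication is what a systematic treatment of agreement is meant to avoid.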

Agreeing minimalist grammars
(with Gregory Kobele)
LingBuzz, 2023
[preprint]
Agree as information transmission over dependencies
(with Gregory Kobele)
Syntax, Vol. 25 Issue 4, 2022
[paper]
Morphological agreement in minimalist grammars
22nd Conference on Formal Grammar, July 22–23, 2017. Toulouse, France
[slides] [paper] [demo]

Formalizing Distributed Morphology

Together with Daniel Edmiston, I worked on a mathematically rigorous formalization of the Distributed Morphology framework. We were interested in adapting DM to work over strings. Distributed Morphology is typically depicted as operating on (binary) trees, meaning its strong-generative capacity is above regular. By constraining it to operate on strings, we restricted the strong-generative capacity of the morphological module to that of regular languages, providing an immediate explanation for the regularity of morphological phenomena in natural language.

Distributed Morphology as a regular relation
(with Daniel Edmiston)
1st meeting of the Society for Computation in Linguistics, January 4–7, 2018. Salt Lake City, UT
[extended abstract] [poster]
Distributed Morphology over strings
(with Daniel Edmiston)
41st Annual Penn Linguistics Conference, March 23–26, 2017. Philadelphia, PA
[abstract] [poster]

Automated processing of agglutinative morphology

The majority of existing tools that deal with complex morphology rely on either hand-written rules or large text corpora. I am interested in a third option: extracting (agglutinative) morphology from a small sample of fully analyzed word forms. The main challenge is to reconstruct allomorphs and morphotactic sequences missing from the sample. Hand-glossed texts are a natural output of linguistic fieldwork, readily available even for under-studied languages. The goal of this project is to facilitate tasks such as morphological parsing for agglutinative languages, with a focus on good performance even with very limited language-specific resources.

Extracting morphophonology from small corpora
15th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, October 31, 2018. Brussels, Belgium
[poster] [paper]

One application of this and related work is Diretra, a tool for computer-aided translation which I worked on in collaboration with Alëna Aksënova. Diretra was designed for and tested on Turkic languages; its primary goal was to provide a word-for-word translation of a given text, reflecting the morphological phenomena of the source language as precisely as possible.

Diretra, a customizable direct translation system: first sketches
(with Alëna Aksënova)
2nd International TRANSLATA Conference, October 30–November 1, 2014. Innsbruck, Austria
[slides] [paper]

An adaptable morphological parser for agglutinative languages
Italian Conference on Computational Linguistics, December 9–10, 2014. Pisa, Italy
[poster] [paper]

Turkic converbs

In Turkic languages, converbs — a type of non-finite verb form — are a regular means of constructing complex predications. The -p converb, present in the majority of Turkic languages, exhibits a number of interesting syntactic properties. In particular, -p converbs can correspond to both adjunct and coordinate syntactic structures.

On the dual nature of Turkic converbs [О двойственной природе тюркских конвербов]
(with Pavel Graschenkov)
Moscow State University Bulletin, Series 9: Philology (2), pp. 42–57, 2015. Moscow, Russia
[paper in Russian]
Subjects in constructions with the -p converb in Kyrgyz and Mishar Tatar [Подлежащее в разносубъектных конструкциях с деепричастием на -p в киргизском языке и мишарском диалекте татарского языка]
10th Conference on Grammar and Typology for Young Researchers, November 21–23, 2013. St. Petersburg, Russia
[slides in Russian] [paper in Russian]