Projects
Minimalist grammar optimization
Syntactic literature tends towards a big-picture outlook, abstracting away from algorithmic-level details such as full specifications of lexical items or syntactic features being checked by each application of a structure-building operation. At the same time, differences between competing analyses of the same phenomenon seem to belong to a relatively low level of description. Assuming a sufficiently rich formalism compatible with the Minimalist framework, how can we choose between competing analyses on quantitative grounds? Framing this question as a learning problem, I have developed an algorithm capable of transforming a naive minimalist grammar over unsegmented words into a linguistically motivated one over morphemes. The project primarily focuses on learning morphological structure within complex words, extracting linguistically motivated generalizations and instantiating them as new lexical items.
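As a rough illustration of what "quantitative grounds" could mean, the sketch below compares two grammars. One is a naive grammar over unsegmented words; the other is a decomposed grammar over morphemes. Both are scored with a simple size metric. The `LexItem` class, the `grammar_size` metric and the toy English lexicon are assumptions made for this example. This is not the algorithm from the dissertation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexItem:
    """A minimalist-grammar lexical item: an exponent paired with a feature sequence."""
    phon: str                   # phonological exponent
    features: tuple[str, ...]   # e.g. ("=d", "t"): select a d, then yield a t


def grammar_size(lexicon: list[LexItem]) -> int:
    """Hypothetical size metric: one unit per segment plus one per feature."""
    return sum(len(item.phon) + len(item.features) for item in lexicon)


# Naive grammar: each inflected word is an unanalyzed lexical item.
naive = [
    LexItem("walks", ("=d", "t")),
    LexItem("walked", ("=d", "t")),
    LexItem("jumps", ("=d", "t")),
    LexItem("jumped", ("=d", "t")),
]

# Decomposed grammar: stems and tense suffixes are separate items.
# How the suffix attaches to the stem (e.g. head movement) is abstracted away.
decomposed = [
    LexItem("walk", ("=d", "v")),
    LexItem("jump", ("=d", "v")),
    LexItem("s", ("=v", "t")),
    LexItem("ed", ("=v", "t")),
]

print(grammar_size(naive), grammar_size(decomposed))  # 30 19
```

Under this toy metric, the savings from factoring out shared suffixes grow with the number of stems. A realistic comparison would also need to score the derivations each grammar assigns.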
- Learning syntax via decomposition
University of Chicago, 2021. Chicago, IL
[dissertation] [code]
- Deconstructing syntactic generalizations with minimalist grammars
25th Conference on Computational Natural Language Learning, November 10-11, 2021. Punta Cana, Dominican Republic
[paper] [poster]
- Minimalist grammar induction over morphemes
3rd meeting of the Society for Computation in Linguistics, January 2-5, 2020. New Orleans, LA
[extended abstract] [poster]
Minimalist grammars and agreement
Stabler’s minimalist grammars provide a useful tool for modeling natural language syntax by defining grammar fragments precisely. As a formalization of Chomsky’s Minimalist Program, they can accommodate linguistic analyses from generative syntax. However, they have no machinery for encoding agreement. Morphology can be simulated by multiplying lexical items, but there is no systematic way to state agreement generalizations or to implement specific proposals from the literature. My goal is to extend minimalist grammars with morphological features and operations on them.
A JavaScript implementation of MGs with agreement can be found on this page.
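Here is a minimal sketch of the general idea. It assumes that lexical items carry a bundle of morphological features alongside their syntactic features, and that merge unifies the two bundles. If a shared feature has conflicting values, the derivation crashes. The `Expr` class, the `merge` and `agree` functions and the feature names are hypothetical. Word order is simplified so that the argument is always placed to the left. This is not the formalism of the Formal Grammar paper or of the JavaScript implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Expr:
    phon: str
    features: list[str]  # unchecked syntactic features, e.g. ["=d", "v"]
    phi: dict[str, str] = field(default_factory=dict)  # morphological features


def agree(a: dict[str, str], b: dict[str, str]) -> Optional[dict[str, str]]:
    """Unify two feature bundles; None if a shared feature has conflicting values.
    A feature missing from one bundle counts as unvalued and is filled in from the other."""
    for key in a.keys() & b.keys():
        if a[key] != b[key]:
            return None
    return {**a, **b}


def merge(head: Expr, arg: Expr) -> Optional[Expr]:
    """Merge a head with a complete argument: head's first feature =X must match
    arg's only remaining feature X, and their morphological features must unify."""
    if not head.features or len(arg.features) != 1:
        return None
    if head.features[0] != "=" + arg.features[0]:
        return None
    phi = agree(head.phi, arg.phi)
    if phi is None:
        return None  # agreement clash: the derivation crashes
    # Simplified linearization: the argument always goes to the left of the head.
    return Expr(f"{arg.phon} {head.phon}", head.features[1:], phi)


sleeps = Expr("sleeps", ["=d", "v"], {"pers": "3", "num": "sg"})
she = Expr("she", ["d"], {"pers": "3", "num": "sg"})
they = Expr("they", ["d"], {"pers": "3", "num": "pl"})

print(merge(sleeps, she).phon)  # she sleeps
print(merge(sleeps, they))      # None
```

With a single *sleeps* item, this approach blocks *they sleeps* by stating the generalization once in `agree`, instead of listing a separate lexical item for every agreeing form.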
- Morphological agreement in minimalist grammars
22nd Conference on Formal Grammar, July 22–23, 2017. Toulouse, France
[slides] [paper] [demo]
Formalizing Distributed Morphology
Together with Daniel Edmiston, I am working on a mathematically rigorous formalization of the Distributed Morphology (DM) framework. We are interested in adapting DM to work over strings. DM is typically depicted as operating on (binary) trees, so its strong generative capacity exceeds that of regular languages. By constraining it to operate on strings, we restrict the strong generative capacity of the morphological module to that of regular languages. This provides an immediate explanation for the regularity of morphological phenomena in natural language.
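To illustrate why a string-based view is natural, here is a minimal sketch of Vocabulary Insertion over a string of feature bundles rather than a tree. Each bundle is realized by the matching vocabulary item that realizes the most features, which is a simplified Subset Principle. Contextual allomorphy is conditioned only on the immediately preceding morpheme. Because each decision looks at a bounded window, a mapping of this shape can in principle be computed by a finite-state transducer over a finite feature inventory. The names `VocabItem`, `insert` and `spell_out`, the tie-breaking rule and the toy English vocabulary are assumptions for illustration, not the formalization developed in this project.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VocabItem:
    features: frozenset[str]     # features the exponent realizes
    exponent: str
    left: Optional[str] = None   # feature required on the preceding morpheme, if any


def insert(word: list[frozenset[str]], vocab: list[VocabItem]) -> list[str]:
    """Realize feature bundles left to right. Among items whose features are a subset
    of the bundle (and whose left context is met), pick the one realizing the most
    features; a context-restricted item wins ties against an elsewhere item."""
    exponents = []
    prev: frozenset[str] = frozenset()
    for bundle in word:
        candidates = [
            v for v in vocab
            if v.features <= bundle and (v.left is None or v.left in prev)
        ]
        if not candidates:
            raise ValueError(f"no vocabulary item for {sorted(bundle)}")
        best = max(candidates, key=lambda v: (len(v.features), v.left is not None))
        exponents.append(best.exponent)
        prev = bundle
    return exponents


VOCAB = [
    VocabItem(frozenset({"√OX"}), "ox"),
    VocabItem(frozenset({"√CAT"}), "cat"),
    VocabItem(frozenset({"pl"}), "en", left="√OX"),  # -en / √OX __
    VocabItem(frozenset({"pl"}), "s"),               # elsewhere plural
    VocabItem(frozenset({"sg"}), ""),
]


def spell_out(word: list[set[str]]) -> str:
    return "".join(insert([frozenset(m) for m in word], VOCAB))


print(spell_out([{"√OX"}, {"n", "pl"}]))   # oxen
print(spell_out([{"√CAT"}, {"n", "pl"}]))  # cats
```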
- Distributed Morphology as a regular relation
(with Daniel Edmiston)
1st meeting of the Society for Computation in Linguistics, January 4–7, 2018. Salt Lake City, UT
[extended abstract] [poster]
- Distributed Morphology over strings
(with Daniel Edmiston)
41st Annual Penn Linguistics Conference, March 23–26, 2017. Philadelphia, PA
[abstract] [poster]
Automated processing of agglutinative morphology
Most existing tools for complex morphology rely on either hand-written rules or large text corpora. I am interested in a third option: extracting (agglutinative) morphology from a small sample of fully analyzed word forms. The main challenge is to reconstruct allomorphs and morphotactic sequences that are missing from the sample. Hand-glossed texts are a natural output of linguistic fieldwork and are readily available even for under-studied languages. The goal of this project is to facilitate tasks such as morphological parsing for agglutinative languages, with a focus on good performance even with very limited language-specific resources.
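A first step of such an extraction can be sketched as follows. The sketch assumes hyphen-segmented word forms with Leipzig-style glosses, where lowercase glosses mark stems and uppercase glosses mark grammatical morphemes. It collects the attested allomorphs for each gloss and the attested order of morpheme slots, with `#` marking a word boundary. The `extract` function and the toy Turkish sample are illustrative assumptions, not the actual tool. The harder step of reconstructing unattested allomorphs and sequences is not shown.

```python
from collections import defaultdict


def extract(glossed: list[tuple[str, str]]):
    """Collect allomorphs per gloss and morpheme-order bigrams from segmented forms."""
    allomorphs: dict[str, set[str]] = defaultdict(set)
    bigrams: set[tuple[str, str]] = set()
    for form, gloss in glossed:
        morphs, tags = form.split("-"), gloss.split("-")
        if len(morphs) != len(tags):
            raise ValueError(f"misaligned gloss: {form} / {gloss}")
        # Lowercase glosses are treated as lexical stems, uppercase as grammatical morphemes.
        slots = ["STEM" if t.islower() else t for t in tags]
        for morph, tag in zip(morphs, tags):
            if not tag.islower():
                allomorphs[tag].add(morph)
        bigrams.update(zip(["#"] + slots, slots + ["#"]))
    return dict(allomorphs), bigrams


sample = [
    ("ev-ler-de", "house-PL-LOC"),
    ("kitap-lar", "book-PL"),
    ("okul-da", "school-LOC"),
]
allomorphs, order = extract(sample)
print(sorted(allomorphs["PL"]))   # ['lar', 'ler']
print(sorted(allomorphs["LOC"]))  # ['da', 'de']
print(("PL", "LOC") in order)     # True
```

The sample never shows the locative of *kitap*. Predicting it (Turkish *kitap-ta*) requires going beyond the attested allomorphs, which is the reconstruction problem described above.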
- Extracting morphophonology from small corpora
15th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, October 31, 2018. Brussels, Belgium
[poster] [paper]
One application of this and related work is Diretra, a tool for computer-aided translation which I am developing in collaboration with Alëna Aksënova. Diretra is designed for and tested on Turkic languages; its primary goal is to provide a word-for-word translation of a given text, reflecting the morphological phenomena of the source language as precisely as possible.
- Diretra, a customizable direct translation system: first sketches
(with Alëna Aksënova)
2nd International TRANSLATA Conference, October 30–November 1, 2014. Innsbruck, Austria
[slides] [paper]
- Diretra, a morphological analyzer: more than a gloss [Морфологический анализатор Diretra: больше, чем глосса]
(with Alëna Aksënova)
201st Meeting of the Workshop on Mathematical Methods Applied to Linguistics, October 25, 2014. Moscow, Russia
[slides in Russian]
- An adaptable morphological parser for agglutinative languages
Italian Conference on Computational Linguistics, December 9–10, 2014. Pisa, Italy
[poster] [paper]
Turkic converbs
In Turkic languages, converbs (a type of non-finite verb form) are a regular means of constructing complex predications. The -p converb, present in the majority of Turkic languages, exhibits a number of interesting syntactic properties. In particular, -p converbs can correspond to both adjunct and coordinate syntactic structures.
- On the dual nature of Turkic converbs [О двойственной природе тюркских конвербов]
(with Pavel Graschenkov)
Moscow State University Bulletin, Series 9: Philology (2), pp. 42–57, 2015. Moscow, Russia
[paper in Russian]
- Subjects in constructions with the -p converb in Kyrgyz and Mishar Tatar [Подлежащее в разносубъектных конструкциях с деепричастием на -p в киргизском языке и мишарском диалекте татарского языка]
10th Conference on Grammar and Typology for Young Researchers, November 21–23, 2013. St. Petersburg, Russia
[slides in Russian] [paper in Russian]