Syntactic literature tends towards a big-picture outlook, abstracting away from algorithmic-level details such as full specifications of lexical items or syntactic features being checked by each application of a structure-building operation. At the same time, differences between competing analyses of the same phenomenon seem to belong to a relatively low level of description. Assuming a sufficiently rich formalism compatible with the Minimalist framework, which of the known syntactic proposals fall out naturally from the data, and how can we choose between competing analyses on quantitative grounds? Framing this question as a learning problem, I am developing an algorithm to induce linguistically plausible Minimalist grammars from partially annotated text/dependency structures. The project primarily focuses on learning morphological structure within complex words, extracting linguistically motivated generalizations and instantiating them as new lexical items.
- Minimalist Grammar Induction over Morphemes
3rd meeting of the Society for Computation in Linguistics, January 2-5, 2020. New Orleans, LA
[extended abstract] [poster]
Stabler’s Minimalist Grammars provide a useful tool for modeling natural language syntax by defining grammar fragments in a very precise way. As a formalization of Chomsky’s Minimalist Program, they can accommodate linguistic analyses from the field of generative syntax. However, they have no machinery for encoding agreement: while morphology can be simulated by multiplying lexical items, there is no systematic way to state generalizations and implement actual proposals. My goal is to extend Minimalist Grammars with morphological features and operations on them.
- Morphological agreement in Minimalist Grammars
22nd Conference on Formal Grammar, July 22–23, 2017. Toulouse, France
[slides] [paper] [demo]
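To make the point about multiplied lexical items concrete, here is a minimal Python sketch of a simplified Merge, assuming no Move and plain string yields. The `Item` class, the `merge` function, and the split categories `n_sg`/`n_pl` are hypothetical names chosen for illustration; this is not the extension proposed in the paper.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Item:
    """A (possibly derived) expression: its string yield plus unchecked features."""
    phon: str
    feats: Tuple[str, ...]


def merge(head: Item, arg: Item) -> Item:
    """Simplified Merge: selector '=x' on the head checks category 'x' on the argument.

    Assumptions of this sketch: Move is not modeled, and the argument
    always surfaces to the right of the head (complement position).
    """
    if not head.feats or not arg.feats:
        raise ValueError("both items need at least one unchecked feature")
    selector, category = head.feats[0], arg.feats[0]
    if not selector.startswith("=") or selector[1:] != category:
        raise ValueError(f"{selector!r} cannot select {category!r}")
    if len(arg.feats) > 1:
        raise ValueError("argument has leftover features (Move is not modeled)")
    return Item(f"{head.phon} {arg.phon}", head.feats[1:])


# Number agreement simulated by splitting the noun category in two.
THIS = Item("this", ("=n_sg", "d"))
THESE = Item("these", ("=n_pl", "d"))
DOG = Item("dog", ("n_sg",))
DOGS = Item("dogs", ("n_pl",))

print(merge(THIS, DOG).phon)
print(merge(THESE, DOGS).phon)
```

Merging `THIS` with `DOGS` raises a `ValueError`, so agreement is enforced, but only because the number distinction is hard-coded into category names. Each additional agreeing feature would multiply the lexicon again, with no place to state the generalization once.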
Together with Daniel Edmiston, I am working on a mathematically rigorous formalization of the Distributed Morphology (DM) framework. We are interested in adapting DM to work over strings. Distributed Morphology is typically depicted as operating on (binary) trees, meaning its strong generative capacity is above regular. By constraining it to operate on strings, we restrict the strong generative capacity of the morphological module to that of regular languages, providing an immediate explanation for the regularity of morphological phenomena in natural language.
- Distributed Morphology as a regular relation
(with Daniel Edmiston)
1st meeting of the Society for Computation in Linguistics, January 4–7, 2018. Salt Lake City, UT
[extended abstract] [poster]
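As a rough illustration of what a regular relation over morpheme strings can look like, the following Python sketch runs a small deterministic finite-state transducer that maps a sequence of abstract morphemes to exponents, with the plural allomorph chosen by the preceding root. The state names, the `TRANSITIONS` table, and the `transduce` function are assumptions made for this example; the toy English fragment is not taken from the formalization itself.

```python
from typing import Dict, List, Tuple

# (state, input morpheme) -> (next state, output exponent)
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("start", "√CAT"): ("after_regular", "cat"),
    ("start", "√OX"): ("after_ox", "ox"),
    # The state remembers which root was seen, so the plural
    # exponent is conditioned on its left context.
    ("after_regular", "PL"): ("done", "s"),
    ("after_ox", "PL"): ("done", "en"),
}
FINAL_STATES = {"after_regular", "after_ox", "done"}


def transduce(morphemes: List[str]) -> List[str]:
    """Map abstract morphemes to exponents, reading strictly left to right."""
    state = "start"
    output: List[str] = []
    for morpheme in morphemes:
        key = (state, morpheme)
        if key not in TRANSITIONS:
            raise ValueError(f"no transition from {state!r} on {morpheme!r}")
        state, exponent = TRANSITIONS[key]
        output.append(exponent)
    if state not in FINAL_STATES:
        raise ValueError("input ended in a non-final state")
    return output


print("-".join(transduce(["√CAT", "PL"])))
print("-".join(transduce(["√OX", "PL"])))
```

Because the machine has finitely many states and reads its input as a string rather than a tree, the mapping it defines is a regular relation by construction.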
The majority of existing tools that deal with complex morphology rely on either hand-written rules or large text corpora. I am interested in a third option: extracting (agglutinative) morphology from a small sample of fully analyzed word forms. The main challenge is to reconstruct allomorphs and morphotactic sequences missing from the sample. Hand-glossed texts are a natural output of linguistic fieldwork and are readily available even for under-studied languages. The goal of this project is to facilitate tasks such as morphological parsing for agglutinative languages, with a focus on good performance even with very limited language-specific resources.
- Extracting morphophonology from small corpora
15th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, October 31, 2018. Brussels, Belgium
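The starting point of such a pipeline can be sketched in Python as follows: given a handful of segmented and glossed word forms, collect the observed allomorphs of each affix and the observed affix orderings. The `SAMPLE` data, the `collect` function, and the `ROOT` label are hypothetical and only cover the bookkeeping step; the sketch does not attempt the harder part, reconstructing what the sample lacks.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (segmented form, morpheme-by-morpheme gloss); a tiny Turkish sample
SAMPLE: List[Tuple[str, str]] = [
    ("ev-ler-de", "house-PL-LOC"),
    ("kitap-lar-da", "book-PL-LOC"),
    ("ev-de", "house-LOC"),
]


def collect(
    sample: List[Tuple[str, str]],
) -> Tuple[Dict[str, Set[str]], Set[Tuple[str, str]]]:
    """Return observed allomorphs per affix gloss and observed label bigrams."""
    allomorphs: Dict[str, Set[str]] = defaultdict(set)
    bigrams: Set[Tuple[str, str]] = set()
    for form, gloss in sample:
        morphs, tags = form.split("-"), gloss.split("-")
        if len(morphs) != len(tags):
            raise ValueError(f"misaligned gloss for {form!r}")
        # Assumption: the first morpheme is always the root.
        labels = ["ROOT"] + tags[1:]
        for morph, label in zip(morphs[1:], labels[1:]):
            allomorphs[label].add(morph)
        bigrams.update(zip(labels, labels[1:]))
    return dict(allomorphs), bigrams


observed_allomorphs, observed_bigrams = collect(SAMPLE)
for label in sorted(observed_allomorphs):
    print(label, sorted(observed_allomorphs[label]))
print(sorted(observed_bigrams))
```

On this sample the locative is attested only as `de` and `da`; the devoiced variants `te` and `ta` never occur, which is exactly the kind of gap that has to be reconstructed rather than read off the data.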
One application of this and related work is Diretra, a tool for computer-aided translation which I am developing in collaboration with Alëna Aksënova. Diretra is designed for and tested on Turkic languages; its primary goal is to provide a word-for-word translation of a given text, reflecting the morphological phenomena of the source language as precisely as possible.
- Морфологический анализатор Diretra: больше, чем глосса [Diretra, a morphological analyzer: more than a gloss]
(with Alëna Aksënova)
201st Meeting of the Workshop on Mathematical Methods Applied to Linguistics, October 25, 2014. Moscow, Russia
In Turkic languages, converbs — a type of non-finite verb form — are a regular means of constructing complex predications. The -p converb, present in the majority of Turkic languages, exhibits a number of interesting syntactic properties. In particular, -p converbs can correspond to both adjunct and coordinate syntactic structures.
- О двойственной природе тюркских конвербов [On the dual nature of Turkic converbs]
(with Pavel Graschenkov)
Moscow State University Bulletin, Series 9: Philology (2), pp. 42–57, 2015. Moscow, Russia
- Подлежащее в разносубъектных конструкциях с деепричастием на -p в киргизском языке и мишарском диалекте татарского языка [Subjects in constructions with the -p converb in Kyrgyz and Mishar Tatar]
10th Conference on Grammar and Typology for Young Researchers, November 21–23, 2013. St. Petersburg, Russia