Modeling language evolution with codes that utilize context and phonetic features

Javad Nouri and Roman Yangarber
University of Helsinki

Abstract

We present methods for investigating processes of evolution in a language family by modeling relationships among the observed languages. The models aim to find regularities, specifically regular correspondences, in lexical data. We present an algorithm that codes the data using phonetic features of sounds and learns long-range contextual rules that condition recurrent sound correspondences between languages. The approach yields a measure of model quality: better models find more regularity in the data. We also present a procedure for imputing unseen data, which provides another method of model comparison. Our experiments demonstrate improvements in performance over prior work.
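
The idea that better models find more regularity can be illustrated with a toy code-length comparison. The sketch below is not the authors' algorithm: the function name code_length, the add-one smoothed adaptive code, and the made-up correspondence data are all assumptions chosen for illustration. It shows only that conditioning a sound correspondence on context can shorten the code when the correspondence is regular within each context.

```python
import math
from collections import defaultdict

def code_length(events, alphabet_size):
    """Adaptive code length in bits for a sequence of (context, symbol) events.

    Each symbol is coded with an add-one smoothed probability estimated
    from the events already seen in the same context (an illustrative
    choice, not the coding scheme of the paper).
    """
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    bits = 0.0
    for context, symbol in events:
        p = (counts[context][symbol] + 1) / (totals[context] + alphabet_size)
        bits -= math.log2(p)
        counts[context][symbol] += 1
        totals[context] += 1
    return bits

# Hypothetical aligned data: (source sound, following vowel class, target sound).
# In this invented example, source "k" maps to "c" before front vowels
# and to "k" before back vowels.
pairs = [("k", "front", "c"), ("k", "back", "k")] * 10

# Context-free model: the target is predicted from the source sound alone.
plain_bits = code_length([(src, tgt) for src, _, tgt in pairs], alphabet_size=2)

# Context-aware model: the target is predicted from source sound and vowel class.
context_bits = code_length([((src, ctx), tgt) for src, ctx, tgt in pairs], alphabet_size=2)

print(f"context-free:  {plain_bits:.2f} bits")
print(f"context-aware: {context_bits:.2f} bits")
```

On this toy data the context-aware code is shorter, because each context makes the target sound fully predictable. On real data the cost of describing the contextual rules themselves would also have to be counted, which this sketch leaves out.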