Modeling language evolution with codes that utilize context and phonetic features

Javad Nouri and Roman Yangarber
University of Helsinki


We present methods for investigating processes of evolution in a language family by modeling relationships among the observed languages.

The models aim to find regularities, that is, regular correspondences in lexical data. We present an algorithm that codes the data using phonetic features of sounds and learns long-range contextual rules that condition recurrent sound correspondences between languages. The cost of coding the data gives us a measure of model quality: better models find more regularity in the data and therefore code it more compactly. We also present a procedure for imputing unseen data, which provides another method of model comparison. Our experiments demonstrate improved performance over prior work.
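The idea that more regular data costs fewer bits to code can be illustrated with a minimal sketch. It makes two assumptions. First, words are already aligned into (source, target) symbol pairs. Second, each pair is coded with a simple adaptive add-alpha code over a fixed set of possible pairs. This is not the authors' coder, which uses phonetic features and contextual rules. The function name `prequential_code_length` and the toy alignments are hypothetical.

```python
import math
from collections import Counter


def prequential_code_length(pairs, alphabet_size, alpha=1.0):
    """Hypothetical helper: bits needed to send `pairs` one at a time.

    Each event (an aligned source:target symbol pair) is coded with
    probability (count + alpha) / (seen + alpha * K), where K is the number
    of possible pair events. The counts are updated after each event, so
    correspondences that recur become cheaper to code.
    """
    counts = Counter()
    bits = 0.0
    for seen, event in enumerate(pairs):
        p = (counts[event] + alpha) / (seen + alpha * alphabet_size)
        bits -= math.log2(p)
        counts[event] += 1
    return bits


# Toy alignments (assumed): source symbols {k, t}, target symbols {h, t, k},
# so there are 2 * 3 = 6 possible pair events.
regular = [("k", "h")] * 5 + [("t", "t")] * 5
irregular = [("k", "h"), ("k", "t"), ("t", "h"), ("t", "t"), ("k", "k")] * 2

print(prequential_code_length(regular, alphabet_size=6))
print(prequential_code_length(irregular, alphabet_size=6))
```

Both toy data sets contain ten pairs. The regular one uses only two recurring correspondences, so its code length comes out lower. In this simplified setting, a model that captures more regularity transmits the same data in fewer bits.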