Evaluating automation strategies for documenting endangered languages

Monday, March 5, 2012 - 14:00 - 15:30
CLiPS, Lange Winkelstraat 40-42, 2nd Floor, Antwerpen, Belgium
Alexis Palmer

Languages are dying at the rate of two each month. It is estimated that by the end of this century, half of the approximately 6,000 extant spoken languages will cease to be transmitted effectively from one generation of speakers to the next. Those working to document and preserve endangered languages face an immense amount of work under strong time pressure, with small budgets and limited human resources. In this talk, I describe joint work with Jason Baldridge, Katrin Erk, and Taesun Moon investigating how effectively various methods from machine learning and computational linguistics can cut the cost of linguistic annotation for language documentation.

Using data from the Mayan language Uspanteko, we assess the potential of active learning and semi-automated annotation through a series of timed annotation experiments that consider annotation expertise, example selection methods, and suggestions from a machine classifier.


Alexis Palmer is a postdoctoral researcher at the MMCI Cluster of Excellence, Department of Computational Linguistics and Phonetics of Saarland University.


The colloquium takes place in Lange Winkelstraat 40-42, 2nd Floor, Antwerp (building L on the campus map).


If you have questions about the colloquium, please address them to roser.morante@ua.ac.be.
