Letter level learning for language independent diacritics restoration

This paper presents a method for diacritics restoration based on learning mechanisms that act at the letter level. The method requires no tagging tools or resources other than raw text, which makes it language independent and particularly appealing for languages with few available resources. The algorithm was evaluated on four languages, namely Czech, Hungarian, Polish and Romanian, and an average accuracy of over 98% was observed.
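
The letter-level idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the learner, so a plain frequency table with context backoff is swapped in here, and the names `strip_char`, `train`, `restore` and the `width` parameter are hypothetical. Training pairs come from raw text alone, by stripping diacritics and keeping the original letter as the label.

```python
import unicodedata
from collections import Counter, defaultdict


def strip_char(ch):
    """Return the base letter of a single character (e.g. 'ă' -> 'a').

    Letters with no canonical decomposition (e.g. Polish 'ł') are returned
    unchanged; a real system would need an explicit mapping for those.
    """
    return unicodedata.normalize("NFD", ch)[0]


def strip_diacritics(text):
    """Strip diacritics letter by letter, preserving string length."""
    text = unicodedata.normalize("NFC", text)
    return "".join(strip_char(c) for c in text)


def _key(padded, i, width, w):
    """Feature key: the plain letter plus w letters of context on each side."""
    j = i + width  # position of letter i inside the padded string
    return (padded[j], padded[j - w:j], padded[j + 1:j + 1 + w])


def train(raw_text, width=3):
    """Count which original letter follows each (letter, context) pattern.

    Assumption: a frequency table stands in for the learning algorithm.
    """
    text = unicodedata.normalize("NFC", raw_text.lower())
    plain = "".join(strip_char(c) for c in text)
    padded = " " * width + plain + " " * width
    model = defaultdict(Counter)
    for i, target in enumerate(text):
        # Store every context width so restore() can back off.
        for w in range(width, -1, -1):
            model[_key(padded, i, width, w)][target] += 1
    return model


def restore(model, plain_text, width=3):
    """Predict each letter from the widest context seen during training."""
    plain = plain_text.lower()
    padded = " " * width + plain + " " * width
    out = []
    for i, letter in enumerate(plain):
        choice = letter  # fall back to the input letter if nothing matches
        for w in range(width, -1, -1):
            counts = model.get(_key(padded, i, width, w))
            if counts:
                choice = counts.most_common(1)[0][0]
                break
        out.append(choice)
    return "".join(out)


if __name__ == "__main__":
    model = train("mâine merg la școală")
    print(restore(model, "maine merg la scoala"))
```

Because each letter is classified from its neighbouring letters only, nothing in the sketch depends on a dictionary, tokenizer or tagger, which is the property the abstract highlights for low-resource languages.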


Rada F. Mihalcea and Vivi A. Nastase, Letter level learning for language independent diacritics restoration. In: Dan Roth and Antal van den Bosch (eds.), Proceedings of CoNLL-2002, Taipei, Taiwan, 2002, pp. 105-111.
Last update: September 07, 2002. erikt@uia.ua.ac.be