AuCoPro Semantics

Dutch & Afrikaans

Creative Commons License


Ben Verhoeven (1), Gerhard B. van Huyssteen (2) & Walter Daelemans (1)

(1) CLiPS Research Center, University of Antwerp
(2) Centre for Text Technology (CTexT), North-West University, South Africa


The AuCoPro-Semantics dataset is intended for the automatic semantic analysis of compounds. It contains semantically annotated noun-noun (NN) compounds from Dutch and Afrikaans, split into two annotation rounds per language. The semantic annotation followed guidelines based on those of Ó Séaghdha (2008).

Another part of the dataset contains other nominal compounds (XN) in Dutch, which were annotated using a newly developed annotation scheme.


This dataset was created within the 'Automatic Compound Processing (AuCoPro)' project that was funded by the Dutch Language Union (Nederlandse Taalunie), the Department of Arts and Culture (DAC) of South Africa and the National Research Foundation (NRF) of South Africa.


If you use this dataset in your research, make sure to cite one of the following papers:

Verhoeven, B., Daelemans, W., & van Huyssteen, G. B. (2012). Classification of Noun-Noun Compound Semantics in Dutch and Afrikaans. In: Proceedings of the Twenty-Third Annual Symposium of the Pattern Recognition Association of South Africa (PRASA). Pretoria, South Africa. 29-30 November. pp. 121-125. ISBN: 978-0-620-54601-0.

Verhoeven, B., & van Huyssteen, G. B. (2013). More Than Only Noun-Noun Compounds: Towards an Annotation Scheme for the Semantic Modelling of Other Noun Compound Types. In: Proceedings of the 9th Joint ISO - ACL SIGSEM Workshop on Interoperable Semantic Annotation. Potsdam, Germany.