Datasets

TwiSty Corpus
TwiSty is a corpus developed for research in author profiling. It contains personality (MBTI) and gender annotations for a total of 18,168 authors spanning six languages. We distribute the Twitter IDs of these authors as well as the IDs of their...
https://www.clips.uantwerpen.be/datasets/twisty-corpus

CLiPS Stylometry Investigation (CSI) Corpus
The CSI corpus is a corpus of student texts in two genres, essays and reviews, that is expanded yearly. It is intended primarily for stylometric research, but other applications are possible. There is a vast amount of metadata available,...
https://www.clips.uantwerpen.be/datasets/csi-corpus

AuCoPro Semantics
The AuCoPro-Semantics dataset supports the automatic semantic analysis of compounds. It contains semantically annotated noun-noun (NN) compounds from Dutch and Afrikaans, split into two annotation rounds per language. The semantic annotation was...
https://www.clips.uantwerpen.be/datasets/aucopro-semantics

deLearyous
The deLearyous dataset is a Dutch (Flemish) dataset for emotion classification following the framework of Leary's Rose, also known as the Interpersonal Circumplex. The dataset contains 11 conversations that were annotated at the sentence level with...
https://www.clips.uantwerpen.be/datasets/delearyous

Personae Corpus
The Personae corpus was collected for experiments in Authorship Attribution and Personality Prediction. It consists of 145 Dutch-language essays, written by 145 different students (BA in Linguistics and Literature at the University of Antwerp,...
https://www.clips.uantwerpen.be/datasets/personae-corpus