Introduction: 

This project focuses on writing style and how it relates to the author and to the author's characteristics (such as personality, gender, and education level). We develop a technique that not only automatically extracts style markers (e.g. the use of specific grammatical structures or function words) from text, but also applies these markers to identify the author of a previously unseen text. A system for computational stylometry can be applied to tasks such as authorship attribution, personality prediction, gender prediction, or plagiarism detection on any type of text (be it literary, newspaper or blog).

Project information
Abstract: 

In this project, we investigate a methodology for the automatic extraction and analysis of style that we want to apply to both individual authors (authorship attribution, both fiction and non-fiction) and groups of authors (extraction of stylistic characteristics associated with gender and age). This methodology covers several aspects: (1) Automatic linguistic analysis of documents by means of available text analysis tools on the level of morphological structure, part of speech, global syntactic structures and semantic roles (subject, object, temporal, location) for the construction of potentially relevant stylistic characteristics. (2) Unsupervised and supervised learning techniques for selecting characteristics with high information value and constructing a model of authorial style. (3) Evaluation of these models by (a) comparison with stylistic analyses in linguistics and literary science and (b) empirical testing of the predictive power of the models.
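
As a rough illustration of the pipeline outlined above (feature extraction, a style model, prediction on unseen text), the sketch below uses relative function-word frequencies as style markers and attributes an unseen text to the author with the most similar profile by cosine similarity. This is not the project's actual method: the word list `FUNCTION_WORDS`, the helper names `style_vector`, `cosine` and `attribute`, and the nearest-profile comparison are all assumptions chosen for brevity, standing in for the linguistic analysis tools and machine learning techniques the abstract mentions.

```python
from collections import Counter
import math
import re

# Hypothetical, deliberately tiny function-word list (an assumption for this
# sketch); a realistic system would use a much larger inventory of markers.
FUNCTION_WORDS = ["the", "of", "and", "a", "in", "to", "is", "it", "that", "but"]


def style_vector(text):
    """Return the relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[word] / total for word in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def attribute(unseen_text, training):
    """Return the author whose training text is stylistically closest.

    `training` maps an author name to a sample of that author's text.
    """
    target = style_vector(unseen_text)
    return max(
        training,
        key=lambda author: cosine(target, style_vector(training[author])),
    )


if __name__ == "__main__":
    samples = {
        "A": "the cat and the dog and the bird",
        "B": "it is that it is to a man",
    }
    print(attribute("the fish and the frog", samples))
```

With only two toy samples the comparison is trivial; the point is the shape of the process, in which every text is reduced to a fixed-length vector of style markers before any model is built or evaluated.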

Abstract (translated from Dutch):

In this project, we propose a methodology for the automatic extraction and analysis of style characteristics that we want to apply to individual authors (authorship attribution, for both non-fiction and fiction) and groups of authors (extraction of style characteristics associated with gender and age group). The methodology consists of the following components: (1) An automatic linguistic analysis of documents using the available text analysis tools at the level of morphological structure, part of speech, global syntactic structure and semantic roles (subject, object, temporal, location) for the construction of potentially relevant stylistic characteristics. (2) Use of unsupervised and supervised learning techniques for selecting the most informative stylistic characteristics and constructing a model of the style of an author (or group of authors). (3) Evaluation of the constructed models by (a) comparison with stylistic analyses in linguistics and literary studies and (b) empirical testing of the predictive power of the models.

Project Leader(s): 
Walter Daelemans
Guy De Pauw
External Collaborator(s): 

Edward Vanhoutte

Publications + Talks

Luyckx, K. (2011). Authorship Attribution of E-mail as a Multi-Class Task. In V. Petras, P. Forner, & P. Clough (Eds.), CLEF 2011 Labs and Workshop, Notebook Papers.
Luyckx, K., & Daelemans, W. (2009). TACTiCS, a Tool for Analyzing and Categorizing Texts using Characteristics of Style. Presented at the 19th Meeting of Computational Linguistics in the Netherlands (CLIN), Groningen, The Netherlands.
Luyckx, K., & Daelemans, W. (2008). Authorship Attribution and Verification with Many Authors and Limited Data. Presented at the 22nd International Conference on Computational Linguistics (COLING 2008), Manchester, UK.
Luyckx, K., & Daelemans, W. (2008). Authorship Attribution and Verification with Many Authors and Limited Data. Proceedings of the 22nd International Conference on Computational Linguistics (COLING 2008). 513-520.
Luyckx, K., & Daelemans, W. (2008). Authorship Attribution and Verification with Many Authors and Limited Data. Presented at the 12th ATILA Research Meeting, Antwerp, Belgium.
Luyckx, K., & Daelemans, W. (2008). Authorship Attribution and Verification with Many Authors and Limited Data. Proceedings of the 20th Belgian-Dutch Artificial Intelligence Conference (BNAIC). 335-336.
Luyckx, K., & Daelemans, W. (2008). Authorship Attribution and Verification with Many Authors and Limited Data. Presented at the 20th Belgian-Netherlands Conference on Artificial Intelligence (BNAIC 2008), Enschede, The Netherlands.