Optimizing text-based age and gender prediction on social media for detecting grooming by sexual predators

Publication Type: Talks
Authors: van de Loo, J., De Pauw, G., & Daelemans, W.
Place Presented: ATILA 2015, Antwerp
Year of Publication: 2015
Date Presented: 16/10/2015
Abstract

We present results of author profiling experiments that explore how well text-based age and gender prediction can support the detection of harmful content and conduct on social media (project AMiCA). More specifically, we focus on the use case of detecting sexual predators who try to "groom" children online and who may provide false age and gender information in their user profiles. We performed age and gender classification experiments on a dataset of nearly 380,000 Dutch chat posts from the Belgian social network Netlog. We evaluated and compared binary age classifiers trained to separate younger from older authors at different age boundaries. We found that macro-averaged F-scores increased as the age boundary was raised, and that performance levels suitable for the use case can be achieved when classifying minors versus adults. Such a classifier can therefore serve as a useful component in a cybersecurity monitoring tool for social network moderators.
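To make the evaluation setup concrete, here is a minimal sketch of two steps the abstract mentions: turning author ages into binary "younger" vs "older" labels at a chosen age boundary, and scoring predictions with a macro-averaged F-score. It is not the authors' pipeline; the function names (`binarize_ages`, `f1_for_class`, `macro_f1`), the convention that an author counts as "older" when their age is at or above the boundary, and the toy ages and predictions are all hypothetical, chosen only for illustration. The classifier itself (features, learner) is deliberately left out.

```python
from typing import List, Sequence


def binarize_ages(ages: Sequence[int], boundary: int) -> List[int]:
    """Label each author 1 ("older") if age >= boundary, else 0 ("younger").

    Assumption: the boundary age itself falls in the "older" class,
    so boundary=18 separates minors (<18) from adults (>=18).
    """
    return [1 if age >= boundary else 0 for age in ages]


def f1_for_class(gold: Sequence[int], pred: Sequence[int], cls: int) -> float:
    """Per-class F1 = 2PR / (P + R), treating `cls` as the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == cls and p == cls)
    fp = sum(1 for g, p in zip(gold, pred) if g != cls and p == cls)
    fn = sum(1 for g, p in zip(gold, pred) if g == cls and p != cls)
    if tp == 0:
        # No true positives means precision or recall is 0, so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Unweighted mean of the F1 scores of class 0 and class 1."""
    return (f1_for_class(gold, pred, 0) + f1_for_class(gold, pred, 1)) / 2


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the Netlog corpus.
    ages = [14, 15, 16, 17, 18, 25, 32, 45]
    gold = binarize_ages(ages, boundary=18)  # minors vs adults
    pred = [0, 0, 1, 0, 1, 1, 0, 1]          # made-up classifier output
    print(macro_f1(gold, pred))              # prints 0.75
```

Macro-averaging gives the younger and older classes equal weight regardless of how many authors fall on each side of the boundary, which keeps scores for different age boundaries comparable even though moving the boundary changes the class sizes. In practice, a library implementation such as scikit-learn's `f1_score(..., average="macro")` computes the same quantity.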