Features for unsupervised document classification

Unsupervised document classification is an important problem in practical text mining because training data is seldom available. In this paper we study term selection and the performance of various features for unsupervised text classification. The features studied are principal components, independent components, and non-negative components. The clustering algorithm is based on bipartite graph partitioning (Zha et al., 2001). The evaluation uses the newsgroups corpus.
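
As a rough illustration of one of the feature types named above, the sketch below extracts the leading principal component of a tiny document-by-term count matrix using power iteration and then splits the documents by the sign of their projection. It is not the paper's method: the toy counts, the term names, the function name `principal_component`, and the sign-based split (a simple stand-in for the bipartite graph partitioning clustering) are all assumptions made for this example.

```python
import math

def principal_component(rows, iters=200):
    """Leading principal component of `rows` (documents x terms) via power iteration."""
    n, d = len(rows), len(rows[0])
    # Center each term column so the component captures variance, not mean frequency.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    X = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Fixed, non-symmetric start vector keeps the run deterministic.
    v = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iters):
        # w = X^T (X v): one multiplication by the unnormalized covariance matrix.
        proj = [sum(x[j] * v[j] for j in range(d)) for x in X]
        w = [sum(X[i][j] * proj[i] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    # Project each centered document onto the component.
    scores = [sum(x[j] * v[j] for j in range(d)) for x in X]
    return v, scores

# Hypothetical toy counts: rows are documents, columns are the terms
# ["engine", "car", "orbit", "rocket"].
docs = [
    [3, 2, 0, 0],
    [2, 3, 0, 1],
    [0, 0, 3, 2],
    [0, 1, 2, 3],
]

component, scores = principal_component(docs)

# Stand-in for clustering: split documents by the sign of their score.
labels = [1 if s > 0 else 0 for s in scores]
print(labels)
```

On this toy matrix the first two documents should fall on one side of the split and the last two on the other. Independent or non-negative components would require a different decomposition, which this sketch does not attempt.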

S.H. Srinivasan, Features for unsupervised document classification. In: Dan Roth and Antal van den Bosch (eds.), Proceedings of CoNLL-2002, Taipei, Taiwan, 2002, pp. 36-42.

Last update: September 07, 2002. erikt@uia.ua.ac.be