
Two-dimensional clustering for text categorization

We propose a new method that uses two-dimensional clustering to improve the accuracy of text categorization. Many previous probabilistic approaches implicitly assume that texts in the same category are generated from an identical distribution. We show empirically that this assumption is inaccurate and propose a framework based on two-dimensional clustering to alleviate the problem. In our method, training texts are clustered so that the assumption is more likely to hold, and features are clustered at the same time to tackle the data sparseness problem. We conduct experiments to validate the proposed two-dimensional clustering method.
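
As a rough illustration of the idea of clustering along both dimensions, the following minimal sketch collapses a text-by-feature count matrix into a cluster-by-cluster matrix and compares how sparse the two are. It is not the procedure of the paper: the cluster assignments are assumed to be given, and the function names (`collapse_counts`, `zero_fraction`) and the toy counts are hypothetical, chosen only for this example.

```python
def collapse_counts(counts, doc_cluster, feat_cluster):
    """Sum a text-by-feature count matrix into a cluster-by-cluster matrix.

    counts[d][f]    : count of feature f in training text d
    doc_cluster[d]  : cluster index assigned to text d (assumed given)
    feat_cluster[f] : cluster index assigned to feature f (assumed given)
    """
    n_doc_clusters = max(doc_cluster) + 1
    n_feat_clusters = max(feat_cluster) + 1
    collapsed = [[0] * n_feat_clusters for _ in range(n_doc_clusters)]
    for d, row in enumerate(counts):
        for f, c in enumerate(row):
            collapsed[doc_cluster[d]][feat_cluster[f]] += c
    return collapsed


def zero_fraction(matrix):
    """Fraction of cells that are zero, a crude measure of sparseness."""
    cells = [c for row in matrix for c in row]
    return sum(1 for c in cells if c == 0) / len(cells)


# Hypothetical toy data: 4 training texts from one category, 6 word features.
counts = [
    [2, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 3, 0, 1],
    [0, 0, 0, 0, 2, 1],
]
doc_cluster = [0, 0, 1, 1]          # assumed clustering of the texts
feat_cluster = [0, 0, 0, 1, 1, 1]   # assumed clustering of the features

collapsed = collapse_counts(counts, doc_cluster, feat_cluster)
print(collapsed)
print(zero_fraction(counts), zero_fraction(collapsed))
```

In this toy case the two text clusters use disjoint vocabulary, so treating them as separate sources is more plausible than a single distribution for the whole category, and the collapsed matrix has a smaller share of zero cells than the original.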

Hiroya Takamura and Yuji Matsumoto, Two-dimensional clustering for text categorization. In: Dan Roth and Antal van den Bosch (eds.), Proceedings of CoNLL-2002, Taipei, Taiwan, 2002, pp. 29-35.

Last update: September 07, 2002. erikt@uia.ua.ac.be