This paper addresses the issue of the automatic induction of syntactic categories from unannotated corpora. Previous techniques give good results, but fail to cope well with ambiguity or rare words. An algorithm, context distribution clustering (CDC), is presented which can be naturally extended to handle these problems.