
A Comparison of PCFG Models

In this paper, we compare three different approaches to building a probabilistic context-free grammar for natural language parsing from a tree bank corpus: 1) a model that simply extracts the rules contained in the corpus and counts the number of occurrences of each rule; 2) a model that also stores information about the parent node's category; and 3) a model that estimates the probabilities according to a generalized k-gram scheme with k=3. The last one allows for faster parsing and decreases the perplexity of test samples.
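
The first two approaches can be illustrated with a minimal sketch. It is not the authors' implementation: the nested-tuple tree format, the function names `extract_rules` and `estimate_pcfg`, the `^` notation for parent annotation, and the toy tree bank are all assumptions made for illustration, and the generalized k-gram model is not shown. Rule probabilities are estimated here by plain relative frequency, that is, the count of a rule divided by the total count of rules with the same left-hand side.

```python
from collections import Counter

# Hypothetical toy tree bank: each tree is (label, child, ...); leaves are words.
TOY_TREEBANK = [
    ("S", ("NP", "she"), ("VP", ("V", "sees"), ("NP", "stars"))),
    ("S", ("NP", "stars"), ("VP", ("V", "shine"))),
]


def extract_rules(tree, parent=None, annotate_parent=False):
    """Yield one (lhs, rhs) pair per internal node of a nested-tuple tree.

    With annotate_parent=True, every non-root category is written as
    "CATEGORY^PARENT", so the grammar also records the parent's category.
    """
    label, *children = tree
    lhs = f"{label}^{parent}" if annotate_parent and parent is not None else label
    rhs = []
    for child in children:
        if isinstance(child, str):
            rhs.append(child)  # a word: never annotated
        else:
            child_label = child[0]
            rhs.append(f"{child_label}^{label}" if annotate_parent else child_label)
    yield lhs, tuple(rhs)
    for child in children:
        if not isinstance(child, str):
            yield from extract_rules(child, label, annotate_parent)


def estimate_pcfg(trees, annotate_parent=False):
    """Return {(lhs, rhs): probability} using relative-frequency estimates."""
    counts = Counter()
    for tree in trees:
        counts.update(extract_rules(tree, annotate_parent=annotate_parent))
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}


plain = estimate_pcfg(TOY_TREEBANK)                            # approach 1
annotated = estimate_pcfg(TOY_TREEBANK, annotate_parent=True)  # approach 2
```

In this toy example the plain grammar pools all NP expansions, while the annotated grammar keeps separate distributions for NPs under S and NPs under VP, which is the kind of extra context the second approach stores.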


Jose Luis Verdú-Mas, Jorge Calera-Rubio and Rafael C. Carrasco, A Comparison of PCFG Models. In: Proceedings of CoNLL-2000 and LLL-2000, Lisbon, Portugal, 2000.
Last update: June 27, 2001. erikt@uia.ua.ac.be