Models, Kernels, and Algorithms for Discrete Data
When working with discrete data such as text and natural language,
statistical machine learning offers two fundamental sets of
techniques. Model-based approaches have many advantages from a
probabilistic point of view, while purely discriminative methods such as
kernel machines, which are generally "model free," are appealing for
their error rates and algorithmic properties. It is therefore
attractive to explore methods that combine the two.
In this talk we outline a confluence between these two main streams of
research. A new family of kernel methods for statistical learning is
presented that exploits the geometric structure of statistical models.
In particular, based on the heat equation on the Riemannian manifold
defined by the Fisher information metric on a statistical family, we
propose a family of kernels that provide a natural way of combining
generative statistical modeling with non-parametric discriminative
learning. As a special case, the kernels give a new approach to
designing learning algorithms for discrete data.
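In outline (a sketch based on the diffusion-kernel work cited in the
references below; the talk's exact formulation may differ): for a
statistical family p(x; \theta), the Fisher information metric

    g_{ij}(\theta) = E_\theta[ \partial_i \log p(x; \theta) \, \partial_j \log p(x; \theta) ]

turns the parameter space into a Riemannian manifold, and the heat
equation on this manifold has a fundamental solution K_t(\theta, \theta'),
the heat kernel, which is positive definite and can therefore be used
directly in a discriminative learner such as a support vector machine.
For the multinomial family, the natural model for text as bags of words,
the map \theta \mapsto 2\sqrt{\theta} is an isometry onto a portion of a
sphere, and a first-order parametrix expansion gives the closed-form
approximation

    K_t(\theta, \theta') \approx (4\pi t)^{-n/2} \exp\!\left( -\tfrac{1}{t} \arccos^2\!\Big( \sum_i \sqrt{\theta_i \theta'_i} \Big) \right),

where n is the dimension of the family and the diffusion time t > 0
plays the role of a bandwidth parameter.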
In addition to presenting new results, the talk will give an overview
of some of the main developments in recent years in these two research
areas as they relate to computational language learning.
References
- M. Aizerman, E. Braverman and L. Rozonoér,
Theoretical foundations of the potential function method in
pattern recognition learning.
Automation and Remote Control, volume 25,
pages 821-837, 1964.
- David M. Blei, Andrew Y. Ng and Michael I. Jordan,
Latent Dirichlet allocation.
Advances in Neural Information Processing Systems (NIPS),
volume 14, 2002.
- Bernhard E. Boser, Isabelle Guyon and Vladimir Vapnik,
A Training Algorithm for Optimal Margin Classifiers.
Computational Learning Theory,
pages 144-152, 1992.
- Michael Collins,
Discriminative Training Methods for Hidden Markov Models:
Theory and Experiments with Perceptron Algorithms.
Proceedings of EMNLP 2002.
- Thomas Hofmann,
Probabilistic latent semantic analysis.
Proceedings of Uncertainty in Artificial Intelligence
(UAI'99), Stockholm, Sweden, 1999.
- Mark Johnson,
Joint and Conditional Estimation of Tagging and Parsing Models.
Proceedings of ACL 2001.
- Risi Imre Kondor and John Lafferty,
Diffusion kernels on graphs and other discrete input spaces.
Machine Learning: Proceedings of the Nineteenth International
Conference, 2002.
- John Lafferty, Fernando Pereira and Andrew McCallum,
Conditional random fields: Probabilistic models for segmenting
and labeling sequence data.
International Conference on Machine Learning (ICML), 2001.
- Guy Lebanon and John Lafferty,
Cranking: Combining rankings using conditional probability models
on permutations.
Machine Learning: Proceedings of the Nineteenth International
Conference, 2002.
- Thomas P. Minka,
A family of algorithms for approximate Bayesian inference.
Doctoral dissertation, Massachusetts Institute of Technology, 2001.
- Thomas Minka and John Lafferty,
Expectation-propagation for the generative aspect model.
Uncertainty in Artificial Intelligence (UAI), 2002.
John Lafferty,
Models, Kernels, and Algorithms for Discrete Data.
Invited talk presented at CoNLL-2002.