Models, Kernels, and Algorithms for Discrete Data
When working with discrete data such as text and natural language,
statistical machine learning offers two fundamental sets of
techniques. Model-based approaches have many advantages from a
probabilistic point of view, while purely discriminative methods such as
kernel machines, which are generally "model free," are appealing for
their error rates and algorithmic properties. It is therefore
attractive to explore methods that combine the two.
In this talk we outline a confluence between these two main streams of
research. A new family of kernel methods for statistical learning is
presented that exploits the geometric structure of statistical models.
In particular, based on the heat equation on the Riemannian manifold
defined by the Fisher information metric on a statistical family, we
propose a family of kernels that provide a natural way of combining
generative statistical modeling with non-parametric discriminative
learning. As a special case, the kernels give a new approach to
designing learning algorithms for discrete data.
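In outline (a sketch based on the diffusion-kernel work cited in the
references below; the talk's exact formulation may differ): for a
statistical family p(x; \theta), the Fisher information metric

    g_{ij}(\theta) = E_\theta[ \partial_i \log p(x; \theta) \, \partial_j \log p(x; \theta) ]

turns the parameter space into a Riemannian manifold, and the heat
equation on this manifold has a fundamental solution K_t(\theta, \theta'),
the heat kernel, which is positive definite and can therefore be used
directly in a discriminative learner such as a support vector machine.
For the multinomial family, the natural model for text as bags of words,
the map \theta \mapsto 2\sqrt{\theta} is an isometry onto a portion of a
sphere, and a first-order parametrix expansion gives the closed-form
approximation

    K_t(\theta, \theta') \approx (4\pi t)^{-n/2} \exp\!\left( -\tfrac{1}{t} \arccos^2\!\Big( \sum_i \sqrt{\theta_i \theta'_i} \Big) \right),

where n is the dimension of the family and the diffusion time t > 0
plays the role of a bandwidth parameter.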
In addition to presenting new results, the talk will give an overview
of some of the main developments in recent years in these two research
areas as they relate to computational language learning.
References
- M. Aizerman, E. Braverman and L. Rozonoér,
Theoretical foundations of the potential function method in
pattern recognition learning.
Automation and Remote Control, volume 25,
pages 821-837, 1964.
- David M. Blei, Andrew Y. Ng and Michael I. Jordan,
Latent Dirichlet allocation.
Advances in Neural Information Processing Systems (NIPS),
volume 14, 2002.
- Bernhard E. Boser, Isabelle Guyon and Vladimir Vapnik,
A Training Algorithm for Optimal Margin Classifiers.
Computational Learning Theory,
pages 144-152, 1992.
- Michael Collins,
Discriminative Training Methods for Hidden Markov Models:
Theory and Experiments with Perceptron Algorithms.
Proceedings of EMNLP 2002.
- Thomas Hofmann,
Probabilistic latent semantic analysis.
Proceedings of Uncertainty in Artificial Intelligence
(UAI'99), Stockholm, Sweden, 1999.
- Mark Johnson,
Joint and Conditional Estimation of Tagging and Parsing Models.
Proceedings of ACL 2001.
- Risi Imre Kondor and John Lafferty,
Diffusion kernels on graphs and other discrete input spaces.
Machine Learning: Proceedings of the Nineteenth International
Conference, 2002.
- John Lafferty, Fernando Pereira and Andrew McCallum,
Conditional random fields: Probabilistic models for segmenting
and labeling sequence data.
International Conference on Machine Learning (ICML), 2001.
- Guy Lebanon and John Lafferty,
Cranking: Combining rankings using conditional probability models
on permutations.
Machine Learning: Proceedings of the Nineteenth International
Conference, 2002.
- Thomas P. Minka,
A family of algorithms for approximate Bayesian inference.
Doctoral dissertation, Massachusetts Institute of Technology, 2001.
- Thomas Minka and John Lafferty,
Expectation-propagation for the generative aspect model.
Uncertainty in Artificial Intelligence (UAI), 2002.
John Lafferty,
Models, Kernels, and Algorithms for Discrete Data.
Invited talk presented at CoNLL-2002.