Entity Disambiguation by Knowledge and Text Jointly Embedding

Wei Fang1, Jianwen Zhang2, Dilin Wang3, Zheng Chen2, Ming Li1
1SYSU-CMU Joint Institute of Engineering, School of Electronics and Information Technology, Sun Yat-Sen University / SYSU-CMU Shunde International Joint Research Institute, 2Microsoft, Redmond, WA, 3Computer Science, Dartmouth College


Abstract

For most entity disambiguation systems, the secret recipes are the feature representations of mentions and entities, most of which are handcrafted Bag-of-Words (BoW) representations. BoW has several drawbacks: (1) it ignores the intrinsic meaning of words and entities; (2) it often yields high-dimensional vector spaces and expensive computation; (3) handcrafting representations typically requires extensive experimental trial and error, with no general guideline to guarantee quality. In this paper, we propose a different approach named EDKate. We first learn low-dimensional continuous vector representations for entities and words by jointly embedding a knowledge base and text into the same vector space. We then use these embeddings to design simple but effective features and build a two-layer disambiguation model. Extensive experiments on real-world datasets show that (1) the embedding-based features are very effective: even a single embedding-based feature can beat a combination of several BoW-based features; (2) the advantage is even more pronounced on a difficult dataset where the mention-to-entity prior does not work well; (3) the proposed embedding method clearly outperforms straightforward applications of off-the-shelf embedding algorithms; and (4) EDKate compares favorably with existing methods and systems.
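To make the notion of an embedding-based feature concrete, the following is a minimal sketch, not the paper's actual model: assuming words and entities already live in one shared embedding space (the output of the joint embedding step), one simple feature is the cosine similarity between the averaged context word vectors of a mention and a candidate entity's vector. All names and toy vectors here (word_vecs, entity_vecs, the example context) are hypothetical, for illustration only.

```python
import numpy as np

# Hypothetical pre-trained joint embeddings: words and entities share one space.
# In practice these would come from a joint knowledge-and-text embedding step.
word_vecs = {
    "jordan": np.array([0.8, 0.1, 0.3]),
    "played": np.array([0.5, 0.5, 0.1]),
    "basketball": np.array([0.7, 0.2, 0.4]),
}
entity_vecs = {
    "Michael_Jordan": np.array([0.9, 0.1, 0.3]),        # the basketball player
    "Michael_I._Jordan": np.array([0.1, 0.9, 0.2]),     # the machine-learning researcher
}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def context_entity_score(context_words: list[str], entity: str) -> float:
    """Embedding-based feature: cosine similarity between the averaged
    context word vectors and the candidate entity vector."""
    ctx = np.mean([word_vecs[w] for w in context_words if w in word_vecs], axis=0)
    return cosine(ctx, entity_vecs[entity])

# Toy usage: rank the two candidate entities for an ambiguous mention "Jordan".
context = ["jordan", "played", "basketball"]
for cand in entity_vecs:
    print(cand, round(context_entity_score(context, cand), 3))
```

Because words and entities are embedded jointly, such a similarity is directly meaningful; with separately trained spaces it would not be.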