For most entity disambiguation systems, the secret recipes are feature representations for mentions and entities, most of which are handcrafted Bag-of-Words (BoW) representations. BoW has several drawbacks: (1) It ignores the intrinsic meaning of words/entities; (2) It often results in high-dimension vector spaces and expensive computation; (3) Handcrafting representations often requires extensive experimental trials, lacking of a general guideline to guarantee its quality. In this paper, we propose a different approach named EDKate. We first learn low-dimensional continuous vector representations for entities and words by jointly embedding knowledge bases and text in the same vector space. Then we utilize these embeddings to design simple but effective features and build a two-layer disambiguation model. Extensive experiments on real-word data sets show that (1) The embedding-based features are very effective. Even a single one embedding-based feature can beat the combination of several BoW-based features. (2) The superiority is even more promising in a difficult set where the mention-to-entity prior cannot work well. (3) The proposed embedding method is much better than trivial implementations of some off-the-shelf embedding algorithms. (4) We compared our EDKate with existing methods/systems and the results are also positive.