Simple DL Part 2: Embeddings
December, 2020
In my opinion, you need to understand embeddings to really 'get' deep learning. Embeddings are the magic fairy dust that powers every deep learning model, from ImageNet to GPT-3. I think in embeddings. Embeddings are the foundation for any intuition I have about DL, so all of my future posts in this series are going to refer back to the embedding concept.
Because this is an important foundation, I'll be splitting this post into two parts. The first part tries to define embeddings, while the second explains why they work.
TLDR
- Embeddings are stores of information represented as lists of floats (float vectors).
- Float vectors are special because they are continuous, which means we can think of them as points on a map (or, more generally, as points on an N-dimensional surface).
- A good embedding is one where similar pieces of information end up 'close' to each other on our map (see the similarity sketch after this list).
- Because embeddings are lists of floats that represent concepts, we can turn concepts into computation.
- A deep learning model is made of a stack of embeddings. Embeddings are constrained by the input data (features) and the loss function (a toy version of this stack is sketched below).
- The features limit what the embeddings can learn, and the loss tells the model what to prioritize. Models are as good as their features and as bad as their loss.
- We can improve a model's performance by changing the features, the architecture, or the loss function. These change the embeddings, which changes the underlying information map.
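To make the 'points on a map' idea concrete, here is a minimal sketch using NumPy. The words and 4-dimensional vectors are made up for illustration; real embeddings are learned by a model and usually have hundreds of dimensions.

```python
# Toy illustration: embeddings are float vectors, and similar concepts
# sit close together on the map. Vectors here are invented, not learned.
import numpy as np

embeddings = {
    "cat":   np.array([0.9, 0.1, 0.8, 0.0]),
    "dog":   np.array([0.8, 0.2, 0.7, 0.1]),
    "stock": np.array([0.0, 0.9, 0.1, 0.8]),
}

def cosine_similarity(a, b):
    # How close two points are on the map (1.0 = pointing the same way).
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["cat"], embeddings["dog"]))    # high: similar concepts
print(cosine_similarity(embeddings["cat"], embeddings["stock"]))  # low: unrelated concepts
```

Because the concepts are now just numbers, "find the most similar concept" becomes an ordinary computation: compare vectors and pick the closest one.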
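And here is a hedged sketch of the 'stack of embeddings constrained by features and a loss' picture, written in PyTorch. The vocabulary size, layer widths, and random data are all invented; the point is only that each layer produces an embedding and the loss at the end shapes every embedding beneath it.

```python
# Toy model: features go in at the bottom, each layer re-embeds the
# information, and the loss at the top decides what the embeddings prioritize.
import torch
import torch.nn as nn

vocab_size, embed_dim, num_classes = 1000, 32, 5

model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),  # features -> first embedding
    nn.Flatten(),                         # flatten the 4 token embeddings into one vector
    nn.Linear(embed_dim * 4, 64),         # intermediate embedding
    nn.ReLU(),
    nn.Linear(64, num_classes),           # final embedding, scored by the loss
)

tokens = torch.randint(0, vocab_size, (8, 4))   # batch of 8 examples, 4 tokens each
labels = torch.randint(0, num_classes, (8,))

logits = model(tokens)
loss = nn.CrossEntropyLoss()(logits, labels)    # the loss tells the model what matters
loss.backward()                                  # gradients reshape the information map
```

Swapping the features, the layers, or the loss in this sketch changes which embeddings the model can learn, which is exactly the lever described in the last bullet above.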