Representation learning aims to learn useful feature representations of data, often in an unsupervised or self-supervised way, so that the representations can be reused for downstream tasks (e.g. classification, clustering).
TL;DR: The latent space that the data is encoded into should capture structure that is useful for downstream tasks.
Some examples include autoencoders, VAEs, contrastive learning as in CLIP, and masked modelling as in BERT.
However, the current project in foundation models uses DINO (no negative examples). Methods that use a contrastive loss, as in CLIP, don't suffer as much from representation collapse (every input mapping to nearly the same embedding), because the negative examples serve as a regularization term.
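
To make the role of negative examples concrete, here is a minimal NumPy sketch of a symmetric InfoNCE-style contrastive loss, roughly in the spirit of CLIP. It is an illustration under assumptions, not CLIP's actual implementation: the function name `info_nce_loss`, the temperature of 0.07, and the toy batch and embedding sizes are all chosen for the example. It compares a collapsed encoder (every input maps to the same vector) with a well-separated one.

```python
import numpy as np


def info_nce_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    Hypothetical helper for illustration; row i of image_emb is assumed
    to be the positive match for row i of text_emb.
    """
    # L2-normalize so that dot products are cosine similarities.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    # logits[i, j] = scaled similarity of image i and text j.
    # The diagonal holds the positives; every off-diagonal entry is a negative.
    logits = image_emb @ text_emb.T / temperature

    def cross_entropy_on_diagonal(scores):
        # Numerically stable row-wise log-softmax, then pick the positives.
        scores = scores - scores.max(axis=1, keepdims=True)
        log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (
        cross_entropy_on_diagonal(logits) + cross_entropy_on_diagonal(logits.T)
    )


rng = np.random.default_rng(0)
batch, dim = 8, 16  # toy sizes, assumed for the example

# Collapsed encoder: every input is mapped to the same vector.
collapsed = np.tile(rng.normal(size=(1, dim)), (batch, 1))
collapsed_loss = info_nce_loss(collapsed, collapsed)

# Well-separated encoder: matched pairs agree, mismatched pairs are orthogonal.
separated = np.eye(batch, dim)
separated_loss = info_nce_loss(separated, separated)

print("collapsed loss:", collapsed_loss)
print("separated loss:", separated_loss)
```

With a collapsed encoder, the positives and negatives all get the same score, so the softmax is uniform and the loss stays at log(batch size). The loss can only fall below that value if the positive pair scores higher than the negatives, which is the sense in which the negatives penalize collapse. A method without negatives has no such term in its loss and needs some other mechanism to avoid collapsing.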
Representation learning and generative models often overlap, but they are not the same thing.
- Many generative models naturally learn useful representations as a by-product.