source: Steven Gong

Representation learning learns useful feature representations of data, often in an unsupervised or self-supervised way, so that they can be reused for downstream tasks (e.g. classification, clustering).

TL;DR: Essentially, the latent space that the data gets encoded into should capture structure that is useful for later tasks.
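
To make the "learn once, reuse downstream" idea concrete, here is a minimal sketch. It assumes PCA (via SVD) as a stand-in for a learned encoder and uses synthetic two-cluster data; the names `encode`, `components`, and the nearest-centroid classifier are illustrative choices, not something from the original note.

```python
import numpy as np

rng = np.random.default_rng(0)
n_per_class, dim, latent_dim = 100, 10, 2

# Synthetic data (assumption): two Gaussian blobs in 10 dimensions.
x = np.vstack([
    rng.normal(loc=-2.0, scale=1.0, size=(n_per_class, dim)),
    rng.normal(loc=+2.0, scale=1.0, size=(n_per_class, dim)),
])
y = np.repeat([0, 1], n_per_class)  # labels are NOT used to fit the encoder

# Step 1: fit the "encoder" without labels (PCA as a linear stand-in).
mean = x.mean(axis=0)
_, _, vt = np.linalg.svd(x - mean, full_matrices=False)
components = vt[:latent_dim]  # top principal directions

def encode(inputs):
    """Project inputs into the low-dimensional latent space."""
    return (inputs - mean) @ components.T

# Step 2: reuse the frozen representation for a downstream task.
z = encode(x)
centroids = np.stack([z[y == c].mean(axis=0) for c in (0, 1)])
dists = np.linalg.norm(z[:, None, :] - centroids[None, :, :], axis=2)
pred = dists.argmin(axis=1)  # nearest-centroid classification
accuracy = (pred == y).mean()
print(f"latent shape: {z.shape}, downstream accuracy: {accuracy:.2f}")
```

The encoder never sees the labels; only the small downstream classifier does. A neural encoder trained with a self-supervised objective would play the same role as `encode` here.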

Some examples include autoencoders, VAEs, contrastive learning as in CLIP, and masked modelling as in BERT.

However, the current project in foundation models uses DINO, which trains without negative examples. Methods that use a contrastive loss, as in CLIP, don't suffer as much from representation collapse (every input mapping to nearly the same embedding, loosely called mode collapse), because the negative examples act as a regularization term.
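
The role of the negatives can be seen in a small sketch of a symmetric, CLIP-style contrastive (InfoNCE) loss. This is an illustrative NumPy version under assumed settings (the function name `clip_style_loss`, the temperature of 0.07, and the toy embeddings are all assumptions), not the actual CLIP or DINO training code.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def clip_style_loss(za, zb, temperature=0.07):
    """Symmetric InfoNCE: row i of za and row i of zb form the positive
    pair; every other row in the batch acts as a negative."""
    za, zb = l2_normalize(za), l2_normalize(zb)
    logits = za @ zb.T / temperature  # pairwise cosine similarities
    diag = np.arange(len(za))
    loss_ab = -log_softmax(logits, axis=1)[diag, diag].mean()
    loss_ba = -log_softmax(logits, axis=0)[diag, diag].mean()
    return 0.5 * (loss_ab + loss_ba)

rng = np.random.default_rng(0)
n, d = 8, 16
distinct = rng.normal(size=(n, d))  # each sample has its own embedding
collapsed = np.ones((n, d))         # every sample maps to the same vector

loss_distinct = clip_style_loss(distinct, distinct)
loss_collapsed = clip_style_loss(collapsed, collapsed)
print(f"distinct: {loss_distinct:.4f}, collapsed: {loss_collapsed:.4f}")
```

When all embeddings are identical, positives and negatives score the same, so the loss sits at `log(n)` and cannot go lower. That penalty is the regularizing effect of the negatives; methods without negatives need some other mechanism to avoid the collapsed solution.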

Representation learning and generative models often overlap, but they are not the same thing.