Self-supervised Learning

Learning Style

Self-supervised learning is a form of unsupervised learning in which the supervisory signal is derived from the raw data itself rather than from human-provided labels.

Self-supervised learning is used when there is a need to leverage large amounts of unlabeled data to learn useful representations or features, and it is commonly applied in natural language processing and computer vision. The technique works by creating pseudo-labels from the input data itself, which are then used to train the model.
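To make the pseudo-label idea concrete, here is a minimal sketch in plain Python (the helper name and the `[MASK]` convention are illustrative assumptions, not a specific library API): each training pair is built by hiding one token and using the hidden token itself as the label, so no human annotation is needed.

```python
def make_masked_examples(tokens):
    """Derive pseudo-labeled pairs from raw text alone.

    For each position, the input is the sequence with that token
    replaced by a [MASK] placeholder, and the target label is the
    original token -- the label comes from the data itself.
    """
    examples = []
    for i in range(len(tokens)):
        masked = tokens[:i] + ["[MASK]"] + tokens[i + 1:]
        examples.append((masked, tokens[i]))
    return examples

pairs = make_masked_examples(["the", "cat", "sat"])
# pairs[1] is ((["the", "[MASK]", "sat"]), "cat")
```

A real masked-language-model pipeline masks only a fraction of tokens per sequence, but the principle is the same: the data supplies its own labels.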

For example, in natural language processing, a model might be trained to predict the next word in a sentence based on the previous words. In this way, it learns meaningful structure within text without human labels. Through downstream fine-tuning, this pretrained model can then be tailored to specific use cases. Popular large language models such as GPT and BERT use this type of learning to capture patterns in text.

Self-supervised learning matters because it turns vast amounts of unlabeled data into rich, reusable representations. It is a powerful approach in machine learning, enabling models to learn from data without manual labeling.

Alias
SSL, Self-supervision
Related terms
Representation Learning, Contrastive Learning, Finetuning, Unsupervised Learning