Semi-supervised learning involves a training phase, but not all of the training data is labeled.
In contrast to supervised learning, the training set is a mixture of labeled and unlabeled data, typically with far more unlabeled than labeled examples. Semi-supervised learning is used when only a limited amount of labeled data is available but a large amount of unlabeled data can be leveraged to improve model performance. This makes it particularly useful in scenarios where labeling data is expensive or time-consuming.
For example, in image classification, a small set of labeled images can be used to train a model, which then predicts labels for a larger set of unlabeled images. The most confident of these predicted labels can be adopted as if they were real labels and used to retrain the model, improving its accuracy; this is often called pseudo-labeling (see the sketch below).
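A minimal sketch of one pseudo-labeling round, assuming feature arrays and a scikit-learn-style classifier with `predict_proba`; the helper name `pseudo_label_round` and the 0.95 confidence threshold are illustrative choices, not a fixed recipe:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def pseudo_label_round(X_labeled, y_labeled, X_unlabeled, threshold=0.95):
    """One round: train on labeled data, then adopt confident predictions."""
    # Any classifier exposing predict_proba would work here; LogisticRegression
    # is just a simple stand-in for an image classifier.
    model = LogisticRegression(max_iter=1000)
    model.fit(X_labeled, y_labeled)

    proba = model.predict_proba(X_unlabeled)
    confident = proba.max(axis=1) >= threshold  # keep only high-confidence guesses

    # Grow the labeled set with the pseudo-labeled points; the rest stay unlabeled
    # and can be fed into the next round.
    X_grown = np.vstack([X_labeled, X_unlabeled[confident]])
    y_grown = np.concatenate(
        [y_labeled, model.classes_[proba[confident].argmax(axis=1)]]
    )
    return X_grown, y_grown, X_unlabeled[~confident]
```

Repeating this round until no unlabeled point clears the threshold is exactly the self-training loop described next.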
The technique works by combining the strengths of both supervised and unsupervised learning. A model is initially trained on the small labeled dataset, and then it uses the patterns learned to make predictions on the unlabeled data. These predictions are then used to further train the model, effectively increasing the amount of labeled data. Two common variants are self-training, which iteratively labels unlabeled data and retrains on the enlarged corpus, and label propagation, which assigns labels to data points based on their similarity to already-labeled points.
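Both variants are available in scikit-learn's `sklearn.semi_supervised` module, where the convention is to mark unlabeled samples with `-1` in the target array. The sketch below uses toy data and illustrative parameters (5% labeled, a 0.9 self-training threshold), not tuned settings:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import LabelPropagation, SelfTrainingClassifier

# Toy dataset: 500 points, of which only about 5% keep their labels.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
rng = np.random.RandomState(0)
y_partial = y.copy()
y_partial[rng.rand(len(y)) > 0.05] = -1  # -1 marks unlabeled samples

# Self-training: wraps any probabilistic classifier and iteratively
# pseudo-labels the unlabeled points it is most confident about.
self_training = SelfTrainingClassifier(
    LogisticRegression(max_iter=1000), threshold=0.9
)
self_training.fit(X, y_partial)

# Label propagation: spreads labels through a similarity graph
# (RBF kernel here) instead of retraining a classifier.
label_prop = LabelPropagation(kernel="rbf")
label_prop.fit(X, y_partial)

# Accuracy against the true labels, including the points that were hidden.
print("self-training:", self_training.score(X, y))
print("label propagation:", label_prop.score(X, y))
```

The design difference is worth noting: self-training reuses whatever supervised model you already have, while label propagation relies purely on the geometry of the data, so it can help even when no good base classifier exists.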
Semi-supervised learning is important because it allows for better utilization of available data, leading to improved model performance with less labeled data. It is a valuable approach in machine learning, especially when dealing with large datasets where labeling is impractical.
- Alias
  - Weak Supervision
- Related terms
  - Self-training
  - Label Propagation