Distillation

Learning Style

Distillation is a machine learning technique where a smaller model is trained to replicate the behavior of a larger, more complex model.

Distillation is used when models need to be deployed in resource-constrained environments without sacrificing too much performance. It is commonly applied in scenarios such as mobile applications, edge computing, and real-time systems.

For example, a large, powerful model (teacher) trained on a vast dataset can be used to generate predictions, which are then used to train a smaller model (student) that can be deployed on a mobile device. This smaller model can perform nearly as well as the larger model but with significantly reduced computational requirements.

The technique works by using the teacher's predictions as soft targets when training the student, effectively transferring knowledge from the large model to the small one. Soft targets are the class probabilities produced by the teacher; they carry more information than hard targets, the ground-truth class labels, because they also capture how the teacher distributes probability across the incorrect classes.
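A minimal sketch of this soft-target training objective, assuming PyTorch; the temperature T and the weighting factor alpha are illustrative hyperparameters, not part of any fixed API:

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: teacher probabilities softened by temperature T.
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    soft_student = F.log_softmax(student_logits / T, dim=-1)
    # KL divergence between the softened distributions, scaled by T^2
    # so its gradient magnitude stays comparable as T changes.
    distill = F.kl_div(soft_student, soft_targets, reduction="batchmean") * (T * T)
    # Hard-target loss against the true class labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * distill + (1 - alpha) * hard

In this sketch the student is trained on a weighted combination of the two losses: the distillation term pulls its output distribution toward the teacher's, while the hard-target term keeps it anchored to the true labels.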

In some cases, distillation can also work in the reverse direction, using a specialized small model to help train a larger, multi-purpose model. This approach can be useful when the small model has expertise in a specific domain from which the larger model can benefit.

Distillation is important because it allows for the creation of efficient models that can be deployed in environments with limited computational resources. It is a powerful approach in machine learning, enabling the transfer of knowledge from large, complex models to smaller, more efficient ones.

Alias
Model Distillation, Knowledge Distillation
Related terms
Student-Teacher Model, Soft Targets