Data poisoning is an attack where malicious data is injected into the training dataset to compromise the performance of a machine learning model.
Studying data poisoning is valuable for testing the robustness of models, for security research, and for understanding model vulnerabilities. It helps identify weaknesses in models and improve their resilience to malicious inputs.
The process involves injecting carefully crafted malicious data into the training dataset, which causes the model to learn incorrect patterns and make wrong predictions. This can lead to degraded performance or specific targeted behaviors in the model.
For example, in a spam detection system, a data poisoning attack might involve adding a large number of spam emails labeled as non-spam to the training dataset, causing the model to misclassify spam emails as legitimate.
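The spam example above can be sketched with a toy nearest-centroid classifier. The feature vectors, labels, and counts below are all invented for illustration; real poisoning targets far more complex models, but the mechanism is the same: mislabeled points injected into the training set drag the learned decision boundary.

```python
# Toy sketch (hypothetical data): poisoning a nearest-centroid spam filter
# by injecting spam-like emails labeled "ham" into the training set.

def centroid(points):
    # Component-wise mean of a list of feature vectors.
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def classify(x, centroids):
    # Predict the label whose centroid is closest (squared Euclidean distance).
    return min(centroids,
               key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centroids[label])))

# Illustrative features: (link_count, exclamation_count).
clean_train = {
    "spam": [(8, 9), (9, 7), (7, 8)],
    "ham":  [(1, 0), (0, 1), (2, 1)],
}
test_spam = (9, 9)  # an obviously spam-like email

clean_centroids = {label: centroid(pts) for label, pts in clean_train.items()}
print(classify(test_spam, clean_centroids))  # -> spam

# Poisoning: inject many spam-like emails mislabeled as "ham",
# pulling the "ham" centroid into the spam region.
poisoned_train = {
    "spam": clean_train["spam"],
    "ham":  clean_train["ham"] + [(9, 9)] * 50,
}
poisoned_centroids = {label: centroid(pts) for label, pts in poisoned_train.items()}
print(classify(test_spam, poisoned_centroids))  # -> ham
```

After poisoning, the same clearly spam-like email is classified as legitimate, which is exactly the availability/integrity failure the attack aims for.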
Data poisoning attacks can be classified into different types based on the attacker’s goals, such as availability attacks (aimed at degrading overall model performance) and integrity attacks (aimed at causing specific incorrect predictions). Common techniques used in data poisoning attacks include label flipping, data injection, and backdoor attacks.
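Of the techniques listed above, label flipping is the simplest to illustrate. The helper below is a hypothetical sketch, not a real attack tool: it inverts a chosen fraction of binary labels before training ever begins.

```python
# Hypothetical sketch of label flipping: invert a fraction of the 0/1
# labels in a training set before the model sees them.
import random

def flip_labels(labels, fraction, seed=0):
    # Return a copy of `labels` with `fraction` of entries flipped (0 <-> 1).
    rng = random.Random(seed)  # fixed seed for reproducibility
    flipped = list(labels)
    k = int(len(labels) * fraction)
    for i in rng.sample(range(len(labels)), k):
        flipped[i] = 1 - flipped[i]
    return flipped

labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
poisoned = flip_labels(labels, fraction=0.3)
print(sum(a != b for a, b in zip(labels, poisoned)))  # 3 labels flipped
```

Even a small flipped fraction can noticeably degrade accuracy (an availability attack), while flipping only labels near a targeted input steers it toward an integrity attack.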
Unlike adversarial attacks, which exploit an already-trained model at inference time, data poisoning alters the model's behavior by corrupting the training data itself.
Understanding and defending against data poisoning attacks is essential for developing robust and secure machine learning models that can withstand malicious inputs.
Related terms:
- Adversarial Attack
- Security