A Coding Guide to Demonstrate Targeted Data Poisoning Attacks in Deep Learning
These articles are AI-generated summaries. Please check the original sources for full details.
Targeted Data Poisoning Attacks in Deep Learning
This tutorial demonstrates a realistic, targeted data poisoning attack in which a portion of the CIFAR-10 training labels is manipulated so that the resulting impact on model behavior can be observed. By relabeling a fraction of one target class's training images as a chosen malicious class, the study shows how subtle data corruption can lead to systematic misclassification.
Why This Matters
Standard machine learning training pipelines assume clean, representative training data, yet real-world datasets are often exposed to malicious manipulation. Data poisoning attacks can compromise model integrity and produce biased predictions or targeted failures. In high-stakes settings such as autonomous driving or finance, the cost of a compromised system could reach millions of dollars.
Key Insights
- Label Flipping: A common data poisoning technique in which an attacker changes the assigned labels of selected training samples while leaving the inputs themselves untouched.
- CIFAR-10 Dataset: A widely used benchmark dataset for image classification, consisting of 60,000 32x32 color images in 10 classes (50,000 training and 10,000 test images).
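- ResNet Architecture: The study uses ResNet-18, a residual convolutional network whose skip connections make deep networks easier to train. For the small 32x32 CIFAR-10 images, the first convolution is replaced with a 3x3, stride-1 layer and the initial max-pooling layer is removed.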
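The label-flipping step can be isolated from the training code to make its arithmetic concrete. The sketch below uses a hypothetical helper, `flip_labels`, which is not part of the tutorial. It applies the same idea as the `PoisonedCIFAR10` class to a plain NumPy label array and uses NumPy's `default_rng` generator instead of the global seed. The input is a synthetic array with CIFAR-10's 5,000 training images per class. With the tutorial's settings (target class 1, malicious label 9, ratio 0.4), 2,000 of the 5,000 class-1 labels become class 9.

```python
import numpy as np


def flip_labels(labels, target_class, malicious_label, ratio, seed=0):
    """Hypothetical helper: return a copy of `labels` in which `ratio` of the
    `target_class` entries are relabeled as `malicious_label`, plus the
    indices that were flipped."""
    rng = np.random.default_rng(seed)
    poisoned = np.array(labels, copy=True)           # never modify the clean labels
    candidates = np.flatnonzero(poisoned == target_class)
    n_poison = int(len(candidates) * ratio)          # same rounding as the tutorial
    chosen = rng.choice(candidates, size=n_poison, replace=False)
    poisoned[chosen] = malicious_label
    return poisoned, chosen


# Synthetic stand-in for the CIFAR-10 training labels: 5,000 samples per class
clean = np.repeat(np.arange(10), 5000)
poisoned, flipped = flip_labels(clean, target_class=1, malicious_label=9, ratio=0.4)

print(len(flipped))                                   # 2000
print(np.bincount(poisoned, minlength=10)[[1, 9]])    # [3000 7000]
```

In CIFAR-10, class 1 is "automobile" and class 9 is "truck". After poisoning, the model therefore sees 3,000 automobiles labeled correctly and 2,000 automobiles labeled as trucks. That is enough to pull the decision boundary for visually similar vehicles toward the malicious class.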
Working Example
```python
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader, Dataset
import numpy as np

CONFIG = {
    "batch_size": 128,
    "epochs": 10,
    "lr": 0.001,
    "target_class": 1,      # CIFAR-10 class 1 = automobile
    "malicious_label": 9,   # CIFAR-10 class 9 = truck
    "poison_ratio": 0.4,    # fraction of target-class training labels to flip
    # get_model() and the training loop read this key, so it must be defined
    "device": "cuda" if torch.cuda.is_available() else "cpu",
}

torch.manual_seed(42)
np.random.seed(42)


class PoisonedCIFAR10(Dataset):
    """Wraps a CIFAR-10 dataset and flips a fraction of one class's labels."""

    def __init__(self, original_dataset, target_class, malicious_label, ratio, is_train=True):
        self.dataset = original_dataset
        # Copy the labels so the wrapped dataset's own targets stay untouched
        self.targets = np.array(original_dataset.targets)
        self.is_train = is_train
        if is_train and ratio > 0:
            # Pick a random subset of target-class samples and relabel them
            indices = np.where(self.targets == target_class)[0]
            n_poison = int(len(indices) * ratio)
            poison_indices = np.random.choice(indices, n_poison, replace=False)
            self.targets[poison_indices] = malicious_label

    def __getitem__(self, index):
        img, _ = self.dataset[index]          # keep the image, discard the clean label
        return img, int(self.targets[index])  # return the (possibly flipped) label

    def __len__(self):
        return len(self.dataset)


def get_model():
    model = torchvision.models.resnet18(num_classes=10)
    # Adapt ResNet-18 to 32x32 inputs: 3x3 stride-1 stem, no initial max-pool
    model.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
    model.maxpool = nn.Identity()
    return model.to(CONFIG["device"])


transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465),
                         (0.2023, 0.1994, 0.2010)),
])

base_train = torchvision.datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
poison_ds = PoisonedCIFAR10(base_train, CONFIG["target_class"], CONFIG["malicious_label"],
                            ratio=CONFIG["poison_ratio"])
poison_loader = DataLoader(poison_ds, batch_size=CONFIG["batch_size"], shuffle=True)

model = get_model()
optimizer = optim.Adam(model.parameters(), lr=CONFIG["lr"])
criterion = nn.CrossEntropyLoss()

for _ in range(CONFIG["epochs"]):
    model.train()
    for images, labels in poison_loader:
        images = images.to(CONFIG["device"])
        labels = labels.to(CONFIG["device"])
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
```
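The training loop above stops before measuring the attack's effect. A minimal sketch of how that impact could be quantified is shown below. It assumes you already have the model's predicted classes and the clean ground-truth labels for the CIFAR-10 test set as integer arrays. The function name `poisoning_metrics` and its metric names are hypothetical, not taken from the tutorial. It reports overall accuracy, accuracy on the attacked class, the share of attacked-class images predicted as the malicious label, and accuracy on all other classes.

```python
import numpy as np


def poisoning_metrics(preds, labels, target_class, malicious_label):
    """Hypothetical helper summarizing a label-flipping attack.

    preds, labels: 1-D integer arrays of predicted and true (clean) classes.
    """
    preds = np.asarray(preds)
    labels = np.asarray(labels)
    target_mask = labels == target_class
    other_mask = ~target_mask
    return {
        # Accuracy over the whole clean test set
        "overall_acc": float(np.mean(preds == labels)),
        # Accuracy on test images of the attacked class
        "target_acc": float(np.mean(preds[target_mask] == target_class)),
        # Share of attacked-class images pushed to the attacker's label
        "target_to_malicious": float(np.mean(preds[target_mask] == malicious_label)),
        # Accuracy on every other class (shows how targeted the damage is)
        "other_acc": float(np.mean(preds[other_mask] == labels[other_mask])),
    }


# Toy example: four class-1 images followed by four images of other classes
labels = np.array([1, 1, 1, 1, 0, 2, 9, 9])
preds = np.array([1, 9, 9, 3, 0, 2, 9, 1])
print(poisoning_metrics(preds, labels, target_class=1, malicious_label=9))
# {'overall_acc': 0.5, 'target_acc': 0.25, 'target_to_malicious': 0.5, 'other_acc': 0.75}
```

To obtain `preds` for the real model, one option is to load `torchvision.datasets.CIFAR10(root="./data", train=False, ...)` with a transform that omits `RandomHorizontalFlip`. Then call `model.eval()`, run the test loader under `torch.no_grad()`, and collect `outputs.argmax(dim=1)`. A successful targeted attack should show a high `target_to_malicious` rate while `other_acc` stays close to that of a model trained with `poison_ratio` set to 0.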
Practical Applications
- Autonomous Vehicles: An attacker could poison training data to cause a self-driving car to misclassify road signs.
- Spam Filtering: An attacker could poison a spam filter's training data so that malicious emails bypass detection.