torch.nn.HuberLoss
new HuberLoss(options?: HuberLossOptions)
reduction(Reduction) - readonly
delta(number)
Huber Loss: robust regression loss that is less sensitive to outliers than MSE.
Applies quadratic (MSE) behavior to small errors and linear (L1) behavior to large errors, combining the best of both: smooth and differentiable near zero like MSE, yet robust to outliers like L1. Commonly used for:
- Robust regression (ignoring extreme outlier data points)
- Object detection (bounding box regression: Faster R-CNN, YOLO, SSD)
- Reinforcement learning (e.g. DQN, where Huber is applied to the temporal-difference error)
- Financial forecasting (robust estimation of price movements)
- Signal processing with occasional spike noise
Unlike MSE which penalizes outliers quadratically, Huber penalizes them linearly beyond a threshold delta. Unlike L1 which has a non-differentiable kink at zero, Huber is smooth and differentiable everywhere.
When to use Huber Loss:
- Data contains outliers or noise that would dominate MSE
- You need smooth gradients for better convergence (vs L1)
- Bounding box regression tasks (standard practice)
- You want a hybrid of L1's robustness and MSE's smoothness
Trade-offs:
- vs MSE: More robust to outliers; slightly more complex computation (branching)
- vs L1: Smoother gradients near zero; requires tuning delta hyperparameter
- Empirical: Often provides better results than L1 or MSE for noisy regression
- Optimization: Generally stable and converges well
Algorithm: For each error e_i = predicted_i - target_i:
- If |e_i| ≤ delta: loss_i = 0.5 * e_i²
- If |e_i| > delta: loss_i = delta * (|e_i| - 0.5 * delta)
Continuous and differentiable at ±delta.
- Best of both worlds: Combines L1 robustness with MSE smoothness
- Continuous and differentiable: Smooth everywhere, including at ±delta
- Delta tuning critical: Choose based on expected error magnitude
- Object detection standard: Default in Faster R-CNN, YOLO, and similar detectors
- Outlier robustness: More robust than MSE while maintaining optimization stability
- Computation: Slightly more expensive than pure L1 or MSE (conditional branch)
- Recommended: Generally preferred over pure L1 loss in practice
Examples
// Robust regression in object detection (bounding box regression)
const huber_loss = new torch.nn.HuberLoss({ reduction: 'mean', delta: 1.0 });
const predicted_boxes = torch.randn([32, 4]); // 32 boxes with [x, y, w, h]
const target_boxes = torch.randn([32, 4]);
const loss = huber_loss.forward(predicted_boxes, target_boxes);
// Huber loss is standard in object detection (handles occasional bbox outliers)

// Comparing Huber vs L1 vs MSE on regression with outliers
const errors = torch.tensor([
0.1, // Small error
0.2, // Small error
-0.15, // Small error
10.0 // Outlier!
]);
const l1_loss = new torch.nn.L1Loss({ reduction: 'sum' }); // sum, so the per-element totals below hold
const mse_loss = new torch.nn.MSELoss({ reduction: 'sum' });
const huber_loss = new torch.nn.HuberLoss({ reduction: 'sum', delta: 1.0 });
const l1 = l1_loss.forward(errors, torch.zeros_like(errors));
// L1: 0.1 + 0.2 + 0.15 + 10.0 = 10.45 (treats large error linearly)
const mse = mse_loss.forward(errors, torch.zeros_like(errors));
// MSE: 0.01 + 0.04 + 0.0225 + 100.0 = 100.0725 (outlier dominates the total)
const huber = huber_loss.forward(errors, torch.zeros_like(errors));
// Huber: 0.005 + 0.02 + 0.01125 + 9.5 = 9.53625 (smooth yet robust)

// Tuning the delta parameter for your problem
const predictions = torch.randn([100, 1]);
const targets = torch.randn([100, 1]);
// Conservative (delta=0.5): More robust to outliers
const conservative = new torch.nn.HuberLoss({ reduction: 'mean', delta: 0.5 });
// Balanced (delta=1.0): Default, good compromise
const balanced = new torch.nn.HuberLoss({ reduction: 'mean', delta: 1.0 });
// Sensitive (delta=10.0): Closer to MSE behavior
const sensitive = new torch.nn.HuberLoss({ reduction: 'mean', delta: 10.0 });
// Choose delta based on the expected error distribution in your task

// Object detection training loop
class ObjectDetectionModel extends torch.nn.Module {
  rpn: torch.nn.Module; // Region proposal network
  // ... other layers

  forward(x: torch.Tensor): torch.Tensor {
    const proposals = this.rpn.forward(x);
    return proposals;
  }
}
const model = new ObjectDetectionModel();
const huber = new torch.nn.HuberLoss({ reduction: 'mean', delta: 1.0 });
const batch_images = torch.randn([32, 3, 224, 224]);
const predicted_boxes = model.forward(batch_images); // assumes the RPN head outputs one [x, y, w, h] box per image: [32, 4]
const target_boxes = torch.randn([32, 4]);
const loss = huber.forward(predicted_boxes, target_boxes);
// Huber loss handles occasional bbox regression outliers gracefully