torch.nn.HuberLoss
new HuberLoss(options?: HuberLossOptions)
reduction(Reduction) - readonly
delta(number)
Huber Loss: robust regression loss that is less sensitive to outliers than MSE.
Applies quadratic (MSE) behavior to small errors and linear (L1) behavior to large errors, combining the best of both: smooth and differentiable near zero like MSE, yet robust to outliers like L1. Commonly used for:
- Robust regression (ignoring extreme outlier data points)
- Object detection (bounding box regression: Faster R-CNN, YOLO, SSD)
- Reinforcement learning (e.g. DQN, where Huber is applied to the temporal-difference error)
- Financial forecasting (robust estimation of price movements)
- Signal processing with occasional spike noise
Unlike MSE which penalizes outliers quadratically, Huber penalizes them linearly beyond a threshold delta. Unlike L1 which has a non-differentiable kink at zero, Huber is smooth and differentiable everywhere.
When to use Huber Loss:
- Data contains outliers or noise that would dominate MSE
- You need smooth gradients for better convergence (vs L1)
- Bounding box regression tasks (standard practice)
- You want a hybrid of L1's robustness and MSE's smoothness
Trade-offs:
- vs MSE: More robust to outliers; slightly more complex computation (branching)
- vs L1: Smoother gradients near zero; requires tuning delta hyperparameter
- Empirical: Often provides better results than L1 or MSE for noisy regression
- Optimization: Generally stable and converges well
Algorithm: For each error e_i = predicted_i - target_i:
- If |e_i| ≤ delta: loss_i = 0.5 * e_i²
- If |e_i| > delta: loss_i = delta * (|e_i| - 0.5 * delta)
Continuous and differentiable at ±delta.
- Best of both worlds: Combines L1 robustness with MSE smoothness
- Continuous and differentiable: Smooth everywhere, including at ±delta
- Delta tuning critical: Choose based on expected error magnitude
- Object detection standard: Default in Faster R-CNN, YOLO, and similar detectors
- Outlier robustness: More robust than MSE while maintaining optimization stability
- Computation: Slightly more expensive than pure L1 or MSE (conditional branch)
- Recommended: Generally preferred over pure L1 loss in practice
Examples
// Robust regression in object detection (bounding box regression)
const huber_loss = new torch.nn.HuberLoss({ reduction: 'mean', delta: 1.0 });
const predicted_boxes = torch.randn([32, 4]); // 32 boxes with [x, y, w, h]
const target_boxes = torch.randn([32, 4]);
const loss = huber_loss.forward(predicted_boxes, target_boxes);
// Huber loss is standard in object detection (handles occasional bbox outliers)

// Comparing Huber vs L1 vs MSE on regression with outliers
const errors = torch.tensor([
0.1, // Small error
0.2, // Small error
-0.15, // Small error
10.0 // Outlier!
]);
const l1_loss = new torch.nn.L1Loss({ reduction: 'sum' }); // sum, so the per-element totals below hold
const mse_loss = new torch.nn.MSELoss({ reduction: 'sum' });
const huber_loss = new torch.nn.HuberLoss({ reduction: 'sum', delta: 1.0 });
const l1 = l1_loss.forward(errors, torch.zeros_like(errors));
// L1: 0.1 + 0.2 + 0.15 + 10.0 = 10.45 (treats large error linearly)
const mse = mse_loss.forward(errors, torch.zeros_like(errors));
// MSE: 0.01 + 0.04 + 0.0225 + 100.0 = 100.0725 (outlier dominates the total)
const huber = huber_loss.forward(errors, torch.zeros_like(errors));
// Huber: 0.005 + 0.02 + 0.01125 + 9.5 = 9.53625 (smooth yet robust)

// Tuning the delta parameter for your problem
const predictions = torch.randn([100, 1]);
const targets = torch.randn([100, 1]);
// Conservative (delta=0.5): More robust to outliers
const conservative = new torch.nn.HuberLoss({ reduction: 'mean', delta: 0.5 });
// Balanced (delta=1.0): Default, good compromise
const balanced = new torch.nn.HuberLoss({ reduction: 'mean', delta: 1.0 });
// Sensitive (delta=10.0): Closer to MSE behavior
const sensitive = new torch.nn.HuberLoss({ reduction: 'mean', delta: 10.0 });
// Choose delta based on the expected error distribution in your task

// Object detection training loop
class ObjectDetectionModel extends torch.nn.Module {
  rpn: torch.nn.Module; // Region proposal network
  // ... other layers

  forward(x: torch.Tensor): torch.Tensor {
    const proposals = this.rpn.forward(x);
    return proposals;
  }
}
const model = new ObjectDetectionModel();
const huber = new torch.nn.HuberLoss({ reduction: 'mean', delta: 1.0 });
const batch_images = torch.randn([32, 3, 224, 224]);
const predicted_boxes = model.forward(batch_images); // assumes the RPN head outputs one [x, y, w, h] box per image: [32, 4]
const target_boxes = torch.randn([32, 4]);
const loss = huber.forward(predicted_boxes, target_boxes);
// Huber loss handles occasional bbox regression outliers gracefully