torch.nn.SmoothL1LossOptions
SmoothL1Loss (Smooth L1): a rescaled variant of Huber loss.
Nearly identical to HuberLoss, differing only by a factor of beta: both branches of the Huber loss are divided by beta, so SmoothL1(x, beta) = Huber(x, delta=beta) / beta. Common in older code and some architectures (Faster R-CNN). Combines L1 and MSE benefits: robust to outliers while keeping smooth gradients everywhere. Prefer HuberLoss for clarity; use SmoothL1 mainly for compatibility with existing code.
When to use SmoothL1Loss:
- Older codebases or models (Faster R-CNN-era code often uses SmoothL1)
- When you need HuberLoss properties but prefer different parameterization
- Bounding box regression in object detection (pre-modern implementations)
- Regression with outliers (robust and smooth)
- Drop-in replacement for Huber in existing architectures
SmoothL1 vs Huber:
- Mathematically similar: both are piecewise L2/L1 hybrids
- Different parameterization: SmoothL1 uses beta, Huber uses delta
- Relationship: SmoothL1(x, beta) = Huber(x, delta=beta) / beta; identical at beta = 1 (see the sketch after this list)
- Huber recommended: Clearer semantics and more standard naming
- Compatibility: Use SmoothL1 if code expects this specific function
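A minimal sketch of that scaling relationship, using the piecewise definitions directly as plain scalar functions (illustrative helpers, not the library API):
const huberScalar = (e: number, delta: number): number =>
  Math.abs(e) < delta ? 0.5 * e * e : delta * (Math.abs(e) - 0.5 * delta);
const smoothL1Scalar = (e: number, beta: number): number =>
  Math.abs(e) < beta ? (0.5 * e * e) / beta : Math.abs(e) - 0.5 * beta;
// For any error e and any beta: huberScalar(e, beta) === beta * smoothL1Scalar(e, beta)
for (const e of [0.5, 1.0, 2.0, 5.0]) {
  console.log(huberScalar(e, 2.0), 2.0 * smoothL1Scalar(e, 2.0)); // equal pairs
}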
Algorithm: For each error e_i = predicted_i - target_i:
- If |e_i| < beta: loss_i = 0.5 * e_i² / beta (quadratic, like MSE)
- If |e_i| ≥ beta: loss_i = |e_i| - 0.5 * beta (linear, like L1)
Note the division by beta in the first case; both branches are the Huber branches divided by beta. The loss is continuous and differentiable everywhere, including at the transition point |e_i| = beta.
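A direct translation of the algorithm into plain TypeScript with 'mean' reduction (a reference sketch, not the library's implementation):
function smoothL1Loss(pred: number[], target: number[], beta = 1.0): number {
  let total = 0;
  for (let i = 0; i < pred.length; i++) {
    const e = Math.abs(pred[i] - target[i]); // |e_i|
    // quadratic branch near zero, linear branch beyond beta
    total += e < beta ? (0.5 * e * e) / beta : e - 0.5 * beta;
  }
  return total / pred.length; // 'mean' reduction; return total for 'sum'
}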
Definition
export interface SmoothL1LossOptions {
  /** How to reduce the loss ('none' | 'mean' | 'sum', default: 'mean') */
  reduction?: Reduction;
  /** Threshold where the loss switches from quadratic (L2) to linear (L1) behavior (default: 1.0) */
  beta?: number;
}

reduction (Reduction, optional) – How to reduce the loss ('none' | 'mean' | 'sum', default: 'mean')
beta (number, optional) – Threshold where the loss switches from quadratic (L2) to linear (L1) behavior (default: 1.0)
Examples
// Object detection with SmoothL1 (older approach)
const smooth_l1 = new torch.nn.SmoothL1Loss({ reduction: 'mean', beta: 1.0 });
const predicted_boxes = torch.randn([32, 4]);
const target_boxes = torch.randn([32, 4]);
const loss = smooth_l1.forward(predicted_boxes, target_boxes);
// SmoothL1 commonly used in Faster R-CNN and similar detectors

// Comparing SmoothL1 with HuberLoss
const errors = torch.tensor([0.5, 1.0, 2.0, 5.0]);
const huber = new torch.nn.HuberLoss({ reduction: 'sum', delta: 2.0 });
const smooth_l1 = new torch.nn.SmoothL1Loss({ reduction: 'sum', beta: 2.0 });
const huber_loss = huber.forward(errors, torch.zeros_like(errors));
const smooth_l1_loss = smooth_l1.forward(errors, torch.zeros_like(errors));
// With delta = beta, huber_loss equals beta * smooth_l1_loss exactly
// (at beta = 1 the two losses coincide)
// Both provide robustness with smooth gradients

// Regression network with SmoothL1
class RegressionModel extends torch.nn.Module {
  fc1: torch.nn.Linear;
  fc2: torch.nn.Linear;

  constructor() {
    super();
    this.fc1 = new torch.nn.Linear(10, 64);
    this.fc2 = new torch.nn.Linear(64, 1);
  }

  forward(x: torch.Tensor): torch.Tensor {
    const h = torch.nn.functional.relu(this.fc1.forward(x));
    return this.fc2.forward(h);
  }
}
const model = new RegressionModel();
const loss_fn = new torch.nn.SmoothL1Loss({ reduction: 'mean', beta: 1.0 });
const batch_x = torch.randn([32, 10]);
const batch_y = torch.randn([32, 1]);
const predictions = model.forward(batch_x);
const loss = loss_fn.forward(predictions, batch_y);
// SmoothL1 provides robust regression loss

// Tuning beta for different robustness levels
const x = torch.randn([100, 1]);
const y = torch.randn([100, 1]);
// More robust (smaller beta): the linear branch kicks in sooner
const robust = new torch.nn.SmoothL1Loss({ reduction: 'mean', beta: 0.5 });
// Balanced (default)
const balanced = new torch.nn.SmoothL1Loss({ reduction: 'mean', beta: 1.0 });
// Less robust, smoother (larger beta); see the worked numbers after this example
const smooth = new torch.nn.SmoothL1Loss({ reduction: 'mean', beta: 2.0 });
const loss_robust = robust.forward(x, y);
const loss_balanced = balanced.forward(x, y);
const loss_smooth = smooth.forward(x, y);
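Evaluating the piecewise formula by hand at a fixed error |e| = 1.0 makes the effect of beta concrete:
// beta = 0.5: |e| >= beta, linear branch:   loss = 1.0 - 0.5 * 0.5 = 0.75
// beta = 1.0: |e| >= beta, at the boundary: loss = 1.0 - 0.5 * 1.0 = 0.50
// beta = 2.0: |e| < beta, quadratic branch: loss = 0.5 * 1.0 / 2.0 = 0.25
// Smaller beta enters the linear, outlier-robust regime sooner; larger beta
// keeps quadratic (MSE-like) behavior over a wider range of errors.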