torch.nn.functional.smooth_l1_loss
function smooth_l1_loss(input: Tensor, target: Tensor, options?: SmoothL1LossFunctionalOptions): Tensor
function smooth_l1_loss(input: Tensor, target: Tensor, size_average: boolean | undefined, reduce: boolean | undefined, reduction: 'none' | 'mean' | 'sum', beta: number, options?: SmoothL1LossFunctionalOptions): Tensor
Smooth L1 Loss: a hybrid regression loss combining the benefits of L1 and MSE.
Applies a piecewise loss that uses the squared (L2) error for small errors and the absolute (L1) error for large ones, with the transition at the beta parameter. This combines the smoothness of MSE near zero (well-behaved gradients) with the robustness of L1 to outliers. Essential for:
- Object detection bounding box regression (standard loss)
- Regression with occasional outliers (robust to extreme values)
- Semantic segmentation and dense prediction tasks
- Combining smooth optimization (MSE-like near zero) with outlier robustness (L1-like for large errors)
- Tasks where both gradient stability and outlier robustness matter
- Computer vision tasks where target values have occasional extreme errors
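The piecewise rule described above can be sketched as standalone TypeScript (a reference implementation of the formula, not the library's actual kernel):

```typescript
// Elementwise Smooth L1 (PyTorch convention), with diff = pred - target:
//   0.5 * diff^2 / beta   if |diff| <  beta   (quadratic regime)
//   |diff| - 0.5 * beta   otherwise           (linear regime)
function smoothL1(pred: number, target: number, beta = 1.0): number {
  const diff = Math.abs(pred - target);
  return diff < beta ? (0.5 * diff * diff) / beta : diff - 0.5 * beta;
}

console.log(smoothL1(1.5, 1.0)); // small error -> quadratic: 0.125
console.log(smoothL1(2.0, 10.0)); // large error -> linear: 7.5
```

Note the two pieces meet at |diff| = beta with matching value and slope, which is what makes the loss smooth at the transition.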
Why Smooth L1:
- MSE has large gradients for large errors (can cause training instability with outliers)
- L1 has constant gradients but kink at zero (harder to optimize near zero)
- SmoothL1 combines both: L2 near zero (smooth gradients), L1 for large errors (robust)
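The gradient claims above can be checked numerically; here is a sketch (plain TypeScript, not the library API) comparing the derivative of MSE and Smooth L1 with respect to the error:

```typescript
// Analytic gradients w.r.t. the error x = pred - target:
//   MSE:      d/dx (x^2)  = 2x         (grows without bound for outliers)
//   SmoothL1: d/dx        = x / beta   if |x| < beta (smooth through zero)
//                           sign(x)    otherwise     (capped at +/-1)
function smoothL1Grad(x: number, beta = 1.0): number {
  return Math.abs(x) < beta ? x / beta : Math.sign(x);
}

for (const x of [0.1, 1.0, 10.0, 100.0]) {
  console.log(`error=${x}  mseGrad=${2 * x}  smoothL1Grad=${smoothL1Grad(x)}`);
}
// For an error of 100, MSE's gradient is 200 while SmoothL1's stays at 1
```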
When to use Smooth L1:
- Bounding box regression (Faster R-CNN, YOLO use SmoothL1)
- Regression with outliers but need smooth gradients
- When you want MSE's smoothness but need outlier robustness
- Avoid: Pure regression without outliers (use MSE for faster convergence)
- Avoid: Classification (use CrossEntropyLoss instead)
Comparison with alternatives:
- MSE: Smooth everywhere but sensitive to outliers (large errors heavily penalized)
- L1: Robust to outliers but non-smooth at zero (gradients don't vanish)
- SmoothL1: Best of both worlds: smooth near zero, robust for large errors (Goldilocks loss)
- Huber Loss: Equivalent up to scaling: Huber with delta = beta equals beta times SmoothL1 (identical when beta = delta = 1)
- Tukey/Bisquare: More aggressive outlier rejection but harder to tune
- Standard for object detection: Faster R-CNN, YOLO, Mask R-CNN use SmoothL1
- Beta tuning: Start with beta=1.0; reduce if too many outliers affect training
- Gradient properties: Quadratic loss near zero gives gradients proportional to the error, enabling stable training; linear loss far from zero caps the gradient magnitude and prevents explosion
- Reduction options: Use 'none' for per-sample loss analysis or sample weighting
- Shape preservation: With reduction='none', output shape matches input
- Numerical stability: Implementation avoids numerical issues with small beta values
- Comparison with Huber: SmoothL1 equals HuberLoss divided by beta (with delta = beta); the two coincide at beta = 1
- Beta parameter critical: Wrong beta can hurt performance; tune for your data
- Not for classification: Use CrossEntropyLoss for classification, not SmoothL1
- Small beta risk: Very small beta values (e.g. < 0.1) can cause numerical issues
- Outliers matter: If no outliers in data, MSE might be better choice (faster convergence)
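As a numeric sanity check on the Huber relationship noted above (a standalone sketch in PyTorch's conventions, not the library's implementation): Smooth L1 equals Huber loss divided by beta, so the two coincide only at beta = delta = 1.

```typescript
// SmoothL1 with threshold beta
function smoothL1(x: number, beta: number): number {
  const a = Math.abs(x);
  return a < beta ? (0.5 * a * a) / beta : a - 0.5 * beta;
}
// Huber with threshold delta (PyTorch HuberLoss convention)
function huber(x: number, delta: number): number {
  const a = Math.abs(x);
  return a < delta ? 0.5 * a * a : delta * (a - 0.5 * delta);
}

const x = 3.0;
for (const beta of [0.5, 1.0, 2.0]) {
  // huber(x, delta = beta) === beta * smoothL1(x, beta)
  console.log(beta, smoothL1(x, beta), huber(x, beta), beta * smoothL1(x, beta));
}
```

Larger beta widens the quadratic regime (more MSE-like behavior); smaller beta switches to the linear regime sooner (more L1-like robustness).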
Parameters
input: Tensor - Predicted values (any shape)
target: Tensor - Target values (same shape as input)
options: SmoothL1LossFunctionalOptions - optional
Returns
Tensor – Loss tensor of shape matching input (if reduction='none') or scalar
Examples
// Object detection bounding box regression
const pred_boxes = model.predict_bboxes(image); // [N, 4] coordinates
const target_boxes = ground_truth_bboxes; // [N, 4]
const bbox_loss = torch.nn.functional.smooth_l1_loss(pred_boxes, target_boxes);
// SmoothL1 standard for Faster R-CNN, YOLO, RetinaNet

// Regression with occasional outliers
const predictions = torch.randn([1000]); // Model predictions
const targets = torch.randn([1000]); // Target values
// Add some outliers: replace the first 10 targets with extreme values
targets.narrow(0, 0, 10).copy_(torch.randn([10]).mul(100));
const loss = torch.nn.functional.smooth_l1_loss(predictions, targets);
// SmoothL1 less affected than MSE by outliers

// Tuning beta parameter for different robustness levels
const predictions = model(x);
const targets = y;
// Small beta: linear regime kicks in sooner (more robust to outliers, less smooth near zero)
const loss_small_beta = torch.nn.functional.smooth_l1_loss(
  predictions, targets, undefined, undefined, 'mean', 0.5
);
// Standard: balanced robustness and smoothness
const loss_standard = torch.nn.functional.smooth_l1_loss(
  predictions, targets, undefined, undefined, 'mean', 1.0
);
// Large beta: wider quadratic regime (smoother, closer to MSE, less robust to outliers)
const loss_large_beta = torch.nn.functional.smooth_l1_loss(
  predictions, targets, undefined, undefined, 'mean', 2.0
);

// Per-element loss for analysis
const pred = torch.tensor([1.5, 2.0, 0.5]);
const target = torch.tensor([1.0, 10.0, 0.4]);
const losses = torch.nn.functional.smooth_l1_loss(pred, target, undefined, undefined, 'none', 1.0);
// losses = [0.125, 7.5, 0.005] - outlier (10.0) gets a large but linear, not squared, penalty

// Semantic segmentation / dense prediction
const pred_depth = model.predict_depth(image); // [H, W] depth predictions
const target_depth = lidar_depth; // [H, W] ground truth
const depth_loss = torch.nn.functional.smooth_l1_loss(pred_depth, target_depth);
// Good for regression tasks with potential outliers in targets
See Also
- PyTorch torch.nn.functional.smooth_l1_loss
- mse_loss - Smooth everywhere but sensitive to outliers
- l1_loss - Robust but non-smooth at zero
- huber_loss - SmoothL1 scaled by beta (Huber uses a delta parameter; identical at beta = delta = 1)