torch.nn.functional.smooth_l1_loss
function smooth_l1_loss(input: Tensor, target: Tensor, options?: SmoothL1LossFunctionalOptions): Tensor
function smooth_l1_loss(input: Tensor, target: Tensor, size_average: boolean | undefined, reduce: boolean | undefined, reduction: 'none' | 'mean' | 'sum', beta: number, options?: SmoothL1LossFunctionalOptions): Tensor
Smooth L1 Loss: a hybrid regression loss combining the benefits of L1 and MSE.
Applies a piecewise loss that uses the squared (L2) error for small errors and the absolute (L1) error for large ones, with the transition at the beta parameter. This combines the smoothness of MSE near zero (well-behaved gradients) with the robustness of L1 to outliers. Essential for:
- Object detection bounding box regression (standard loss)
- Regression with occasional outliers (robust to extreme values)
- Semantic segmentation and dense prediction tasks
- Combining smooth optimization (MSE-like near zero) with outlier robustness (L1-like for large errors)
- Tasks where both gradient stability and outlier robustness matter
- Computer vision tasks where target values have occasional extreme errors
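The piecewise rule described above can be sketched as standalone TypeScript (a reference implementation of the formula, not the library's actual kernel):

```typescript
// Elementwise Smooth L1 (PyTorch convention), with diff = pred - target:
//   0.5 * diff^2 / beta   if |diff| <  beta   (quadratic regime)
//   |diff| - 0.5 * beta   otherwise           (linear regime)
function smoothL1(pred: number, target: number, beta = 1.0): number {
  const diff = Math.abs(pred - target);
  return diff < beta ? (0.5 * diff * diff) / beta : diff - 0.5 * beta;
}

console.log(smoothL1(1.5, 1.0)); // small error -> quadratic: 0.125
console.log(smoothL1(2.0, 10.0)); // large error -> linear: 7.5
```

Note the two pieces meet at |diff| = beta with matching value and slope, which is what makes the loss smooth at the transition.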
Why Smooth L1:
- MSE has large gradients for large errors (can cause training instability with outliers)
- L1 has constant gradients but kink at zero (harder to optimize near zero)
- SmoothL1 combines both: L2 near zero (smooth gradients), L1 for large errors (robust)
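The gradient claims above can be checked numerically; here is a sketch (plain TypeScript, not the library API) comparing the derivative of MSE and Smooth L1 with respect to the error:

```typescript
// Analytic gradients w.r.t. the error x = pred - target:
//   MSE:      d/dx (x^2)  = 2x         (grows without bound for outliers)
//   SmoothL1: d/dx        = x / beta   if |x| < beta (smooth through zero)
//                           sign(x)    otherwise     (capped at +/-1)
function smoothL1Grad(x: number, beta = 1.0): number {
  return Math.abs(x) < beta ? x / beta : Math.sign(x);
}

for (const x of [0.1, 1.0, 10.0, 100.0]) {
  console.log(`error=${x}  mseGrad=${2 * x}  smoothL1Grad=${smoothL1Grad(x)}`);
}
// For an error of 100, MSE's gradient is 200 while SmoothL1's stays at 1
```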
When to use Smooth L1:
- Bounding box regression (Faster R-CNN, YOLO use SmoothL1)
- Regression with outliers but need smooth gradients
- When you want MSE's smoothness but need outlier robustness
- Avoid: Pure regression without outliers (use MSE for faster convergence)
- Avoid: Classification (use CrossEntropyLoss instead)
Comparison with alternatives:
- MSE: Smooth everywhere but sensitive to outliers (large errors heavily penalized)
- L1: Robust to outliers but non-smooth at zero (gradients don't vanish)
- SmoothL1: Best of both worlds: smooth near zero, robust for large errors (Goldilocks loss)
- Huber Loss: Equivalent up to scaling: Huber with delta = beta equals beta times SmoothL1 (identical when beta = delta = 1)
- Tukey/Bisquare: More aggressive outlier rejection but harder to tune
- Standard for object detection: Faster R-CNN, YOLO, Mask R-CNN use SmoothL1
- Beta tuning: Start with beta=1.0; reduce if too many outliers affect training
- Gradient properties: Quadratic loss near zero gives gradients proportional to the error, enabling stable training; linear loss far from zero caps the gradient magnitude and prevents explosion
- Reduction options: Use 'none' for per-sample loss analysis or sample weighting
- Shape preservation: With reduction='none', output shape matches input
- Numerical stability: Implementation avoids numerical issues with small beta values
- Comparison with Huber: SmoothL1 equals HuberLoss divided by beta (with delta = beta); the two coincide at beta = 1
- Beta parameter critical: Wrong beta can hurt performance; tune for your data
- Not for classification: Use CrossEntropyLoss for classification, not SmoothL1
- Small beta risk: Very small beta values (e.g. < 0.1) can cause numerical issues
- Outliers matter: If no outliers in data, MSE might be better choice (faster convergence)
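As a numeric sanity check on the Huber relationship noted above (a standalone sketch in PyTorch's conventions, not the library's implementation): Smooth L1 equals Huber loss divided by beta, so the two coincide only at beta = delta = 1.

```typescript
// SmoothL1 with threshold beta
function smoothL1(x: number, beta: number): number {
  const a = Math.abs(x);
  return a < beta ? (0.5 * a * a) / beta : a - 0.5 * beta;
}
// Huber with threshold delta (PyTorch HuberLoss convention)
function huber(x: number, delta: number): number {
  const a = Math.abs(x);
  return a < delta ? 0.5 * a * a : delta * (a - 0.5 * delta);
}

const x = 3.0;
for (const beta of [0.5, 1.0, 2.0]) {
  // huber(x, delta = beta) === beta * smoothL1(x, beta)
  console.log(beta, smoothL1(x, beta), huber(x, beta), beta * smoothL1(x, beta));
}
```

Larger beta widens the quadratic regime (more MSE-like behavior); smaller beta switches to the linear regime sooner (more L1-like robustness).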
Parameters
input: Tensor - Predicted values (any shape)
target: Tensor - Target values (same shape as input)
options: SmoothL1LossFunctionalOptions - optional
Returns
Tensor – Loss tensor of shape matching input (if reduction='none') or scalar
Examples
// Object detection bounding box regression
const pred_boxes = model.predict_bboxes(image); // [N, 4] coordinates
const target_boxes = ground_truth_bboxes; // [N, 4]
const bbox_loss = torch.nn.functional.smooth_l1_loss(pred_boxes, target_boxes);
// SmoothL1 standard for Faster R-CNN, YOLO, RetinaNet

// Regression with occasional outliers
const predictions = torch.randn([1000]); // Model predictions
const targets = torch.randn([1000]); // Target values
// Add some outliers: replace the first 10 targets with extreme values
targets.narrow(0, 0, 10).copy_(torch.randn([10]).mul(100));
const loss = torch.nn.functional.smooth_l1_loss(predictions, targets);
// SmoothL1 less affected than MSE by outliers

// Tuning beta parameter for different robustness levels
const predictions = model(x);
const targets = y;
// Small beta: linear regime kicks in sooner (more robust to outliers, less smooth near zero)
const loss_small_beta = torch.nn.functional.smooth_l1_loss(
  predictions, targets, undefined, undefined, 'mean', 0.5
);
// Standard: balanced robustness and smoothness
const loss_standard = torch.nn.functional.smooth_l1_loss(
  predictions, targets, undefined, undefined, 'mean', 1.0
);
// Large beta: wider quadratic regime (smoother, closer to MSE, less robust to outliers)
const loss_large_beta = torch.nn.functional.smooth_l1_loss(
  predictions, targets, undefined, undefined, 'mean', 2.0
);

// Per-element loss for analysis
const pred = torch.tensor([1.5, 2.0, 0.5]);
const target = torch.tensor([1.0, 10.0, 0.4]);
const losses = torch.nn.functional.smooth_l1_loss(pred, target, undefined, undefined, 'none', 1.0);
// losses = [0.125, 7.5, 0.005] - outlier (10.0) gets a large but linear, not squared, penalty

// Semantic segmentation / dense prediction
const pred_depth = model.predict_depth(image); // [H, W] depth predictions
const target_depth = lidar_depth; // [H, W] ground truth
const depth_loss = torch.nn.functional.smooth_l1_loss(pred_depth, target_depth);
// Good for regression tasks with potential outliers in targets
See Also
- PyTorch torch.nn.functional.smooth_l1_loss
- mse_loss - Smooth everywhere but sensitive to outliers
- l1_loss - Robust but non-smooth at zero
- huber_loss - SmoothL1 scaled by beta (Huber uses a delta parameter; identical at beta = delta = 1)