torch.nn.functional.huber_loss
function huber_loss(input: Tensor, target: Tensor, options?: HuberLossFunctionalOptions): Tensor
function huber_loss(input: Tensor, target: Tensor, reduction: 'none' | 'mean' | 'sum', delta: number, options?: HuberLossFunctionalOptions): Tensor

Huber loss: robust regression loss that is quadratic (L2-like) near zero and linear for outliers.
Also known as Smooth Mean Absolute Error (SMAE), Huber loss combines L2 (squared error) and L1 (absolute error) losses. It's identical to smooth_l1_loss mathematically but uses the delta parameter instead of beta. Applies quadratic loss for small errors (smooth gradients) and linear loss for large errors (outlier robustness). Ideal for regression problems where you want both training stability (smooth gradients) and robustness to outliers (bounded loss).
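The piecewise definition can be sketched in plain TypeScript (standalone and illustrative only; the actual function operates on Tensors):

```typescript
// Elementwise Huber loss: quadratic for |error| <= delta, linear beyond.
function huberElement(pred: number, target: number, delta: number = 1.0): number {
  const err = Math.abs(pred - target);
  return err <= delta
    ? 0.5 * err * err                // L2 region: smooth near zero
    : delta * (err - 0.5 * delta);   // L1 region: linear, bounded slope
}

// A small error falls in the quadratic region, a large one in the linear region.
console.log(huberElement(1.2, 1.0)); // 0.5 * 0.2^2 ≈ 0.02
console.log(huberElement(5.0, 1.0)); // 1.0 * (4.0 - 0.5) = 3.5
```

The two branches meet smoothly at |error| = delta, which is what gives Huber loss continuous gradients everywhere.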
Common use cases:
- Robust regression: Regression with occasional outliers that shouldn't dominate training
- Object detection: Bounding box and keypoint regression (identical to SmoothL1Loss)
- Depth estimation: Dense prediction where some targets may have measurement noise
- Pose estimation: Joint coordinate regression where outliers can occur
- Dense prediction tasks: Any per-pixel or per-region regression with potential outliers
- Time series forecasting: Robust to occasional extreme values or measurement errors
- Semantic segmentation: Auxiliary regression losses (boundary smoothness, depth)
Why Huber Loss:
- MSE is smooth everywhere but produces unbounded gradients for large errors (training instability)
- L1 has constant gradient magnitude but a sharp kink at zero (difficult optimization near zero)
- Huber: Combines benefits - smooth near zero (like MSE), bounded gradient for large errors (like L1)
- Delta controls the balance: smaller delta = more L1-like (robust), larger delta = more MSE-like (smooth)
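The gradient behavior above can be checked numerically with a standalone sketch (plain TypeScript, no library; MSE is written as ½·err² so its gradient is simply err):

```typescript
// Gradient of each loss with respect to the prediction error.
const mseGrad   = (err: number): number => err;               // unbounded growth
const l1Grad    = (err: number): number => Math.sign(err);    // constant, kinked at 0
const huberGrad = (err: number, delta: number = 1.0): number =>
  Math.abs(err) <= delta ? err : delta * Math.sign(err);      // clipped at ±delta

for (const err of [0.1, 1.0, 10.0, 100.0]) {
  console.log(err, mseGrad(err), l1Grad(err), huberGrad(err));
}
// At err = 100, MSE's gradient is 100 while Huber's is capped at delta = 1.
```

Huber's gradient matches MSE's inside the quadratic region and saturates at ±delta outside it, which is exactly why a single outlier cannot dominate an update step.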
Comparison of regression losses:
- MSE: Smooth, large errors heavily penalized, sensitive to outliers
- L1: Robust to outliers, non-smooth at zero, harder to optimize
- Huber: Balanced approach, smooth near zero, robust for large errors (best of both)
- SmoothL1: Mathematically identical to Huber (just different parameter naming convention)
- Log-Cosh: Smooth everywhere, similar properties but different shape
Relationship to SmoothL1:
- Huber loss and SmoothL1 loss are mathematically identical
- SmoothL1 uses 'beta' parameter; Huber uses 'delta' parameter
- When delta = beta, the functions produce identical results
- Choose based on naming preference; both implementations are numerically equivalent
Usage notes:
- Identical to SmoothL1: Huber loss and smooth_l1_loss are mathematically identical
- Delta is key parameter: Choose delta carefully - it controls outlier robustness vs smoothness
- Default delta=1.0: Standard choice for most regression tasks; start here and tune if needed
- Smaller delta = more robust: If data has many outliers, try a delta below 1.0 (e.g., 0.5)
- Larger delta = smoother: If few or no outliers are expected, try a delta above 1.0 (e.g., 2.0) for smoother, more MSE-like training
- Gradient properties: Quadratic gradient near zero (stable), linear for large errors (bounded)
- Reduction options: Use 'none' for per-sample loss analysis or sample weighting
- Shape handling: Works with any tensor shape (scalars, vectors, matrices, higher dimensions)
- Delta parameter critical: Wrong delta significantly affects performance; needs tuning
- Not for classification: Use CrossEntropyLoss or BCELoss for classification tasks
- Positive delta required: delta must be greater than 0; negative or zero values are invalid
- Same shapes required: Input and target must have identical shapes; mismatch throws error
- Data scaling matters: If input values are very large/small, may need to rescale or adjust delta
- Comparison with MSE: On outlier-free data, MSE often converges faster; use MSE if no outliers
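The "data scaling matters" note can be made concrete with the elementwise Huber rule in plain TypeScript (standalone sketch, no library):

```typescript
// Elementwise Huber loss on a raw error value.
const huber = (err: number, delta: number): number =>
  Math.abs(err) <= delta ? 0.5 * err * err : delta * (Math.abs(err) - 0.5 * delta);

// The same 10% relative error at two different data scales:
console.log(huber(0.5, 1.0)); // 0.125 — quadratic regime
console.log(huber(50, 1.0));  // 49.5  — linear regime: treated as an "outlier"

// Rescaling delta with the data restores the intended behavior:
console.log(huber(50, 100));  // 1250  — quadratic again (50 <= 100)
```

With delta fixed at 1.0, changing the scale of inputs and targets silently moves errors between the quadratic and linear regimes; either normalize the data or scale delta accordingly.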
Parameters
input: Tensor - Predicted values (any shape and dtype); can be multi-dimensional
target: Tensor - Target values (must have the same shape as input)
options: HuberLossFunctionalOptions (optional)
Returns
Tensor – Loss tensor:
- If reduction='none': Tensor with the same shape as input
- If reduction='mean' or 'sum': Scalar tensor
Examples
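The reduction semantics can be sketched over plain number arrays (standalone and illustrative; the real function operates on Tensors of any shape):

```typescript
// Elementwise Huber term.
const huberEl = (err: number, delta: number = 1.0): number =>
  Math.abs(err) <= delta ? 0.5 * err * err : delta * (Math.abs(err) - 0.5 * delta);

// Reduction behavior mirrored over plain arrays.
function huberLossArray(
  input: number[],
  target: number[],
  reduction: 'none' | 'mean' | 'sum' = 'mean',
  delta: number = 1.0,
): number[] | number {
  const perElement = input.map((x, i) => huberEl(x - target[i], delta));
  if (reduction === 'none') return perElement;       // same shape as input
  const sum = perElement.reduce((a, b) => a + b, 0);
  return reduction === 'sum' ? sum : sum / perElement.length;
}

console.log(huberLossArray([1, 2], [1, 4], 'none')); // [0, 1.5]
console.log(huberLossArray([1, 2], [1, 4], 'sum'));  // 1.5
console.log(huberLossArray([1, 2], [1, 4], 'mean')); // 0.75
```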
// Basic regression with outliers
const predictions = torch.randn([100]);
const targets = torch.randn([100]);
// Add some outliers: overwrite a few targets with extreme values
targets.set([torch.randn([5]).mul(50)]); // (assumes a .mul method; adapt to your API)
const loss = torch.nn.functional.huber_loss(predictions, targets);
// Huber loss is less affected by outliers than MSE

// Object detection bounding box regression (identical to SmoothL1)
const pred_boxes = torch.randn([64, 4]); // Predicted [x, y, w, h]
const target_boxes = torch.randn([64, 4]); // Ground truth
const bbox_loss = torch.nn.functional.huber_loss(
pred_boxes, target_boxes, 'mean', 1.0
);
// delta=1.0 is standard for object detection (same as SmoothL1)

// Tuning delta for different robustness levels
const predictions = model.forward(x);
const targets = y;
// Small delta: more robust to outliers, more L1-like behavior
const loss_robust = torch.nn.functional.huber_loss(
  predictions, targets, 'mean', 0.5
);
// Standard delta: balanced robustness and smoothness
const loss_standard = torch.nn.functional.huber_loss(
  predictions, targets, 'mean', 1.0
);
// Large delta: smoother and more MSE-like, less robust to outliers
const loss_smooth = torch.nn.functional.huber_loss(
  predictions, targets, 'mean', 2.0
);

// Per-element loss analysis
const pred = torch.tensor([1.5, 2.0, 0.5, 5.0]);
const target = torch.tensor([1.0, 10.0, 0.4, 1.0]);
const losses = torch.nn.functional.huber_loss(pred, target, 'none', 1.0);
// Per-element losses show how delta handles different error magnitudes

// Depth map regression in computer vision
const pred_depth = model.forward(image); // [H, W] depth predictions
const target_depth = lidar_ground_truth; // [H, W] ground truth
const depth_loss = torch.nn.functional.huber_loss(
pred_depth, target_depth, 'mean', 1.0
);
// Robust to measurement noise while maintaining training stability

// Keypoint regression with weighted loss
const pred_keypoints = torch.randn([32, 10, 2]); // Batch, 10 keypoints, (x, y)
const target_keypoints = torch.randn([32, 10, 2]);
const kp_loss = torch.nn.functional.huber_loss(
pred_keypoints, target_keypoints, 'none', 0.5
);
// Use 'none' to apply confidence-based weighting per keypoint
const confidence = torch.randn([32, 10, 1]); // per-keypoint weights, broadcast over (x, y); use real confidences in practice
const weighted_loss = kp_loss.mul(confidence).mean(); // (assumes a .mul method)

// Comparison with MSE and L1 losses
const pred = torch.randn([1000]);
const target = torch.randn([1000]);
const outlier_mask = torch.randn([1000]).abs().gt(3); // example mask of extreme draws (not used below)
const mse = torch.nn.functional.mse_loss(pred, target);
const l1 = torch.nn.functional.l1_loss(pred, target);
const huber = torch.nn.functional.huber_loss(pred, target, 'mean', 1.0);
// Huber typically falls between MSE and L1
See Also
- smooth_l1_loss - Mathematically identical loss (beta parameter instead of delta)
- mse_loss - Smooth but sensitive to outliers
- l1_loss - Robust but non-smooth at zero
- log_cosh_loss - Alternative smooth robust loss function
- torch.nn.functional.huber_loss - PyTorch reference