torch.nn.functional.huber_loss
function huber_loss(input: Tensor, target: Tensor, options?: HuberLossFunctionalOptions): Tensor
function huber_loss(input: Tensor, target: Tensor, reduction: 'none' | 'mean' | 'sum', delta: number, options?: HuberLossFunctionalOptions): Tensor

Huber loss: robust regression loss that is quadratic (L2-like) near zero and linear for outliers.
Also known as Smooth Mean Absolute Error (SMAE), Huber loss combines L2 (squared error) and L1 (absolute error) losses. It's identical to smooth_l1_loss mathematically but uses the delta parameter instead of beta. Applies quadratic loss for small errors (smooth gradients) and linear loss for large errors (outlier robustness). Ideal for regression problems where you want both training stability (smooth gradients) and robustness to outliers (bounded loss).
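The piecewise definition can be sketched in plain TypeScript (standalone and illustrative only; the actual function operates on Tensors):

```typescript
// Elementwise Huber loss: quadratic for |error| <= delta, linear beyond.
function huberElement(pred: number, target: number, delta: number = 1.0): number {
  const err = Math.abs(pred - target);
  return err <= delta
    ? 0.5 * err * err                // L2 region: smooth near zero
    : delta * (err - 0.5 * delta);   // L1 region: linear, bounded slope
}

// A small error falls in the quadratic region, a large one in the linear region.
console.log(huberElement(1.2, 1.0)); // 0.5 * 0.2^2 ≈ 0.02
console.log(huberElement(5.0, 1.0)); // 1.0 * (4.0 - 0.5) = 3.5
```

The two branches meet smoothly at |error| = delta, which is what gives Huber loss continuous gradients everywhere.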
Common use cases:
- Robust regression: Regression with occasional outliers that shouldn't dominate training
- Object detection: Bounding box and keypoint regression (identical to SmoothL1Loss)
- Depth estimation: Dense prediction where some targets may have measurement noise
- Pose estimation: Joint coordinate regression where outliers can occur
- Dense prediction tasks: Any per-pixel or per-region regression with potential outliers
- Time series forecasting: Robust to occasional extreme values or measurement errors
- Semantic segmentation: Auxiliary regression losses (boundary smoothness, depth)
Why Huber Loss:
- MSE is smooth everywhere but produces unbounded gradients for large errors (training instability)
- L1 has constant gradient magnitude but a sharp kink at zero (difficult optimization near zero)
- Huber: Combines benefits - smooth near zero (like MSE), bounded gradient for large errors (like L1)
- Delta controls the balance: smaller delta = more L1-like (robust), larger delta = more MSE-like (smooth)
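The gradient behavior above can be checked numerically with a standalone sketch (plain TypeScript, no library; MSE is written as ½·err² so its gradient is simply err):

```typescript
// Gradient of each loss with respect to the prediction error.
const mseGrad   = (err: number): number => err;               // unbounded growth
const l1Grad    = (err: number): number => Math.sign(err);    // constant, kinked at 0
const huberGrad = (err: number, delta: number = 1.0): number =>
  Math.abs(err) <= delta ? err : delta * Math.sign(err);      // clipped at ±delta

for (const err of [0.1, 1.0, 10.0, 100.0]) {
  console.log(err, mseGrad(err), l1Grad(err), huberGrad(err));
}
// At err = 100, MSE's gradient is 100 while Huber's is capped at delta = 1.
```

Huber's gradient matches MSE's inside the quadratic region and saturates at ±delta outside it, which is exactly why a single outlier cannot dominate an update step.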
Comparison of regression losses:
- MSE: Smooth, large errors heavily penalized, sensitive to outliers
- L1: Robust to outliers, non-smooth at zero, harder to optimize
- Huber: Balanced approach, smooth near zero, robust for large errors (best of both)
- SmoothL1: Mathematically identical to Huber (just different parameter naming convention)
- Log-Cosh: Smooth everywhere, similar properties but different shape
Relationship to SmoothL1:
- Huber loss and SmoothL1 loss are mathematically identical
- SmoothL1 uses 'beta' parameter; Huber uses 'delta' parameter
- When delta = beta, the functions produce identical results
- Choose based on naming preference; both implementations are numerically equivalent
Usage notes:
- Identical to SmoothL1: Huber loss and smooth_l1_loss are mathematically identical
- Delta is key parameter: Choose delta carefully - it controls outlier robustness vs smoothness
- Default delta=1.0: Standard choice for most regression tasks; start here and tune if needed
- Smaller delta = more robust: If data has many outliers, try a delta below 1.0 (e.g., 0.5)
- Larger delta = smoother: If few or no outliers are expected, try a delta above 1.0 (e.g., 2.0) for smoother, more MSE-like training
- Gradient properties: Quadratic gradient near zero (stable), linear for large errors (bounded)
- Reduction options: Use 'none' for per-sample loss analysis or sample weighting
- Shape handling: Works with any tensor shape (scalars, vectors, matrices, higher dimensions)
- Delta parameter critical: Wrong delta significantly affects performance; needs tuning
- Not for classification: Use CrossEntropyLoss or BCELoss for classification tasks
- Positive delta required: delta must be greater than 0; negative or zero values are invalid
- Same shapes required: Input and target must have identical shapes; mismatch throws error
- Data scaling matters: If input values are very large/small, may need to rescale or adjust delta
- Comparison with MSE: On outlier-free data, MSE often converges faster; use MSE if no outliers
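The "data scaling matters" note can be made concrete with the elementwise Huber rule in plain TypeScript (standalone sketch, no library):

```typescript
// Elementwise Huber loss on a raw error value.
const huber = (err: number, delta: number): number =>
  Math.abs(err) <= delta ? 0.5 * err * err : delta * (Math.abs(err) - 0.5 * delta);

// The same 10% relative error at two different data scales:
console.log(huber(0.5, 1.0)); // 0.125 — quadratic regime
console.log(huber(50, 1.0));  // 49.5  — linear regime: treated as an "outlier"

// Rescaling delta with the data restores the intended behavior:
console.log(huber(50, 100));  // 1250  — quadratic again (50 <= 100)
```

With delta fixed at 1.0, changing the scale of inputs and targets silently moves errors between the quadratic and linear regimes; either normalize the data or scale delta accordingly.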
Parameters
input: Tensor - Predicted values (any shape and dtype); can be multi-dimensional
target: Tensor - Target values (must have the same shape as input)
options: HuberLossFunctionalOptions (optional)
Returns
Tensor – Loss tensor:
- If reduction='none': Tensor with the same shape as input
- If reduction='mean' or 'sum': Scalar tensor
Examples
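The reduction semantics can be sketched over plain number arrays (standalone and illustrative; the real function operates on Tensors of any shape):

```typescript
// Elementwise Huber term.
const huberEl = (err: number, delta: number = 1.0): number =>
  Math.abs(err) <= delta ? 0.5 * err * err : delta * (Math.abs(err) - 0.5 * delta);

// Reduction behavior mirrored over plain arrays.
function huberLossArray(
  input: number[],
  target: number[],
  reduction: 'none' | 'mean' | 'sum' = 'mean',
  delta: number = 1.0,
): number[] | number {
  const perElement = input.map((x, i) => huberEl(x - target[i], delta));
  if (reduction === 'none') return perElement;       // same shape as input
  const sum = perElement.reduce((a, b) => a + b, 0);
  return reduction === 'sum' ? sum : sum / perElement.length;
}

console.log(huberLossArray([1, 2], [1, 4], 'none')); // [0, 1.5]
console.log(huberLossArray([1, 2], [1, 4], 'sum'));  // 1.5
console.log(huberLossArray([1, 2], [1, 4], 'mean')); // 0.75
```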
// Basic regression with outliers
const predictions = torch.randn([100]);
const targets = torch.randn([100]);
// Add some outliers: overwrite a few targets with extreme values
targets.set([torch.randn([5]).mul(50)]); // (assumes a .mul method; adapt to your API)
const loss = torch.nn.functional.huber_loss(predictions, targets);
// Huber loss is less affected by outliers than MSE

// Object detection bounding box regression (identical to SmoothL1)
const pred_boxes = torch.randn([64, 4]); // Predicted [x, y, w, h]
const target_boxes = torch.randn([64, 4]); // Ground truth
const bbox_loss = torch.nn.functional.huber_loss(
pred_boxes, target_boxes, 'mean', 1.0
);
// delta=1.0 is standard for object detection (same as SmoothL1)

// Tuning delta for different robustness levels
const predictions = model.forward(x);
const targets = y;
// Small delta: more robust to outliers, more L1-like behavior
const loss_robust = torch.nn.functional.huber_loss(
  predictions, targets, 'mean', 0.5
);
// Standard delta: balanced robustness and smoothness
const loss_standard = torch.nn.functional.huber_loss(
  predictions, targets, 'mean', 1.0
);
// Large delta: smoother and more MSE-like, less robust to outliers
const loss_smooth = torch.nn.functional.huber_loss(
  predictions, targets, 'mean', 2.0
);

// Per-element loss analysis
const pred = torch.tensor([1.5, 2.0, 0.5, 5.0]);
const target = torch.tensor([1.0, 10.0, 0.4, 1.0]);
const losses = torch.nn.functional.huber_loss(pred, target, 'none', 1.0);
// Per-element losses show how delta handles different error magnitudes

// Depth map regression in computer vision
const pred_depth = model.forward(image); // [H, W] depth predictions
const target_depth = lidar_ground_truth; // [H, W] ground truth
const depth_loss = torch.nn.functional.huber_loss(
pred_depth, target_depth, 'mean', 1.0
);
// Robust to measurement noise while maintaining training stability

// Keypoint regression with weighted loss
const pred_keypoints = torch.randn([32, 10, 2]); // Batch, 10 keypoints, (x, y)
const target_keypoints = torch.randn([32, 10, 2]);
const kp_loss = torch.nn.functional.huber_loss(
pred_keypoints, target_keypoints, 'none', 0.5
);
// Use 'none' to apply confidence-based weighting per keypoint
const confidence = torch.randn([32, 10, 1]); // per-keypoint weights, broadcast over (x, y); use real confidences in practice
const weighted_loss = kp_loss.mul(confidence).mean(); // (assumes a .mul method)

// Comparison with MSE and L1 losses
const pred = torch.randn([1000]);
const target = torch.randn([1000]);
const outlier_mask = torch.randn([1000]).abs().gt(3); // example mask of extreme draws (not used below)
const mse = torch.nn.functional.mse_loss(pred, target);
const l1 = torch.nn.functional.l1_loss(pred, target);
const huber = torch.nn.functional.huber_loss(pred, target, 'mean', 1.0);
// Huber typically falls between MSE and L1
See Also
- smooth_l1_loss - Mathematically identical loss (beta parameter instead of delta)
- mse_loss - Smooth but sensitive to outliers
- l1_loss - Robust but non-smooth at zero
- log_cosh_loss - Alternative smooth robust loss function
- torch.nn.functional.huber_loss - PyTorch reference