torch.nn.functional.l1_loss
function l1_loss(input: Tensor, target: Tensor, options?: L1LossFunctionalOptions): Tensor
function l1_loss(input: Tensor, target: Tensor, size_average: boolean | undefined, reduce: boolean | undefined, reduction: 'none' | 'mean' | 'sum', options?: L1LossFunctionalOptions): Tensor

L1 Loss (Mean Absolute Error): robust regression loss function.
Computes the average absolute difference between predictions and targets. The linear penalty is less sensitive to outliers than MSE's quadratic one, making L1 more robust but less smooth to optimize. Common uses:
- Regression with noisy or outlier-prone data
- Robust machine learning (outliers don't dominate loss)
- Sparse predictions (L1 naturally promotes sparsity)
- Applications where linear penalty matches domain
- Bounding box regression with occasional outlier boxes
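The computation itself is just the mean of absolute differences. A minimal plain-TypeScript sketch of that formula (illustrative only, not the library's Tensor implementation):

```typescript
// Mean Absolute Error over two equal-length arrays (illustrative sketch,
// not the library's implementation, which operates on Tensors).
function meanAbsoluteError(input: number[], target: number[]): number {
  if (input.length !== target.length) {
    throw new Error("input and target must have the same length");
  }
  // Sum of |input[i] - target[i]| over all elements
  const totalAbsDiff = input.reduce(
    (sum, x, i) => sum + Math.abs(x - target[i]),
    0
  );
  return totalAbsDiff / input.length;
}
```

Each error contributes linearly, so a single large residual adds only its absolute value to the sum, not its square.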
When to use L1 Loss:
- Data contains outliers or extreme values
- You want robust estimation (median-like behavior)
- Sparse solutions are desired (L1 promotes zeros)
- A linear penalty on errors matches the problem
- Note: rarely used alone; SmoothL1/Huber is often a better default
Trade-offs vs MSE Loss:
- Robustness: L1 tolerates outliers (linear penalty), MSE penalizes them heavily (quadratic)
- Optimization: MSE is smooth everywhere, L1 has a kink at 0 (harder to optimize)
- Gradient: L1 has constant magnitude (no explosion), MSE's grows with error size
- Sparsity: L1 penalties naturally promote sparse solutions (many exact zeros)
- Empirical: MSE is better when outliers are rare; L1 is better when they are common
Key properties:
- Robust to outliers: linear penalty instead of quadratic
- Sparse solutions: the L1 penalty naturally encourages sparsity (many exact zeros)
- Bounded gradients: magnitude doesn't grow with error size (unlike MSE)
- Median behavior: minimizing L1 yields the median (MSE yields the mean)
Limitations:
- Non-smooth: kink at 0 where the gradient is undefined, though practical optimizers handle it via subgradients
- Harder optimization: the kink can make some optimizers struggle compared to MSE
- Not recommended alone: SmoothL1/Huber often works better (combines L1 and MSE benefits)
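The median property in the list above can be checked numerically: for a constant prediction c, the L1 loss over a dataset is minimized at the median, while MSE is minimized at the mean. A plain-TypeScript sketch (illustrative only):

```typescript
// L1 and MSE loss of a constant prediction c against a dataset.
function l1At(c: number, data: number[]): number {
  return data.reduce((s, x) => s + Math.abs(c - x), 0) / data.length;
}
function mseAt(c: number, data: number[]): number {
  return data.reduce((s, x) => s + (c - x) ** 2, 0) / data.length;
}

// Brute-force argmin over a grid of candidate predictions.
function argminOverGrid(
  f: (c: number) => number,
  lo: number,
  hi: number,
  step: number
): number {
  let best = lo;
  let bestVal = f(lo);
  for (let c = lo + step; c <= hi; c += step) {
    const v = f(c);
    if (v < bestVal) { bestVal = v; best = c; }
  }
  return best;
}

const data = [1, 2, 3, 100]; // one large outlier
const l1Min = argminOverGrid((c) => l1At(c, data), 0, 100, 0.5);
const mseMin = argminOverGrid((c) => mseAt(c, data), 0, 100, 0.5);
// l1Min lands in the median region [2, 3], unaffected by the outlier;
// mseMin lands at the mean 26.5, dragged far toward the outlier
```

This is exactly the robustness trade-off: the L1 minimizer ignores how extreme the outlier is, while the MSE minimizer is pulled toward it.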
Parameters
input: Tensor - Predicted values (any shape)
target: Tensor - Target values (same shape as input)
options?: L1LossFunctionalOptions - Options for the operation. See L1LossFunctionalOptions.
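The reduction option (assuming L1LossFunctionalOptions mirrors the 'none' | 'mean' | 'sum' semantics in the signature above) can be sketched over plain arrays:

```typescript
type Reduction = 'none' | 'mean' | 'sum';

// Elementwise L1 with the three standard reduction modes (illustrative sketch;
// the actual library operates on Tensors, not number arrays).
function l1LossSketch(
  input: number[],
  target: number[],
  reduction: Reduction = 'mean'
): number | number[] {
  const elementwise = input.map((x, i) => Math.abs(x - target[i]));
  if (reduction === 'none') return elementwise; // per-element losses, same shape
  const total = elementwise.reduce((s, v) => s + v, 0);
  return reduction === 'sum' ? total : total / elementwise.length;
}
```

With 'none' the per-element losses are returned so callers can weight or mask them before reducing; 'mean' is the usual default.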
Returns
Tensor - Scalar loss value (or a tensor of elementwise losses if reduction='none')

Examples
// Robust regression with outliers
const predictions = torch.tensor([1.0, 2.0, 3.0, 4.0]);
const targets = torch.tensor([1.1, 2.2, 2.8, 100.0]); // Last is outlier
const l1_loss_val = torch.nn.functional.l1_loss(predictions, targets);
// L1 penalizes outlier linearly: |4 - 100| = 96
// More robust than MSE, which would penalize it quadratically: (4 - 100)² = 9216

// Sparse linear model
const x = torch.randn([100, 50]);
const y = torch.randn([100, 1]);
const weights = torch.randn([50, 1], { requires_grad: true });
const pred = x.matmul(weights);
const loss = torch.nn.functional.l1_loss(pred, y);
// Note: exact-zero sparsity in the weights comes from an L1 penalty on the
// weights (regularization); the L1 loss on predictions gives robust, median-like fits

// Comparing L1 vs MSE on noisy data
const clean_targets = torch.tensor([1.0, 2.0, 3.0]);
const predictions = torch.tensor([1.1, 2.1, 2.9]);
const l1 = torch.nn.functional.l1_loss(predictions, clean_targets); // Robust
const mse = torch.nn.functional.mse_loss(predictions, clean_targets); // More sensitive
// L1 is preferred when data has occasional large outliers

See Also
- PyTorch torch.nn.functional.l1_loss
- mse_loss - Smoother alternative, more sensitive to outliers
- smooth_l1_loss - Hybrid of L1 and MSE (smooth near 0, linear elsewhere)
- huber_loss - Closely related to SmoothL1Loss (equivalent when delta = beta = 1); combines L1 and MSE behavior