torch.nn.L1LossOptions
L1Loss (Mean Absolute Error): regression loss function.
Measures the average absolute difference between predicted and target values. Less sensitive to outliers than MSELoss (uses L1 norm instead of L2), making it more robust for data with extreme values. Used for regression tasks where predictions should stay close to targets without being too harshly penalized for occasional large errors.
When to use L1Loss:
- Regression with noisy data or outliers (more robust than MSE)
- Pixel-level predictions (depth, segmentation) with variable difficulty (see the depth-map sketch after this list)
- Bounding box regression with some outlier bboxes
- When you want to minimize average error rather than squared error
- Sparse predictions (L1 naturally promotes sparsity)
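As a concrete sketch of the pixel-level case, the snippet below applies L1Loss to a batch of predicted depth maps. The 4-D shape passed to torch.randn and the variable names are assumptions made for illustration, not part of this API.
// Sketch: per-pixel L1 on predicted depth maps (shapes and names are illustrative)
const depth_loss = new torch.nn.L1Loss();
const predicted_depth = torch.randn([8, 1, 64, 64]);    // assumed: randn accepts a 4-D shape
const ground_truth_depth = torch.randn([8, 1, 64, 64]);
const depth_error = depth_loss.forward(predicted_depth, ground_truth_depth);
// Each pixel contributes |pred - target|, so a handful of very hard pixels
// cannot dominate the loss the way they would under a squared penalty.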
Trade-offs vs MSELoss:
- Robustness: L1 is less influenced by outliers (linear vs quadratic penalty)
- Smoothness: MSE is smoother (differentiable everywhere); L1 has a kink at 0
- Computation: MSE is slightly faster (multiply vs absolute value)
- Gradient behavior: L1's per-element gradient is sign(predicted - target), so its magnitude is constant (±1), while MSE's gradient 2·(predicted - target) grows with the error (see the sketch after this list)
- Optimization: L1 can be harder to optimize near convergence because of the kink, but is more stable in the presence of outliers
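To make the gradient contrast concrete, the following self-contained sketch (plain TypeScript, no library calls) evaluates the per-element gradients by hand, before the 1/N factor that 'mean' reduction would add.
// Per-element gradients ∂loss/∂pred, computed by hand for illustration
const errors = [0.1, -0.5, 10.0];                 // predicted - target
const l1Grads = errors.map(e => Math.sign(e));    // L1: sign(e), magnitude always 1
const mseGrads = errors.map(e => 2 * e);          // MSE: 2·e, grows with the error
console.log(l1Grads);   // [1, -1, 1]    – constant magnitude
console.log(mseGrads);  // [0.2, -1, 20] – the large error dominates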
Trade-offs vs SmoothL1Loss (Huber):
- SmoothL1 hybrid: Combines L1's robustness with MSE's smoothness (quadratic near 0, linear elsewhere; see the sketch after this list)
- Empirical: For most practical applications, SmoothL1 gives better results; plain L1 is mainly useful for theoretical analysis or when sparsity is the explicit goal
- L1 is simpler: No hyperparameter to tune, unlike SmoothL1's beta
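For reference, the standard element-wise SmoothL1 (Huber-style) formula can be sketched in a few lines; the function name and the default beta used here are illustrative only, not part of this API.
// Element-wise SmoothL1 (Huber-style), shown for comparison with plain L1
function smoothL1(error: number, beta = 1.0): number {
  const absErr = Math.abs(error);
  return absErr < beta
    ? 0.5 * absErr * absErr / beta   // quadratic (MSE-like) near 0 -> smooth gradient
    : absErr - 0.5 * beta;           // linear (L1-like) far from 0 -> robust to outliers
}
console.log(smoothL1(0.1));   // ≈ 0.005 (quadratic region)
console.log(smoothL1(50.0));  // 49.5    (linear region)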
Algorithm:
- Forward: loss_i = |predicted_i - target_i|
- Backward: ∂loss/∂predicted_i = sign(predicted_i - target_i) (constant magnitude, direction-dependent)
- Note: the gradient is undefined at 0 in theory, but is typically treated as 0 (or ±1) in practice.
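Written out directly, the forward pass with the default 'mean' reduction is just the average of the absolute errors. This is a minimal reference sketch in plain TypeScript, not the library's implementation.
// Reference sketch of the forward pass with 'mean' reduction
function l1LossMean(predicted: number[], target: number[]): number {
  const absErrors = predicted.map((p, i) => Math.abs(p - target[i]));
  return absErrors.reduce((sum, e) => sum + e, 0) / absErrors.length;
}
// Gradient w.r.t. predicted_i under 'mean' reduction: sign(predicted_i - target_i) / N
console.log(l1LossMean([1.0, 2.0, 3.0, 4.0], [1.1, 2.2, 2.8, 4.1])); // ≈ 0.15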
Definition
export interface L1LossOptions {
/** How to reduce loss across batch (default: 'mean') */
reduction?: Reduction;
}
reduction (Reduction, optional) – How to reduce loss across batch (default: 'mean')
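A sketch of how the option might be passed at construction. This assumes the constructor accepts an L1LossOptions object and that Reduction includes 'sum' and 'none' alongside the default 'mean'; neither is confirmed by the definition above.
const meanLoss = new torch.nn.L1Loss();                        // default: average over all elements
const sumLoss = new torch.nn.L1Loss({ reduction: 'sum' });     // assumed: sum instead of average
const perElement = new torch.nn.L1Loss({ reduction: 'none' }); // assumed: keep per-element losses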
Examples
// Simple L1 regression
const l1_loss = new torch.nn.L1Loss();
const predictions = torch.tensor([1.0, 2.0, 3.0, 4.0]);
const targets = torch.tensor([1.1, 2.2, 2.8, 4.1]);
const loss = l1_loss.forward(predictions, targets);
// loss = mean(|1.0-1.1|, |2.0-2.2|, |3.0-2.8|, |4.0-4.1|)
// = mean([0.1, 0.2, 0.2, 0.1]) = 0.15
// Robust regression with outliers
class RobustRegressor extends torch.nn.Module {
  fc1: torch.nn.Linear;
  fc2: torch.nn.Linear;

  constructor(input_dim: number) {
    super();
    this.fc1 = new torch.nn.Linear(input_dim, 64);
    this.fc2 = new torch.nn.Linear(64, 1);
  }

  forward(x: torch.Tensor): torch.Tensor {
    x = this.fc1.forward(x);
    x = torch.nn.functional.relu(x);
    return this.fc2.forward(x);
  }
}
const model = new RobustRegressor(10);
const l1_loss = new torch.nn.L1Loss();
const batch_x = torch.randn([32, 10]);
const batch_y = torch.randn([32, 1]);
const predictions = model.forward(batch_x);
const loss = l1_loss.forward(predictions, batch_y);
// L1 is more robust to outliers in batch_y than MSE

// Comparing L1 vs MSE on noisy data
const targets = torch.tensor([1.0, 2.0, 100.0]); // Last is outlier
const predictions = torch.tensor([1.1, 2.1, 50.0]);
const mse = new torch.nn.MSELoss();
const l1 = new torch.nn.L1Loss();
const mse_loss = mse.forward(predictions, targets);
// MSE heavily penalizes the outlier: (50-100)² = 2500
const l1_loss = l1.forward(predictions, targets);
// L1 penalizes linearly: |50-100| = 50
// L1 is less affected by extreme outliers
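// Worked totals, assuming the default 'mean' reduction:
// mse_loss ≈ (0.01 + 0.01 + 2500) / 3 ≈ 833.3
// l1_loss  ≈ (0.1  + 0.1  + 50)   / 3 ≈ 16.7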