torch.nn.GaussianNLLLoss
class GaussianNLLLoss extends Module
new GaussianNLLLoss(options?: { full?: boolean; eps?: number; reduction?: Reduction })
- readonly full: boolean
- readonly eps: number
- readonly reduction: Reduction
Gaussian Negative Log Likelihood (NLL) Loss: probabilistic regression with uncertainty estimation.
Computes negative log likelihood assuming a Gaussian distribution, enabling models to predict both the mean and variance of outputs. Essential for:
- Uncertainty quantification in regression (aleatoric uncertainty)
- Heteroscedastic regression (where noise varies with input)
- Bayesian deep learning and probabilistic models
- Confidence estimation in predictions
- Learning to calibrate prediction confidence
- Physics-informed neural networks with measurement noise
Unlike MSE loss, which assumes fixed variance, Gaussian NLL allows the model to learn both the predicted value and its confidence. The model outputs mean μ and variance σ², and minimizes NLL = 0.5 * (log(σ²) + (y - μ)² / σ²). Predicting high variance lets the model discount errors in noisy regions ("give up" on hard samples); predicting low variance enforces accurate predictions in clean regions.
When to use Gaussian NLL Loss:
- Regression with uncertain/noisy labels (learning noise levels)
- Heteroscedastic noise (different noise levels for different inputs)
- Uncertainty quantification (how confident is this prediction?)
- Robust regression (let model learn which samples are harder)
- Calibration (predicting both mean and confidence interval)
- Active learning (use uncertainty to select hard examples)
- Any probabilistic regression task
Trade-offs:
- vs MSE Loss: MSE assumes fixed noise; Gaussian NLL learns noise per-sample
- More parameters: Model must predict both mean and variance (2× outputs)
- Stability: Can be unstable if variance becomes too small (use eps parameter)
- Interpretability: Variance has clear probabilistic interpretation (uncertainty)
- Computational cost: Slightly higher than MSE (log computation)
Algorithm: For each prediction (mean, variance) pair targeting y:
- NLL_i = 0.5 * (log(σ_i²) + (y_i - μ_i)² / σ_i²)
With full: true (includes the constant 0.5 * log(2π)):
- NLL_i = 0.5 * (log(2π) + log(σ_i²) + (y_i - μ_i)² / σ_i²)
The variance parameter is clipped to [eps, ∞) for numerical stability.
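The per-element formula above can be sketched in plain TypeScript, independent of the library (`gaussianNll` is a hypothetical helper for illustration, mirroring the options of the loss):

```typescript
// Per-element Gaussian NLL, mirroring the formula above.
// `full` adds the constant 0.5 * log(2π); `eps` clips the variance to [eps, ∞).
function gaussianNll(
  mean: number,
  target: number,
  variance: number,
  opts: { full?: boolean; eps?: number } = {},
): number {
  const { full = false, eps = 1e-6 } = opts;
  const v = Math.max(variance, eps); // numerical-stability clip
  let nll = 0.5 * (Math.log(v) + (target - mean) ** 2 / v);
  if (full) nll += 0.5 * Math.log(2 * Math.PI);
  return nll;
}

// μ = 0, y = 1, σ² = 1  →  0.5 * (log 1 + 1) = 0.5
console.log(gaussianNll(0, 1, 1)); // 0.5
```

Note that with full: false the minimum over σ² (at σ² = (y - μ)²) can be negative, so the loss value itself is not comparable across the two settings.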
- Variance interpretation: Predicted variance σ² is the model's estimate of prediction uncertainty
- Output constraint: Variance must be positive. Use softplus or exp to ensure positivity
- Numerical stability: Use eps parameter (default 1e-6) to prevent log(0) and division by zero
- Log-variance trick: For training stability, often predict log(σ²) and exponentiate
- Full parameter: Set full: true for the exact negative log likelihood (MLE); leave full: false (the default) when the constant term is irrelevant, e.g. when comparing against other losses
- Aleatoric uncertainty: This loss models aleatoric (data) uncertainty, not epistemic (model) uncertainty
- Mean field assumption: Assumes each output dimension has independent Gaussian noise
- Gradient behavior: Larger variance → weaker gradients; variance trades off with accuracy
- Variance must be positive! Use torch.exp() or torch.softplus() on raw outputs
- Be careful with eps parameter: too large prevents learning variance, too small causes instability
- Target and input must have the same shape
- The eps clip only guards the loss computation; it does not make raw network outputs positive, so still pass variances through torch.exp() or torch.softplus()
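The gradient-behavior note above can be checked analytically: differentiating the NLL with respect to the mean gives ∂NLL/∂μ = (μ - y)/σ², so the same residual produces a weaker gradient when the predicted variance is large. A plain TypeScript sketch (not library code; `nllGradMean` is a hypothetical helper):

```typescript
// Analytic gradient of the per-element NLL w.r.t. the predicted mean:
//   dNLL/dμ = (μ - y) / σ²  — the residual is down-weighted by the variance.
function nllGradMean(mean: number, target: number, variance: number): number {
  return (mean - target) / variance;
}

// Same residual (μ - y = 1), different confidence:
const confident = nllGradMean(1, 0, 0.1); // 10  — strong pull toward the target
const uncertain = nllGradMean(1, 0, 10); // 0.1 — model has "given up" on this sample
console.log(confident, uncertain);
```

This is why the variance head must be trained jointly with the mean head: inflating σ² reduces the squared-error gradient but pays a log(σ²) penalty, so the optimum balances the two.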
Examples
// Uncertainty quantification in regression
class UncertaintyNet extends torch.nn.Module {
linear1: torch.nn.Linear;
mean_head: torch.nn.Linear;
var_head: torch.nn.Linear; // Raw output; exponentiated to variance σ² in forward()
constructor(input_dim: number, output_dim: number) {
super();
this.linear1 = new torch.nn.Linear(input_dim, 64);
this.mean_head = new torch.nn.Linear(64, output_dim);
this.var_head = new torch.nn.Linear(64, output_dim);
}
forward(x: torch.Tensor): { mean: torch.Tensor; var: torch.Tensor } {
const h = torch.relu(this.linear1.forward(x));
const mean = this.mean_head.forward(h);
const var_ = torch.exp(this.var_head.forward(h)); // Ensure positive variance
return { mean, var: var_ };
}
}
const model = new UncertaintyNet(10, 1);
const loss_fn = new torch.nn.GaussianNLLLoss({ eps: 1e-6 });
const inputs = torch.randn([32, 10]);
const targets = torch.randn([32, 1]);
const output = model.forward(inputs);
const loss = loss_fn.forward(output.mean, targets, output.var);
// Model learns to predict both mean and uncertainty

// Heteroscedastic regression: different noise for different regions
const predictions = torch.randn([32, 1]); // Predicted mean
const targets = torch.randn([32, 1]); // Ground truth
const variances = torch.softplus(torch.randn([32, 1])) + 1e-6; // Predicted variance
const gaussian_nll = new torch.nn.GaussianNLLLoss({ full: false });
const loss = gaussian_nll.forward(predictions, targets, variances);
// Model learns to increase variance in noisy regions

// Calibration: model predicts both value and confidence
const batch_size = 16;
const features = 8;
// Model output: means and log-variances (for stability)
const means = torch.randn([batch_size, features]);
const log_vars = torch.randn([batch_size, features]);
const variances = torch.exp(log_vars); // Ensure positive
const targets = torch.randn([batch_size, features]);
const loss_fn = new torch.nn.GaussianNLLLoss({ eps: 1e-6, full: true });
const loss = loss_fn.forward(means, targets, variances);
// Well-calibrated model: predictions with high confidence are accurate,
// low-confidence predictions allowed to be less accurate