torch.nn.GaussianNLLLoss
class GaussianNLLLoss extends Module
new GaussianNLLLoss(options?: { full?: boolean; eps?: number; reduction?: Reduction })
- readonly full: boolean
- readonly eps: number
- readonly reduction: Reduction
Gaussian Negative Log Likelihood (NLL) Loss: probabilistic regression with uncertainty estimation.
Computes negative log likelihood assuming a Gaussian distribution, enabling models to predict both the mean and variance of outputs. Essential for:
- Uncertainty quantification in regression (aleatoric uncertainty)
- Heteroscedastic regression (where noise varies with input)
- Bayesian deep learning and probabilistic models
- Confidence estimation in predictions
- Learning to calibrate prediction confidence
- Physics-informed neural networks with measurement noise
Unlike MSE loss, which assumes fixed variance, Gaussian NLL allows the model to learn both the predicted value and its confidence. The model outputs mean μ and variance σ², and minimizes NLL = 0.5 * (log(σ²) + (y - μ)² / σ²). Predicting high variance lets the model discount errors in noisy regions ("give up" on hard samples); predicting low variance enforces accurate predictions in clean regions.
When to use Gaussian NLL Loss:
- Regression with uncertain/noisy labels (learning noise levels)
- Heteroscedastic noise (different noise levels for different inputs)
- Uncertainty quantification (how confident is this prediction?)
- Robust regression (let model learn which samples are harder)
- Calibration (predicting both mean and confidence interval)
- Active learning (use uncertainty to select hard examples)
- Any probabilistic regression task
Trade-offs:
- vs MSE Loss: MSE assumes fixed noise; Gaussian NLL learns noise per-sample
- More parameters: Model must predict both mean and variance (2× outputs)
- Stability: Can be unstable if variance becomes too small (use eps parameter)
- Interpretability: Variance has clear probabilistic interpretation (uncertainty)
- Computational cost: Slightly higher than MSE (log computation)
Algorithm: For each prediction (mean, variance) pair targeting y:
- NLL_i = 0.5 * (log(σ_i²) + (y_i - μ_i)² / σ_i²)
With full: true (includes the constant 0.5 * log(2π)):
- NLL_i = 0.5 * (log(2π) + log(σ_i²) + (y_i - μ_i)² / σ_i²)
The variance parameter is clipped to [eps, ∞) for numerical stability.
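The per-element formula above can be sketched in plain TypeScript, independent of the library (`gaussianNll` is a hypothetical helper for illustration, mirroring the options of the loss):

```typescript
// Per-element Gaussian NLL, mirroring the formula above.
// `full` adds the constant 0.5 * log(2π); `eps` clips the variance to [eps, ∞).
function gaussianNll(
  mean: number,
  target: number,
  variance: number,
  opts: { full?: boolean; eps?: number } = {},
): number {
  const { full = false, eps = 1e-6 } = opts;
  const v = Math.max(variance, eps); // numerical-stability clip
  let nll = 0.5 * (Math.log(v) + (target - mean) ** 2 / v);
  if (full) nll += 0.5 * Math.log(2 * Math.PI);
  return nll;
}

// μ = 0, y = 1, σ² = 1  →  0.5 * (log 1 + 1) = 0.5
console.log(gaussianNll(0, 1, 1)); // 0.5
```

Note that with full: false the minimum over σ² (at σ² = (y - μ)²) can be negative, so the loss value itself is not comparable across the two settings.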
- Variance interpretation: Predicted variance σ² is the model's estimate of prediction uncertainty
- Output constraint: Variance must be positive. Use softplus or exp to ensure positivity
- Numerical stability: Use eps parameter (default 1e-6) to prevent log(0) and division by zero
- Log-variance trick: For training stability, often predict log(σ²) and exponentiate
- Full parameter: Set full: true for the exact negative log likelihood (MLE); leave full: false (the default) when the constant term is irrelevant, e.g. when comparing against other losses
- Aleatoric uncertainty: This loss models aleatoric (data) uncertainty, not epistemic (model) uncertainty
- Mean field assumption: Assumes each output dimension has independent Gaussian noise
- Gradient behavior: Larger variance → weaker gradients; variance trades off with accuracy
- Variance must be positive! Use torch.exp() or torch.softplus() on raw outputs
- Be careful with eps parameter: too large prevents learning variance, too small causes instability
- Target and input must have the same shape
- The eps clip only guards the loss computation; it does not make raw network outputs positive, so still pass variances through torch.exp() or torch.softplus()
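The gradient-behavior note above can be checked analytically: differentiating the NLL with respect to the mean gives ∂NLL/∂μ = (μ - y)/σ², so the same residual produces a weaker gradient when the predicted variance is large. A plain TypeScript sketch (not library code; `nllGradMean` is a hypothetical helper):

```typescript
// Analytic gradient of the per-element NLL w.r.t. the predicted mean:
//   dNLL/dμ = (μ - y) / σ²  — the residual is down-weighted by the variance.
function nllGradMean(mean: number, target: number, variance: number): number {
  return (mean - target) / variance;
}

// Same residual (μ - y = 1), different confidence:
const confident = nllGradMean(1, 0, 0.1); // 10  — strong pull toward the target
const uncertain = nllGradMean(1, 0, 10); // 0.1 — model has "given up" on this sample
console.log(confident, uncertain);
```

This is why the variance head must be trained jointly with the mean head: inflating σ² reduces the squared-error gradient but pays a log(σ²) penalty, so the optimum balances the two.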
Examples
// Uncertainty quantification in regression
class UncertaintyNet extends torch.nn.Module {
linear1: torch.nn.Linear;
mean_head: torch.nn.Linear;
var_head: torch.nn.Linear; // Raw output; exponentiated to variance σ² in forward()
constructor(input_dim: number, output_dim: number) {
super();
this.linear1 = new torch.nn.Linear(input_dim, 64);
this.mean_head = new torch.nn.Linear(64, output_dim);
this.var_head = new torch.nn.Linear(64, output_dim);
}
forward(x: torch.Tensor): { mean: torch.Tensor; var: torch.Tensor } {
const h = torch.relu(this.linear1.forward(x));
const mean = this.mean_head.forward(h);
const var_ = torch.exp(this.var_head.forward(h)); // Ensure positive variance
return { mean, var: var_ };
}
}
const model = new UncertaintyNet(10, 1);
const loss_fn = new torch.nn.GaussianNLLLoss({ eps: 1e-6 });
const inputs = torch.randn([32, 10]);
const targets = torch.randn([32, 1]);
const output = model.forward(inputs);
const loss = loss_fn.forward(output.mean, targets, output.var);
// Model learns to predict both mean and uncertainty

// Heteroscedastic regression: different noise for different regions
const predictions = torch.randn([32, 1]); // Predicted mean
const targets = torch.randn([32, 1]); // Ground truth
const variances = torch.softplus(torch.randn([32, 1])) + 1e-6; // Predicted variance
const gaussian_nll = new torch.nn.GaussianNLLLoss({ full: false });
const loss = gaussian_nll.forward(predictions, targets, variances);
// Model learns to increase variance in noisy regions

// Calibration: model predicts both value and confidence
const batch_size = 16;
const features = 8;
// Model output: means and log-variances (for stability)
const means = torch.randn([batch_size, features]);
const log_vars = torch.randn([batch_size, features]);
const variances = torch.exp(log_vars); // Ensure positive
const targets = torch.randn([batch_size, features]);
const loss_fn = new torch.nn.GaussianNLLLoss({ eps: 1e-6, full: true });
const loss = loss_fn.forward(means, targets, variances);
// Well-calibrated model: predictions with high confidence are accurate,
// low-confidence predictions allowed to be less accurate