torch.nn.functional.gaussian_nll_loss

function gaussian_nll_loss(input: Tensor, target: Tensor, var_: Tensor, options?: {
  full?: boolean;
  eps?: number;
  reduction?: 'none' | 'mean' | 'sum';
}): Tensor

Gaussian (normal distribution) negative log likelihood loss for continuous predictions.

Measures the negative log likelihood of the target under a Gaussian distribution. The model predicts both the mean and the variance of the target distribution, which enables uncertainty quantification and heteroscedastic regression. Computes -log P(target | predicted_mean, predicted_variance) to measure prediction quality. Typical uses:

Heteroscedastic regression (predict mean and uncertainty together)
Uncertainty quantification in neural networks (confidence in predictions)
Aleatoric uncertainty (data noise) modeling vs epistemic (model) uncertainty
Generative models (VAE, diffusion models) that predict variance
Bayesian neural networks (variational inference with Gaussian posteriors)
Robust regression that adapts to varying noise levels across samples
Time-series forecasting with confidence intervals/prediction bands
Probabilistic regression (predict distribution, not just point estimate)

Gaussian likelihood interpretation: Assumes the target follows N(μ, σ²) with predicted mean μ and variance σ². Up to the additive constant 0.5*log(2π), the negative log likelihood is -log P(y | μ, σ²) = 0.5*[log(σ²) + (y-μ)²/σ²]. It combines two terms:

log(σ²): rewards confident predictions (low variance)
(y-μ)²/σ²: penalizes errors relative to variance (heteroscedastic loss)
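The trade-off between the two terms can be checked numerically. The following is a minimal sketch on plain numbers, not the library's tensor implementation; `gaussianNllScalar` is a hypothetical helper introduced only for illustration. For a fixed prediction error, the loss is smallest when the variance equals the squared error.

```typescript
// Per-element Gaussian NLL without the constant term, on plain numbers.
// Hypothetical helper for illustration; not part of the library API.
function gaussianNllScalar(mu: number, y: number, variance: number): number {
  const diff = y - mu;
  return 0.5 * (Math.log(variance) + (diff * diff) / variance);
}

// Fixed prediction error: mu = 0, y = 2, so the squared error is 4.
const candidateVariances: number[] = [0.5, 1, 2, 4, 8, 16];
const candidateLosses: number[] = candidateVariances.map((v) =>
  gaussianNllScalar(0, 2, v)
);

// Pick the candidate variance with the smallest loss.
let bestIndex = 0;
for (let i = 1; i < candidateLosses.length; i++) {
  if (candidateLosses[i] < candidateLosses[bestIndex]) bestIndex = i;
}
const bestVariance = candidateVariances[bestIndex];
console.log(bestVariance); // 4, i.e. the squared error
```

Too small a variance is punished by the (y-μ)²/σ² term, and too large a variance by the log(σ²) term.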

Heteroscedasticity and adaptive weighting: The network learns to increase the variance σ² for hard-to-predict samples, which reduces the loss contribution of their errors (outliers are naturally downweighted). Variance = 1 → half the squared error (MSE up to the 0.5 factor); Variance > 1 → error is downweighted; Variance < 1 → error is upweighted. This gives automatic importance weighting without manual specification.
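The weighting effect is easiest to see in the gradient with respect to the predicted mean, which is (μ - y)/σ². The sketch below assumes plain numbers rather than tensors, and `nllGradWrtMean` is a hypothetical helper, not a library function.

```typescript
// Gradient of 0.5 * [log(variance) + (y - mu)^2 / variance] with respect to mu.
// Hypothetical helper for illustration; not part of the library API.
function nllGradWrtMean(mu: number, y: number, variance: number): number {
  return (mu - y) / variance;
}

// Same prediction error (mu - y = 2) under three predicted variances.
const gradUnitVariance = nllGradWrtMean(3, 1, 1);    // 2: plain error signal
const gradHighVariance = nllGradWrtMean(3, 1, 4);    // 0.5: downweighted
const gradLowVariance = nllGradWrtMean(3, 1, 0.5);   // 4: upweighted
console.log(gradUnitVariance, gradHighVariance, gradLowVariance);
```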

Aleatoric vs epistemic uncertainty:
Aleatoric (data noise): captured by the per-sample variance σ²
Epistemic (model): captured by ensemble/MC-dropout disagreement (estimated separately)
Together: total uncertainty = aleatoric + epistemic (for robust prediction)
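As an illustration of this decomposition, the sketch below combines the outputs of several ensemble members (or MC-dropout passes) for a single input. It assumes plain number arrays and uses the population variance of the member means for the epistemic part; `decomposeUncertainty` and the sample values are hypothetical.

```typescript
// Average of a list of numbers.
function average(values: number[]): number {
  return values.reduce((acc, v) => acc + v, 0) / values.length;
}

// Hypothetical helper: split total predictive variance into two parts.
// memberMeans[i] and memberVariances[i] come from the i-th ensemble member.
function decomposeUncertainty(memberMeans: number[], memberVariances: number[]) {
  const aleatoric = average(memberVariances); // mean predicted data noise
  const meanPrediction = average(memberMeans);
  // Population variance of the member means (model disagreement).
  const epistemic = average(
    memberMeans.map((m) => (m - meanPrediction) * (m - meanPrediction))
  );
  return { meanPrediction, aleatoric, epistemic, total: aleatoric + epistemic };
}

const uncertainty = decomposeUncertainty([1.0, 1.2, 0.8], [0.3, 0.2, 0.4]);
console.log(uncertainty); // aleatoric ≈ 0.3, epistemic ≈ 0.0267, total ≈ 0.3267
```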

Mathematical form: Loss = 0.5 * [log(variance) + (prediction - target)² / variance]
When full=false: simplified form (sufficient for training)
When full=true: adds the constant 0.5 * log(2π) (exact Gaussian NLL)

\begin{aligned} L = 0.5 \left[ \log(\sigma^2) + \frac{(x - y)^2}{\sigma^2} \right] \\ L_{\text{full}} = L + 0.5 \log(2\pi) \\ \sigma^2 = \max(\text{var}, \epsilon) \quad \text{(clamped for stability)} \end{aligned}
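The formulas above can be written out as a small reference computation. This is a sketch on flat number arrays under the stated defaults (full = false, eps = 1e-6, reduction = 'mean'), not the library's tensor implementation; `gaussianNllReference` and `NllOptions` are hypothetical names.

```typescript
type Reduction = 'none' | 'mean' | 'sum';
interface NllOptions { full?: boolean; eps?: number; reduction?: Reduction; }

// Hypothetical reference computation on flat arrays (not the library function).
function gaussianNllReference(
  input: number[], target: number[], variance: number[], options: NllOptions = {}
): number | number[] {
  const { full = false, eps = 1e-6, reduction = 'mean' } = options;
  if (input.length !== target.length || input.length !== variance.length) {
    throw new Error('input, target and variance must have the same length');
  }
  const constant = full ? 0.5 * Math.log(2 * Math.PI) : 0;
  const perElement = input.map((x, i) => {
    const v = Math.max(variance[i], eps); // clamp for numerical stability
    const diff = x - target[i];
    return 0.5 * (Math.log(v) + (diff * diff) / v) + constant;
  });
  if (reduction === 'none') return perElement;
  const total = perElement.reduce((acc, l) => acc + l, 0);
  return reduction === 'sum' ? total : total / perElement.length;
}

// Unit variance: per-element losses are 0.5 * (x - y)^2 = [0.5, 0].
const refMean = gaussianNllReference([0, 1], [1, 1], [1, 1]) as number;
const refFull = gaussianNllReference([0, 1], [1, 1], [1, 1], { full: true }) as number;
console.log(refMean, refFull - refMean); // 0.25 and 0.5 * log(2π)
```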

Variance must be positive: σ² > 0; use exp/softplus to ensure positivity
Heteroscedastic weighting: Higher variance → lower loss contribution (automatic importance weighting)
Aleatoric uncertainty: Variance represents data noise, sample-specific
Split output: A single network output can be split into a mean part and a variance part
Log-variance stability: Use log(σ²) output then exponentiate (more stable)
Full parameter: full=false is usually sufficient; full=true only adds a constant, so it does not change the gradients
No restriction on target values: Target can be any continuous values; the loss is still defined if the data are not exactly Gaussian
Gradient stability: Variance clamping (eps) prevents log(0) and division issues
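The log-variance note can be made concrete by rewriting the loss in terms of s = log(σ²), which gives 0.5 * [s + (y-μ)² * exp(-s)]. The sketch below uses plain numbers, and `nllFromLogVar` is a hypothetical helper; any real s corresponds to a positive variance, so no log of a near-zero value is taken.

```typescript
// Gaussian NLL (without the constant term) written in terms of s = log(variance).
// Hypothetical helper for illustration; not part of the library API.
function nllFromLogVar(mu: number, y: number, logVar: number): number {
  const diff = y - mu;
  return 0.5 * (logVar + diff * diff * Math.exp(-logVar));
}

// Same value as the variance form when variance = exp(logVar).
const sampleLogVar = Math.log(4);
const sampleVariance = Math.exp(sampleLogVar);
const lossFromLogVar = nllFromLogVar(0, 2, sampleLogVar);
const lossFromVariance = 0.5 * (Math.log(sampleVariance) + (2 * 2) / sampleVariance);
console.log(Math.abs(lossFromLogVar - lossFromVariance) < 1e-12); // true
```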

Positive variance required: Variance ≤ 0 is invalid input; the loss clamps variance to at least eps
Output layer: Must ensure σ² > 0 (use exp, softplus, relu+eps, squared output)
Variance exploitation: With unconstrained variance, the model can lower the loss on hard samples by predicting a huge variance instead of improving the mean
Variance regularization: May need to regularize variance (prevent collapse)
Distribution assumption: Assumes Gaussian; if non-Gaussian, use other losses
Sample efficiency: Fitting variance requires more data than fitting mean alone
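One way to satisfy the output-layer requirement is a softplus followed by a small floor. The sketch below works on plain numbers; `softplus` and `toVariance` are hypothetical helpers, and the floor of 1e-6 is an assumption chosen to match the default eps. The floor matters because softplus underflows to exactly 0 for very negative inputs.

```typescript
// Numerically stable softplus: log(1 + exp(x)) without overflow for large x.
function softplus(x: number): number {
  return Math.max(x, 0) + Math.log(1 + Math.exp(-Math.abs(x)));
}

// Hypothetical helper: map an unconstrained network output to a positive variance.
function toVariance(raw: number, floor: number = 1e-6): number {
  return softplus(raw) + floor;
}

console.log(softplus(0));       // log(2) ≈ 0.693
console.log(softplus(1000));    // 1000 (no overflow)
console.log(toVariance(-1000)); // 1e-6 (softplus underflows to 0, the floor keeps it positive)
```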

Parameters

input (Tensor): Predicted mean μ of the Gaussian distribution. Shape [...] (any dimensions); represents E[target] under the predicted distribution. Example: [batch, output_dim] from the final regression layer.
target (Tensor): Target values (observations) assumed to come from the Gaussian distribution. Shape must match input; continuous values (unbounded). Example: actual continuous targets, regression labels.
var_ (Tensor): Predicted variance σ² of the Gaussian distribution. Shape must match input; values > 0 (variance is positive). The network must ensure positivity (e.g., exp(logvar), softplus, relu+eps). Example: variance predictions from a separate output head.
options ({ full?: boolean; eps?: number; reduction?: 'none' | 'mean' | 'sum'; }, optional): Optional configuration:
- full: Include the constant term in the NLL (default: false)
  - true: exact NLL = 0.5*[log(var) + (pred-target)²/var + log(2π)]
  - false: simplified = 0.5*[log(var) + (pred-target)²/var] (usually sufficient)
- eps: Numerical stability floor for variance (default: 1e-6)
  - Clamps variance to max(variance, eps) to prevent log(0) and division by 0
- reduction: How to aggregate losses (default: 'mean')
  - 'none': per-element losses [...]
  - 'mean': average loss
  - 'sum': sum of losses

Returns

Tensor: Loss tensor (same shape as input/target if reduction='none', scalar otherwise)

Examples

// Heteroscedastic regression: predict both mean and variance
const batch_size = 32;

// Split network output into mean and log-variance
const output = model(input);  // [batch, 100]
const mu = output.slice([0], [50]);        // First 50: predicted mean
const logvar = output.slice([50], [100]);  // Last 50: predicted log-variance
const variance = logvar.exp();             // Exponentiate to ensure > 0 (`var` is a reserved word)

const target = torch.randn([batch_size, 50]);  // Ground truth

const loss = torch.nn.functional.gaussian_nll_loss(
  mu, target, variance,
  { full: false }  // Simplified loss
);
// Network learns to predict both accurate mean and appropriate variance

// Uncertainty quantification: prediction with confidence intervals
const x = torch.randn([1, 10]);  // Single input

// Network with two output heads
const mean_head = torch.nn.Linear(64, 5);
const logvar_head = torch.nn.Linear(64, 5);  // Log-variance for numerical stability

const hidden = model_backbone(x);  // [1, 64]
const mu = mean_head(hidden);              // [1, 5]
const variance = logvar_head(hidden).exp();  // [1, 5], exponentiate

const targets = torch.tensor([[1, 2, 3, 4, 5]]);

const nll = torch.nn.functional.gaussian_nll_loss(mu, targets, variance);

// Prediction confidence: lower variance → more confident
// Can construct 95% CI: mu ± 1.96*sqrt(variance)
const std = variance.sqrt();
const ci_lower = mu.sub(std.mul(1.96));
const ci_upper = mu.add(std.mul(1.96));

// Aleatoric uncertainty in computer vision: image regression
const batch_images = torch.randn([8, 3, 64, 64]);

// Network predicts image depth map with per-pixel uncertainty
const predicted_depth = model(batch_images);     // [8, 1, 64, 64]
const predicted_logvar = logvar_model(batch_images);  // [8, 1, 64, 64]
const predicted_var = predicted_logvar.exp();    // Ensure positivity

const target_depth = torch.randn([8, 1, 64, 64]);  // Ground truth depth

const depth_loss = torch.nn.functional.gaussian_nll_loss(
  predicted_depth,
  target_depth,
  predicted_var
);
// Per-pixel uncertainty: high variance at occlusions/depth discontinuities
// Smooth variance as auxiliary task (prevents overfitting)

// Bayesian deep learning: ensemble-like uncertainty
const num_samples = 10;
const predictions: Tensor[] = [];
const uncertainties: Tensor[] = [];

for (let i = 0; i < num_samples; i++) {
  // Forward pass with dropout enabled (MC-dropout)
  const mu_i = model.forward_mean(x);      // With dropout
  const var_i = model.forward_var(x);
  predictions.push(mu_i);
  uncertainties.push(var_i);
}

// Aleatoric uncertainty: average variance (data noise)
const aleatoric = torch.stack(uncertainties).mean(0);

// Epistemic uncertainty: variance of predictions (model disagreement)
const epistemic = torch.stack(predictions).var(0);

// Total uncertainty = aleatoric + epistemic
const total_unc = aleatoric.add(epistemic);

// Use mean prediction with aleatoric uncertainty in NLL
const mean_pred = torch.stack(predictions).mean(0);
const nll = torch.nn.functional.gaussian_nll_loss(mean_pred, target, aleatoric);

// Robust regression: outliers automatically downweighted
const x = torch.randn([100, 5]);
const targets = torch.randn([100, 1]);

// Network learns to increase variance for outliers
const predictions = model(x);  // [100, 2]: [mean, logvar]
const mu = predictions.slice([null, 0]);
const variance = predictions.slice([null, 1]).exp();  // Positive variance

const loss = torch.nn.functional.gaussian_nll_loss(mu, targets, variance);

// Samples with large errors naturally get higher variance predictions
// Reduces influence of outliers (proportional to 1/variance weighting)
// Comparison: MSE treats all errors equally; NLL downweights outliers

// Exact vs simplified NLL (full parameter)
const mu = torch.randn([32, 10]);
const target = torch.randn([32, 10]);
const variance = torch.ones([32, 10]);  // Constant variance = 1

// Simplified NLL (omits the constant term, usually sufficient)
const simplified = torch.nn.functional.gaussian_nll_loss(mu, target, variance, {
  full: false
});
// = 0.5 * [log(1) + (mu-target)²/1] = 0.5 * (mu-target)²

// Exact NLL (includes constant term)
const exact = torch.nn.functional.gaussian_nll_loss(mu, target, variance, {
  full: true
});
// = 0.5 * [0 + (mu-target)² + log(2π)] = 0.5*((mu-target)² + log(2π))
// Difference = 0.5 * log(2π) ≈ 0.919 (constant for all samples)
