torch.nn.functional.nll_loss
function nll_loss(input: Tensor, target: Tensor): Tensor
function nll_loss(input: Tensor, target: Tensor, weight: Tensor | null, size_average: boolean | null, ignore_index: number, reduce: boolean | null, reduction: 'none' | 'mean' | 'sum', options: NllLossFunctionalOptions): Tensor

Negative Log Likelihood (NLL) loss: the standard loss for classification with pre-computed log-probabilities.
Computes the negative log-likelihood between log-probabilities and target class indices. Essential for classification tasks where you have already computed log-softmax (or other log-probabilities). Unlike cross_entropy, this function expects pre-computed log probabilities (output of log_softmax), making it the lower-level operation. Use nll_loss when:
- You need explicit control over the log-probability computation (log_softmax, logsigmoid, etc.)
- You're combining NLL with other probability models
- You have pre-normalized log-probabilities from a different source
- You want to use the output of log_softmax directly without applying softmax again
Relationship to cross_entropy: cross_entropy(logits, target) = nll_loss(log_softmax(logits), target). For classification from raw logits, use cross_entropy (simpler). For log-probabilities, use nll_loss.
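The identity above can be checked on plain numbers, without any tensor library. The helpers below (`logSoftmax`, `nllLoss`, `crossEntropy`) are illustrative implementations written for this sketch, not part of the library API:

```typescript
// Verify: cross_entropy(logits, target) === nll_loss(log_softmax(logits), target)
function logSoftmax(logits: number[]): number[] {
  const m = Math.max(...logits); // max trick for numerical stability
  const logSumExp = m + Math.log(logits.reduce((s, x) => s + Math.exp(x - m), 0));
  return logits.map((x) => x - logSumExp);
}

function nllLoss(logProbs: number[], target: number): number {
  return -logProbs[target]; // negate the log-probability of the true class
}

function crossEntropy(logits: number[], target: number): number {
  return nllLoss(logSoftmax(logits), target); // the identity from the text
}

const logits = [2.0, 1.0, 0.1];
const target = 0;
console.log(crossEntropy(logits, target));            // ≈ 0.417
console.log(nllLoss(logSoftmax(logits), target));     // same value
```

Both calls produce the same number, which is why cross_entropy is the convenient fused form when starting from raw logits.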
Loss Computation: For each sample in the batch, the loss is -log(p_target), where p_target is the model's predicted probability of the correct class. Since the input already holds log-probabilities, this is simply the negated entry input[i, target_i]. The sign is negative because minimizing the negative log-likelihood maximizes the log-likelihood. Loss approaches 0 as the model's confidence in the correct class approaches 1 (log-probability approaches 0).
\begin{aligned} \text{Loss}_i &= -\text{input}[i, \text{target}_i] = -\log(p_{\text{target}_i}) \quad \text{where } p_{\text{target}_i} = \exp(\text{input}[i, \text{target}_i]) \\ \text{Final loss} &= \begin{cases} \frac{1}{N}\sum_{i=1}^{N} \text{Loss}_i & \text{reduction} = \text{'mean'} \\ \sum_{i=1}^{N} \text{Loss}_i & \text{reduction} = \text{'sum'} \\ [\text{Loss}_1, \dots, \text{Loss}_N] & \text{reduction} = \text{'none'} \end{cases} \\ \frac{\partial \text{Loss}_i}{\partial \text{input}[i,j]} &= \begin{cases} -1 & j = \text{target}_i \\ 0 & \text{otherwise} \end{cases} \end{aligned}
- Pre-computed log-probabilities: Input must be log-probabilities (e.g., from log_softmax), not raw probabilities or logits. Passing logits will give incorrect results.
- Target indices: Target values should be in range [0, num_classes-1]. Out-of-range indices cause undefined behavior or errors.
- Gradient flow: Gradients flow to the correct class positions in input. Other positions get zero gradient, enabling efficient backpropagation even with large output vocabularies.
- Numerical stability: log_softmax is numerically stable (uses max trick), so NLL computed from its output avoids underflow/overflow that would occur with raw probabilities.
- Batching efficiency: Vectorized computation across entire batch. GPU implementation uses compute shaders for all batch items in parallel.
- Reduction='mean' averages over batch: Ensures loss is independent of batch size, critical for learning rate scheduling and hyperparameter tuning.
- Reduction='none' for analysis: Use 'none' to inspect per-sample losses, identify mislabeled data, or implement curriculum learning and hard negative mining.
- Shape requirements: Input must be 2D [batch_size, num_classes], target must be 1D [batch_size]. Other shapes are not supported and will throw errors.
- Batch size mismatch: Input and target must have same batch dimension. Mismatch throws error.
- Invalid target indices: Target indices outside [0, num_classes-1] lead to undefined behavior. Always validate targets are valid class indices.
- Not compatible with raw logits: Directly passing logits from model output will give wrong loss. Must apply log_softmax first.
- Numerical issues with extreme confidences: If log-probs are extremely negative (model very confident wrong), loss can be very large. May indicate training issues like learning rate too high.
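The shape and range checks in the pitfalls above can be collected into a small guard run before calling nll_loss. `validateNllInputs` is a hypothetical helper sketched here on plain nested arrays, not part of the library API:

```typescript
// Guard against the common nll_loss pitfalls: shape mismatch,
// out-of-range targets, and accidentally passing raw logits.
function validateNllInputs(
  logProbs: number[][], // [batch_size, num_classes]
  targets: number[],    // [batch_size]
): void {
  const batchSize = logProbs.length;
  const numClasses = logProbs[0].length;
  if (targets.length !== batchSize) {
    throw new Error(`batch mismatch: input ${batchSize} vs target ${targets.length}`);
  }
  for (const t of targets) {
    if (!Number.isInteger(t) || t < 0 || t >= numClasses) {
      throw new Error(`target ${t} outside [0, ${numClasses - 1}]`);
    }
  }
  // Log-probabilities are always <= 0; positive entries suggest raw logits.
  if (logProbs.some((row) => row.some((v) => v > 1e-6))) {
    throw new Error("input has positive values; did you forget log_softmax?");
  }
}
```

A check like this is cheap relative to a forward pass and turns silent wrong-loss bugs into immediate errors.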
Parameters
input: Tensor - Pre-computed log-probabilities from log_softmax. Shape [batch_size, num_classes]. Expected range: typically (-∞, 0], with 0 corresponding to log(1) and very negative values to log of probabilities near 0.
target: Tensor - Target class indices (0 to num_classes-1). Shape [batch_size]. Integer-like values.
Returns
Tensor - Loss tensor. Shape [] if reduction='mean' or 'sum'; shape [batch_size] if reduction='none'.
Examples
// Basic NLL loss with log_softmax output
const logits = torch.randn(4, 10); // [batch=4, num_classes=10]
const targets = torch.tensor([2, 5, 1, 9]); // True class indices
const log_probs = torch.nn.functional.log_softmax(logits, { dim: -1 });
const loss = torch.nn.functional.nll_loss(log_probs, targets); // scalar
// Per-sample losses for debugging
const per_sample = torch.nn.functional.nll_loss(log_probs, targets, { reduction: 'none' }); // [4]
// Inspect which samples have highest loss (hardest to classify)
// Language model training: next token prediction
const seq_length = 128;
const vocab_size = 50000;
const logits = model(tokens); // [seq_length, vocab_size]
const next_token_targets = tokens_shifted; // True next tokens [seq_length]
const log_probs = torch.nn.functional.log_softmax(logits, { dim: -1 });
const token_losses = torch.nn.functional.nll_loss(log_probs, next_token_targets, { reduction: 'none' }); // [seq_length]
const avg_loss = token_losses.mean(); // Scalar loss for backprop
// Multi-class classification with class weighting (manual implementation)
const log_probs = torch.nn.functional.log_softmax(logits, { dim: -1 });
const unweighted_loss = torch.nn.functional.nll_loss(log_probs, targets, { reduction: 'none' }); // [batch]
const class_weights = torch.tensor([0.1, 0.2, 0.15, ...]); // Custom weights per class
const sample_weights = class_weights.gather(0, targets); // Get weight for each sample
const weighted_loss = (unweighted_loss * sample_weights).mean();
// Importance sampling / curriculum learning
// Compute loss for all samples, then select hard negatives
const log_probs = torch.nn.functional.log_softmax(logits, { dim: -1 });
const losses = torch.nn.functional.nll_loss(log_probs, targets, { reduction: 'none' }); // [batch]
const k_hardest = 32;
const hard_loss_indices = losses.topk(k_hardest)[1]; // Hardest samples
const hard_samples_loss = losses.gather(0, hard_loss_indices).mean();
See Also
- PyTorch torch.nn.functional.nll_loss
- cross_entropy - Higher-level function that combines log_softmax + nll_loss automatically
- log_softmax - Used to compute input log-probabilities for nll_loss
- softmax - Complementary operation (softmax vs log_softmax)
- torch.nn.NLLLoss - Stateful class-based version with optional weight parameter
- binary_cross_entropy_with_logits - Loss for binary classification from logits