torch.nn.functional.nll_loss
function nll_loss(input: Tensor, target: Tensor): Tensor
function nll_loss(input: Tensor, target: Tensor, weight: Tensor | null, size_average: boolean | null, ignore_index: number, reduce: boolean | null, reduction: 'none' | 'mean' | 'sum', options: NllLossFunctionalOptions): Tensor

Negative Log Likelihood (NLL) loss: the standard loss for classification with pre-computed log-probabilities.
Computes the negative log-likelihood between log-probabilities and target class indices. Essential for classification tasks where you have already computed log-softmax (or other log-probabilities). Unlike cross_entropy, this function expects pre-computed log probabilities (output of log_softmax), making it the lower-level operation. Use nll_loss when:
- You need explicit control over the log-probability computation (log_softmax, logsigmoid, etc.)
- You're combining NLL with other probability models
- You have pre-normalized log-probabilities from a different source
- You want to use the output of log_softmax directly without applying softmax again
Relationship to cross_entropy: cross_entropy(logits, target) = nll_loss(log_softmax(logits), target). For classification from raw logits, use cross_entropy (simpler). For log-probabilities, use nll_loss.
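The identity above can be checked on plain numbers, without any tensor library. The helpers below (`logSoftmax`, `nllLoss`, `crossEntropy`) are illustrative implementations written for this sketch, not part of the library API:

```typescript
// Verify: cross_entropy(logits, target) === nll_loss(log_softmax(logits), target)
function logSoftmax(logits: number[]): number[] {
  const m = Math.max(...logits); // max trick for numerical stability
  const logSumExp = m + Math.log(logits.reduce((s, x) => s + Math.exp(x - m), 0));
  return logits.map((x) => x - logSumExp);
}

function nllLoss(logProbs: number[], target: number): number {
  return -logProbs[target]; // negate the log-probability of the true class
}

function crossEntropy(logits: number[], target: number): number {
  return nllLoss(logSoftmax(logits), target); // the identity from the text
}

const logits = [2.0, 1.0, 0.1];
const target = 0;
console.log(crossEntropy(logits, target));            // ≈ 0.417
console.log(nllLoss(logSoftmax(logits), target));     // same value
```

Both calls produce the same number, which is why cross_entropy is the convenient fused form when starting from raw logits.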
Loss Computation: For each sample in the batch, the loss is -log(p_target), where p_target is the model's predicted probability of the correct class. Since the input already holds log-probabilities, this is simply the negated entry input[i, target_i]. The sign is negative because minimizing the negative log-likelihood maximizes the log-likelihood. Loss approaches 0 as the model's confidence in the correct class approaches 1 (log-probability approaches 0).
\begin{aligned} \text{Loss}_i &= -\text{input}[i, \text{target}_i] = -\log(p_{\text{target}_i}) \quad \text{where } p_{\text{target}_i} = \exp(\text{input}[i, \text{target}_i]) \\ \text{Final loss} &= \begin{cases} \frac{1}{N}\sum_{i=1}^{N} \text{Loss}_i & \text{reduction} = \text{'mean'} \\ \sum_{i=1}^{N} \text{Loss}_i & \text{reduction} = \text{'sum'} \\ [\text{Loss}_1, \dots, \text{Loss}_N] & \text{reduction} = \text{'none'} \end{cases} \\ \frac{\partial \text{Loss}_i}{\partial \text{input}[i,j]} &= \begin{cases} -1 & j = \text{target}_i \\ 0 & \text{otherwise} \end{cases} \end{aligned}
- Pre-computed log-probabilities: Input must be log-probabilities (e.g., from log_softmax), not raw probabilities or logits. Passing logits will give incorrect results.
- Target indices: Target values should be in range [0, num_classes-1]. Out-of-range indices cause undefined behavior or errors.
- Gradient flow: Gradients flow to the correct class positions in input. Other positions get zero gradient, enabling efficient backpropagation even with large output vocabularies.
- Numerical stability: log_softmax is numerically stable (uses max trick), so NLL computed from its output avoids underflow/overflow that would occur with raw probabilities.
- Batching efficiency: Vectorized computation across entire batch. GPU implementation uses compute shaders for all batch items in parallel.
- Reduction='mean' averages over batch: Ensures loss is independent of batch size, critical for learning rate scheduling and hyperparameter tuning.
- Reduction='none' for analysis: Use 'none' to inspect per-sample losses, identify mislabeled data, or implement curriculum learning and hard negative mining.
- Shape requirements: Input must be 2D [batch_size, num_classes], target must be 1D [batch_size]. Other shapes are not supported and will throw errors.
- Batch size mismatch: Input and target must have same batch dimension. Mismatch throws error.
- Invalid target indices: Target indices outside [0, num_classes-1] lead to undefined behavior. Always validate targets are valid class indices.
- Not compatible with raw logits: Directly passing logits from model output will give wrong loss. Must apply log_softmax first.
- Numerical issues with extreme confidences: If log-probs are extremely negative (model very confident wrong), loss can be very large. May indicate training issues like learning rate too high.
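The shape and range checks in the pitfalls above can be collected into a small guard run before calling nll_loss. `validateNllInputs` is a hypothetical helper sketched here on plain nested arrays, not part of the library API:

```typescript
// Guard against the common nll_loss pitfalls: shape mismatch,
// out-of-range targets, and accidentally passing raw logits.
function validateNllInputs(
  logProbs: number[][], // [batch_size, num_classes]
  targets: number[],    // [batch_size]
): void {
  const batchSize = logProbs.length;
  const numClasses = logProbs[0].length;
  if (targets.length !== batchSize) {
    throw new Error(`batch mismatch: input ${batchSize} vs target ${targets.length}`);
  }
  for (const t of targets) {
    if (!Number.isInteger(t) || t < 0 || t >= numClasses) {
      throw new Error(`target ${t} outside [0, ${numClasses - 1}]`);
    }
  }
  // Log-probabilities are always <= 0; positive entries suggest raw logits.
  if (logProbs.some((row) => row.some((v) => v > 1e-6))) {
    throw new Error("input has positive values; did you forget log_softmax?");
  }
}
```

A check like this is cheap relative to a forward pass and turns silent wrong-loss bugs into immediate errors.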
Parameters
input: Tensor - Pre-computed log-probabilities from log_softmax. Shape [batch_size, num_classes]. Expected range: typically (-∞, 0], with 0 corresponding to log(1) and very negative values to log of probabilities near 0.
target: Tensor - Target class indices (0 to num_classes-1). Shape [batch_size]. Integer-like values.
Returns
Tensor - Loss tensor. Shape [] if reduction='mean' or 'sum'; shape [batch_size] if reduction='none'.
Examples
// Basic NLL loss with log_softmax output
const logits = torch.randn(4, 10); // [batch=4, num_classes=10]
const targets = torch.tensor([2, 5, 1, 9]); // True class indices
const log_probs = torch.nn.functional.log_softmax(logits, { dim: -1 });
const loss = torch.nn.functional.nll_loss(log_probs, targets); // scalar
// Per-sample losses for debugging
const per_sample = torch.nn.functional.nll_loss(log_probs, targets, { reduction: 'none' }); // [4]
// Inspect which samples have highest loss (hardest to classify)
// Language model training: next token prediction
const seq_length = 128;
const vocab_size = 50000;
const logits = model(tokens); // [seq_length, vocab_size]
const next_token_targets = tokens_shifted; // True next tokens [seq_length]
const log_probs = torch.nn.functional.log_softmax(logits, { dim: -1 });
const token_losses = torch.nn.functional.nll_loss(log_probs, next_token_targets, { reduction: 'none' }); // [seq_length]
const avg_loss = token_losses.mean(); // Scalar loss for backprop
// Multi-class classification with class weighting (manual implementation)
const log_probs = torch.nn.functional.log_softmax(logits, { dim: -1 });
const unweighted_loss = torch.nn.functional.nll_loss(log_probs, targets, { reduction: 'none' }); // [batch]
const class_weights = torch.tensor([0.1, 0.2, 0.15, ...]); // Custom weights per class
const sample_weights = class_weights.gather(0, targets); // Get weight for each sample
const weighted_loss = (unweighted_loss * sample_weights).mean();
// Importance sampling / curriculum learning
// Compute loss for all samples, then select hard negatives
const log_probs = torch.nn.functional.log_softmax(logits, { dim: -1 });
const losses = torch.nn.functional.nll_loss(log_probs, targets, { reduction: 'none' }); // [batch]
const k_hardest = 32;
const hard_loss_indices = losses.topk(k_hardest)[1]; // Hardest samples
const hard_samples_loss = losses.gather(0, hard_loss_indices).mean();
See Also
- PyTorch torch.nn.functional.nll_loss
- cross_entropy - Higher-level function that combines log_softmax + nll_loss automatically
- log_softmax - Used to compute input log-probabilities for nll_loss
- softmax - Complementary operation (softmax vs log_softmax)
- torch.nn.NLLLoss - Stateful class-based version with optional weight parameter
- binary_cross_entropy_with_logits - Loss for binary classification from logits