torch.nn.LogSoftmaxOptions
LogSoftmax activation function.
LogSoftmax computes log(softmax(x)) in a single, numerically stable operation, converting logits into log-probability distributions. It is typically used as an intermediate step before loss functions like NLLLoss (Negative Log Likelihood Loss), primarily for training efficiency and numerical stability. While standard PyTorch practice is to pass raw logits to CrossEntropyLoss (which applies log-softmax internally), LogSoftmax is useful for custom loss functions or whenever log-probabilities are explicitly needed.
Core idea:
LogSoftmax(x_i) = log(exp(x_i) / Σ_j exp(x_j)) = x_i - log(Σ_j exp(x_j))
Computing log(softmax(x)) directly in this form avoids the numerical instability of computing softmax first and then taking the log (which can produce log(0) = -inf for very small probabilities).
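As a quick illustration of why the direct form matters, here is a minimal numeric sketch in plain TypeScript (no tensor API is assumed; naiveLogSoftmax and stableLogSoftmax are illustrative names):
// Naive route: compute softmax first, then take the log.
// exp(1000) overflows to Infinity, so the result degenerates to NaN / -Infinity.
function naiveLogSoftmax(x: number[]): number[] {
  const exps = x.map(Math.exp);
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => Math.log(e / sum));
}

// Direct route: x_i - log(Σ_j exp(x_j)), with the log-sum-exp trick for the sum.
function stableLogSoftmax(x: number[]): number[] {
  const m = Math.max(...x);
  const logSumExp = m + Math.log(x.reduce((acc, xi) => acc + Math.exp(xi - m), 0));
  return x.map((xi) => xi - logSumExp);
}

const extremeLogits = [1000, 0, -1000];
console.log(naiveLogSoftmax(extremeLogits));  // [NaN, -Infinity, -Infinity]
console.log(stableLogSoftmax(extremeLogits)); // [0, -1000, -2000]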
When to use LogSoftmax:
- With NLLLoss: Standard pair is LogSoftmax output + NLLLoss (though CrossEntropyLoss + raw logits is simpler)
- Custom loss functions: When you explicitly need log-probabilities for your loss function (see the sketch after this list)
- Rarely for output: Most modern code uses CrossEntropyLoss which combines both internally
- Theoretical explicitness: When you want to be explicit about computing probabilities
- Legacy code: Sometimes found in older PyTorch codebases
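For the custom-loss case above, one possible shape of a loss that consumes log-probabilities directly, sketched in plain TypeScript on arrays (smoothedNLL and its smoothing scheme are illustrative, not part of the library):
function smoothedNLL(logProbs: number[], target: number, smoothing = 0.1): number {
  // Label-smoothed cross-entropy: loss = -Σ_c q_c * logProbs[c], where q puts
  // (1 - smoothing) on the target class and spreads smoothing uniformly over all classes.
  const numClasses = logProbs.length;
  const uniform = smoothing / numClasses;
  let loss = 0;
  for (let c = 0; c < numClasses; c++) {
    const q = (c === target ? 1 - smoothing : 0) + uniform;
    loss -= q * logProbs[c];
  }
  return loss;
}
// With smoothing = 0 this reduces to the plain NLL of the target class: -logProbs[target].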
Trade-offs vs Softmax:
- Numerical stability: log(softmax(x)) computed directly is more stable than softmax then log
- Probability space: Output is in (-∞, 0], representing log-probabilities (vs (0, 1) for softmax)
- Computational efficiency: the fused log(softmax(x)) avoids materializing the full probability vector and the separate exp-then-log round trip
- Loss function pair: LogSoftmax output pairs with NLLLoss; raw logits pair with CrossEntropyLoss (the two give equivalent results)
- Modern practice: CrossEntropyLoss is simpler and more commonly used (does both internally)
Algorithm:
- Forward: LogSoftmax(x)_i = x_i - log(Σ_j exp(x_j)). Implemented with the log-sum-exp trick: log(Σ_j exp(x_j)) = max(x) + log(Σ_j exp(x_j - max(x))), which avoids overflow from exp(x) while maintaining numerical precision.
- Backward: ∂LogSoftmax_i/∂x_j = δ_ij - Softmax_j (a simpler Jacobian than softmax's)
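The backward rule applies as a vector formula: for an upstream gradient g on the LogSoftmax output, ∂L/∂x_j = Σ_i g_i (δ_ij - Softmax_j) = g_j - Softmax_j · Σ_i g_i. A minimal plain-TypeScript sketch of this rule (logSoftmaxBackward is an illustrative name, not a library function):
function logSoftmaxBackward(x: number[], upstreamGrad: number[]): number[] {
  // softmax(x), computed with the same max-shift used in the stable forward pass
  const m = Math.max(...x);
  const exps = x.map((xi) => Math.exp(xi - m));
  const sum = exps.reduce((a, b) => a + b, 0);
  const softmax = exps.map((e) => e / sum);
  // grad_j = g_j - softmax_j * Σ_i g_i
  const gradSum = upstreamGrad.reduce((a, b) => a + b, 0);
  return upstreamGrad.map((gj, j) => gj - softmax[j] * gradSum);
}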
Definition
export interface LogSoftmaxOptions {
  /** Dimension along which log_softmax is computed (default: -1) */
  dim?: number;
}

dim (number, optional) – Dimension along which log_softmax is computed (default: -1)
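A short sketch of how dim changes the normalization axis, following the constructor style used in the examples below (the dimension is passed directly, as in new torch.nn.LogSoftmax(-1)):
const scores = torch.randn([4, 3]);              // [batch=4, classes=3]
const overClasses = new torch.nn.LogSoftmax(-1); // default: normalize over the last (class) dim
const perExample = overClasses.forward(scores);  // each row is a log-probability distribution

const overBatch = new torch.nn.LogSoftmax(0);    // normalizes down each column instead
const perColumn = overBatch.forward(scores);     // rarely what you want for classification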
Examples
// Classic: LogSoftmax + NLLLoss (equivalent to CrossEntropyLoss + raw logits)
class ClassifierWithLogSoftmax extends torch.nn.Module {
  private fc: torch.nn.Linear;
  private log_softmax: torch.nn.LogSoftmax;

  constructor() {
    super();
    this.fc = new torch.nn.Linear(10, 5); // 5 classes
    this.log_softmax = new torch.nn.LogSoftmax(-1);
  }

  forward(x: torch.Tensor): torch.Tensor {
    x = this.fc.forward(x);
    return this.log_softmax.forward(x); // Log-probabilities
  }
}
// Then use with NLLLoss:
// const loss_fn = new torch.nn.NLLLoss();
// const loss = loss_fn.forward(log_probs, targets);

// Equivalence: Two ways to the same place
const logits = torch.randn([32, 10]);
const targets = torch.randint(0, 10, [32]);
// Method 1: Modern - CrossEntropyLoss on raw logits
const ce_loss = new torch.nn.CrossEntropyLoss();
const loss1 = ce_loss.forward(logits, targets);
// Method 2: Classic - LogSoftmax + NLLLoss
const log_softmax = new torch.nn.LogSoftmax(-1);
const log_probs = log_softmax.forward(logits);
const nll_loss = new torch.nn.NLLLoss();
const loss2 = nll_loss.forward(log_probs, targets);
// loss1 ≈ loss2 (numerically equivalent)

// Getting actual log-probabilities for analysis
const logits = torch.randn([5, 3]); // [batch_size=5, num_classes=3]
const log_softmax = new torch.nn.LogSoftmax(-1);
const log_probs = log_softmax.forward(logits);
// log_probs are in (-∞, 0]; exp(log_probs) sums to 1 along dim (the log-probs themselves do not sum to 1)
// log(prob) → -inf as prob → 0; log(prob) = 0 means prob = 1
// log(0.1) ≈ -2.3, log(0.9) ≈ -0.1
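To make the comments above concrete, a small plain-TypeScript check (the numbers correspond to LogSoftmax applied to logits [2, 1, 0]; no tensor methods are assumed):
const exampleLogProbs = [-0.4076, -1.4076, -2.4076]; // each logit minus logsumexp([2, 1, 0]) ≈ 2.4076
const recovered = exampleLogProbs.map(Math.exp);     // ≈ [0.665, 0.245, 0.090], the softmax probabilities
const total = recovered.reduce((a, b) => a + b, 0);  // ≈ 1.0: exp(log_probs) sums to 1, not the log_probs themselves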