torch.nn.LogSigmoid
class LogSigmoid extends Module
LogSigmoid activation function.
LogSigmoid computes log(sigmoid(x)) = log(1 / (1 + exp(-x))) efficiently and with numerical stability. It outputs log-probabilities (in range (-∞, 0]) representing the log-likelihood of the positive class. LogSigmoid is rarely used as an explicit activation (most code uses Sigmoid + BCELoss), but appears in loss functions like BCEWithLogitsLoss for numerical stability. It's useful when you explicitly need log-probability space computations.
Core idea: LogSigmoid(x) = log(sigmoid(x)) = log(1 / (1 + exp(-x))) = -log(1 + exp(-x)) = x - log(1 + exp(x)). Computing this directly as log(sigmoid(x)) is numerically unstable: for very negative x, sigmoid(x) underflows to zero and the log becomes -∞. Implementations therefore evaluate the reformulation with the log-sum-exp (softplus) trick, which stays finite over the whole input range.
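To sanity-check the identity, here is a small scalar sketch in plain TypeScript (it deliberately avoids the tensor API, so nothing beyond Math is assumed): the naive log(sigmoid(x)) and the reformulated x - log(1 + exp(x)) agree for moderate x, but the naive form collapses to -Infinity once sigmoid(x) underflows.
// Naive form: log(sigmoid(x)) — breaks once sigmoid(x) underflows to 0
const naiveLogSigmoid = (x: number): number => Math.log(1 / (1 + Math.exp(-x)));
// Reformulated form: x - log(1 + exp(x)) — stable for negative x
const reformulated = (x: number): number => x - Math.log(1 + Math.exp(x));

console.log(naiveLogSigmoid(2), reformulated(2));       // both ≈ -0.1269
console.log(naiveLogSigmoid(-800), reformulated(-800)); // -Infinity vs -800
// (With float32 tensors the naive form already fails around x ≈ -90.)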
When to use LogSigmoid:
- Explicitly need log-probabilities (rare; usually use loss function instead)
- Custom loss functions requiring log(sigmoid(x)) computation (see the sketch after this list)
- Theoretical work or research requiring log-probability outputs
- Never as a standard hidden-layer activation (Sigmoid is more common, and the loss function handles the log)
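For the custom-loss case noted above, here is a minimal sketch of a numerically stable binary cross-entropy expressed through log(sigmoid(x)), written in plain scalar TypeScript so no particular tensor API is assumed; the names logSigmoid and bceWithLogits are illustrative only.
// Stable BCE-with-logits built from logsigmoid:
// loss = -(y * logsigmoid(x) + (1 - y) * logsigmoid(-x)), averaged over the batch
const logSigmoid = (x: number): number =>
  Math.min(x, 0) - Math.log(1 + Math.exp(-Math.abs(x))); // stable piecewise form (see Algorithm below)

function bceWithLogits(logits: number[], targets: number[]): number {
  let total = 0;
  for (let i = 0; i < logits.length; i++) {
    const y = targets[i]; // 0/1 label
    total += -(y * logSigmoid(logits[i]) + (1 - y) * logSigmoid(-logits[i]));
  }
  return total / logits.length;
}

console.log(bceWithLogits([2.0, -1.5], [1, 0])); // ≈ (0.1269 + 0.2014) / 2 ≈ 0.164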
Trade-offs vs Sigmoid:
- Probability space: Sigmoid outputs (0, 1) vs LogSigmoid outputs (-∞, 0] (log-space)
- Numerical stability: LogSigmoid more stable for log(sigmoid(x)) computation
- Use case: Sigmoid for classification outputs; LogSigmoid for loss computation only
- Interpretation: Sigmoid is intuitive (probability); LogSigmoid requires understanding log-space
- Practical: Most code uses Sigmoid + BCELoss (cleaner); BCEWithLogitsLoss uses LogSigmoid internally
Algorithm:
Forward: LogSigmoid(x) = -log(1 + exp(-x)), evaluated with the log-sum-exp trick. The numerically stable form is LogSigmoid(x) = min(x, 0) - log(1 + exp(-|x|)), which never exponentiates a large positive number.
Backward: ∂LogSigmoid/∂x = 1 - sigmoid(x) = sigmoid(-x). The gradient lies in (0, 1); it approaches 1 as x → -∞ and 0 as x → +∞.
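The stable forward form and its gradient are small enough to write out directly. The sketch below is plain TypeScript on scalars (not the library's actual kernel) and simply mirrors the formulas above.
// Stable forward: min(x, 0) - log(1 + exp(-|x|))
const logSigmoidFwd = (x: number): number =>
  Math.min(x, 0) - Math.log(1 + Math.exp(-Math.abs(x)));

// Gradient: d/dx log(sigmoid(x)) = 1 - sigmoid(x) = sigmoid(-x), always in (0, 1)
const logSigmoidGrad = (x: number): number => 1 / (1 + Math.exp(x));

console.log(logSigmoidFwd(0));      // ≈ -0.6931 (log 0.5)
console.log(logSigmoidFwd(-800));   // -800 (naive log(sigmoid(x)) would give -Infinity)
console.log(logSigmoidGrad(-5));    // ≈ 0.9933 (near 1 for very negative x)
console.log(logSigmoidGrad(5));     // ≈ 0.0067 (near 0 for very positive x)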
- Rarely used directly: Usually handled by loss functions (BCEWithLogitsLoss).
- Log-probability output: Outputs in (-∞, 0], representing log likelihood.
- Numerical stability: More stable than log(sigmoid(x)) computed naively.
- Standard pattern: Use Sigmoid output + BCELoss (cleaner) or raw logits + BCEWithLogitsLoss (more efficient and numerically stable).
- Gradient is positive and bounded: ∂/∂x = sigmoid(-x) lies in (0, 1); it stays near 1 for very negative inputs, so LogSigmoid does not saturate on that side the way Sigmoid's gradient does.
- Theoretical tool: Useful in probabilistic models and research, not in standard pipelines.
Examples
// Getting log-probabilities explicitly (unusual)
const logits = torch.randn([32, 1]);
const log_sigmoid = new torch.nn.LogSigmoid();
const log_probs = log_sigmoid.forward(logits); // In log-space
// To get actual probabilities from log-space:
const probs = log_probs.exp(); // exp(log(p)) = p

// Binary classification with explicit log-likelihood
class BinaryClassifierWithLogProbs extends torch.nn.Module {
  private fc1: torch.nn.Linear;
  private log_sigmoid: torch.nn.LogSigmoid;
  private fc2: torch.nn.Linear;

  constructor() {
    super();
    this.fc1 = new torch.nn.Linear(10, 64);
    this.log_sigmoid = new torch.nn.LogSigmoid();
    this.fc2 = new torch.nn.Linear(64, 1);
  }

  forward(x: torch.Tensor): torch.Tensor {
    x = this.fc1.forward(x);
    x = torch.nn.functional.relu(x);
    x = this.fc2.forward(x);
    return this.log_sigmoid.forward(x); // Log-probability output
  }
}
// Unusual pattern; normally use Sigmoid + BCELoss instead
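For contrast, this is what the standard pattern looks like: the model head returns raw logits and the log/sigmoid is left to the loss function. Only layers already used on this page are assumed; the loss pairing is left as a comment because the exact BCEWithLogitsLoss signature isn't documented here.
// Standard pattern: no final activation in the model
class BinaryClassifier extends torch.nn.Module {
  private fc1: torch.nn.Linear;
  private fc2: torch.nn.Linear;

  constructor() {
    super();
    this.fc1 = new torch.nn.Linear(10, 64);
    this.fc2 = new torch.nn.Linear(64, 1);
  }

  forward(x: torch.Tensor): torch.Tensor {
    x = this.fc1.forward(x);
    x = torch.nn.functional.relu(x);
    return this.fc2.forward(x); // raw logits — no Sigmoid/LogSigmoid here
  }
}
// Training would pair this with BCEWithLogitsLoss (or Sigmoid + BCELoss),
// which applies the numerically stable log-sigmoid internally.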