torch.nn.LogSigmoid
class LogSigmoid extends Module
LogSigmoid activation function.
LogSigmoid computes log(sigmoid(x)) = log(1 / (1 + exp(-x))) efficiently and with numerical stability. It outputs log-probabilities (in range (-∞, 0]) representing the log-likelihood of the positive class. LogSigmoid is rarely used as an explicit activation (most code uses Sigmoid + BCELoss), but appears in loss functions like BCEWithLogitsLoss for numerical stability. It's useful when you explicitly need log-probability space computations.
Core idea: LogSigmoid(x) = log(sigmoid(x)) = log(1 / (1 + exp(-x))) = -log(1 + exp(-x)) = x - log(1 + exp(x)). Computing this directly as log(sigmoid(x)) is numerically unstable: for very negative x, sigmoid(x) underflows to zero and the log becomes -∞. Implementations therefore evaluate the reformulation with the log-sum-exp (softplus) trick, which stays finite over the whole input range.
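To sanity-check the identity, here is a small scalar sketch in plain TypeScript (it deliberately avoids the tensor API, so nothing beyond Math is assumed): the naive log(sigmoid(x)) and the reformulated x - log(1 + exp(x)) agree for moderate x, but the naive form collapses to -Infinity once sigmoid(x) underflows.
// Naive form: log(sigmoid(x)) — breaks once sigmoid(x) underflows to 0
const naiveLogSigmoid = (x: number): number => Math.log(1 / (1 + Math.exp(-x)));
// Reformulated form: x - log(1 + exp(x)) — stable for negative x
const reformulated = (x: number): number => x - Math.log(1 + Math.exp(x));

console.log(naiveLogSigmoid(2), reformulated(2));       // both ≈ -0.1269
console.log(naiveLogSigmoid(-800), reformulated(-800)); // -Infinity vs -800
// (With float32 tensors the naive form already fails around x ≈ -90.)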
When to use LogSigmoid:
- Explicitly need log-probabilities (rare; usually use loss function instead)
- Custom loss functions requiring log(sigmoid(x)) computation (see the sketch after this list)
- Theoretical work or research requiring log-probability outputs
- Never as a standard hidden-layer activation (Sigmoid is more common, and the loss function handles the log)
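For the custom-loss case noted above, here is a minimal sketch of a numerically stable binary cross-entropy expressed through log(sigmoid(x)), written in plain scalar TypeScript so no particular tensor API is assumed; the names logSigmoid and bceWithLogits are illustrative only.
// Stable BCE-with-logits built from logsigmoid:
// loss = -(y * logsigmoid(x) + (1 - y) * logsigmoid(-x)), averaged over the batch
const logSigmoid = (x: number): number =>
  Math.min(x, 0) - Math.log(1 + Math.exp(-Math.abs(x))); // stable piecewise form (see Algorithm below)

function bceWithLogits(logits: number[], targets: number[]): number {
  let total = 0;
  for (let i = 0; i < logits.length; i++) {
    const y = targets[i]; // 0/1 label
    total += -(y * logSigmoid(logits[i]) + (1 - y) * logSigmoid(-logits[i]));
  }
  return total / logits.length;
}

console.log(bceWithLogits([2.0, -1.5], [1, 0])); // ≈ (0.1269 + 0.2014) / 2 ≈ 0.164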
Trade-offs vs Sigmoid:
- Probability space: Sigmoid outputs (0, 1) vs LogSigmoid outputs (-∞, 0] (log-space)
- Numerical stability: LogSigmoid more stable for log(sigmoid(x)) computation
- Use case: Sigmoid for classification outputs; LogSigmoid for loss computation only
- Interpretation: Sigmoid is intuitive (probability); LogSigmoid requires understanding log-space
- Practical: Most code uses Sigmoid + BCELoss (cleaner); BCEWithLogitsLoss uses LogSigmoid internally
Algorithm:
Forward: LogSigmoid(x) = -log(1 + exp(-x)), evaluated with the log-sum-exp trick. The numerically stable form is LogSigmoid(x) = min(x, 0) - log(1 + exp(-|x|)), which never exponentiates a large positive number.
Backward: ∂LogSigmoid/∂x = 1 - sigmoid(x) = sigmoid(-x). The gradient lies in (0, 1); it approaches 1 as x → -∞ and 0 as x → +∞.
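The stable forward form and its gradient are small enough to write out directly. The sketch below is plain TypeScript on scalars (not the library's actual kernel) and simply mirrors the formulas above.
// Stable forward: min(x, 0) - log(1 + exp(-|x|))
const logSigmoidFwd = (x: number): number =>
  Math.min(x, 0) - Math.log(1 + Math.exp(-Math.abs(x)));

// Gradient: d/dx log(sigmoid(x)) = 1 - sigmoid(x) = sigmoid(-x), always in (0, 1)
const logSigmoidGrad = (x: number): number => 1 / (1 + Math.exp(x));

console.log(logSigmoidFwd(0));      // ≈ -0.6931 (log 0.5)
console.log(logSigmoidFwd(-800));   // -800 (naive log(sigmoid(x)) would give -Infinity)
console.log(logSigmoidGrad(-5));    // ≈ 0.9933 (near 1 for very negative x)
console.log(logSigmoidGrad(5));     // ≈ 0.0067 (near 0 for very positive x)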
- Rarely used directly: Usually handled by loss functions (BCEWithLogitsLoss).
- Log-probability output: Outputs in (-∞, 0], representing log likelihood.
- Numerical stability: More stable than log(sigmoid(x)) computed naively.
- Standard pattern: Use Sigmoid output + BCELoss (cleaner) or raw logits + BCEWithLogitsLoss (more efficient and numerically stable).
- Gradient is positive and bounded: ∂/∂x = sigmoid(-x) lies in (0, 1); it stays near 1 for very negative inputs, so LogSigmoid does not saturate on that side the way Sigmoid's gradient does.
- Theoretical tool: Useful in probabilistic models and research, not in standard pipelines.
Examples
// Getting log-probabilities explicitly (unusual)
const logits = torch.randn([32, 1]);
const log_sigmoid = new torch.nn.LogSigmoid();
const log_probs = log_sigmoid.forward(logits); // In log-space
// To get actual probabilities from log-space:
const probs = log_probs.exp(); // exp(log(p)) = p

// Binary classification with explicit log-likelihood
class BinaryClassifierWithLogProbs extends torch.nn.Module {
  private fc1: torch.nn.Linear;
  private log_sigmoid: torch.nn.LogSigmoid;
  private fc2: torch.nn.Linear;

  constructor() {
    super();
    this.fc1 = new torch.nn.Linear(10, 64);
    this.log_sigmoid = new torch.nn.LogSigmoid();
    this.fc2 = new torch.nn.Linear(64, 1);
  }

  forward(x: torch.Tensor): torch.Tensor {
    x = this.fc1.forward(x);
    x = torch.nn.functional.relu(x);
    x = this.fc2.forward(x);
    return this.log_sigmoid.forward(x); // Log-probability output
  }
}
// Unusual pattern; normally use Sigmoid + BCELoss instead
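For contrast, this is what the standard pattern looks like: the model head returns raw logits and the log/sigmoid is left to the loss function. Only layers already used on this page are assumed; the loss pairing is left as a comment because the exact BCEWithLogitsLoss signature isn't documented here.
// Standard pattern: no final activation in the model
class BinaryClassifier extends torch.nn.Module {
  private fc1: torch.nn.Linear;
  private fc2: torch.nn.Linear;

  constructor() {
    super();
    this.fc1 = new torch.nn.Linear(10, 64);
    this.fc2 = new torch.nn.Linear(64, 1);
  }

  forward(x: torch.Tensor): torch.Tensor {
    x = this.fc1.forward(x);
    x = torch.nn.functional.relu(x);
    return this.fc2.forward(x); // raw logits — no Sigmoid/LogSigmoid here
  }
}
// Training would pair this with BCEWithLogitsLoss (or Sigmoid + BCELoss),
// which applies the numerically stable log-sigmoid internally.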