torch.nn.SoftplusOptions

Configuration options for the Softplus activation function.

Softplus is a smooth, differentiable approximation of ReLU. It applies the function Softplus(x) = (1/β) * log(1 + exp(βx)), which smoothly transitions from near-zero for negative inputs to approximately linear for large positive inputs. Softplus is rarely used in modern deep learning (ReLU and variants are standard), but appears in Bayesian neural networks and some probabilistic models where smoothness is important.

Core idea: for large |βx|, Softplus approaches ReLU's behavior: softplus(x) ≈ max(0, x). Near x = 0 it stays smooth with a continuous gradient. The β parameter controls sharpness: larger β makes the curve sharper (closer to ReLU), smaller β makes it smoother.
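To make the sharpness claim concrete, here is a small standalone sketch (plain TypeScript using only Math, independent of the torch.js API) evaluating softplus at x = 1 for two values of β:

```typescript
// Standalone softplus for illustration: softplus(x) = (1/β) · log(1 + exp(βx)).
// Math.log1p(y) computes log(1 + y) accurately even for tiny y.
function softplus(x: number, beta: number = 1): number {
  return Math.log1p(Math.exp(beta * x)) / beta;
}

const relu = (x: number) => Math.max(0, x);

console.log(relu(1));          // 1
console.log(softplus(1, 1));   // ≈ 1.3133 — visibly above ReLU
console.log(softplus(1, 10));  // ≈ 1.0000 — nearly indistinguishable from ReLU
```

With β = 10 the curve already hugs max(0, x) to within ~5e-6 at x = 1; with the default β = 1 the smoothing is clearly visible.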

When to use Softplus:

  • Bayesian networks: where smoothness of the posterior approximation matters
  • Probabilistic models: where differentiability everywhere is required
  • Smooth approximation: when you need ReLU-like behavior but require smoothness everywhere
  • Non-negative outputs: Softplus(x) > 0, making it useful for modeling non-negative quantities (e.g. variances or rates)
  • Rarely otherwise: modern deep learning typically uses ReLU or smooth alternatives such as GELU and SiLU

Trade-offs vs ReLU:

  • Smoothness: Continuously differentiable everywhere (unlike ReLU's kink at x=0)
  • Computational cost: Requires log(1 + exp()) which is more expensive than ReLU's max
  • Gradient flow: gradients are smooth and nonzero everywhere, so units never die, but activations are never pushed to exactly zero
  • Empirical quality: ReLU usually slightly better in standard deep learning
  • Theoretical appeal: Smooth approximation is mathematically elegant but rarely better in practice
  • Zero not attainable: Softplus(x) > 0 always (unlike ReLU which can output exactly zero)

Algorithm:

  • Forward: Softplus(x) = (1/β) * log(1 + exp(βx))
  • Numerical stability: for βx > 0, use the identity log(1 + exp(βx)) = βx + log(1 + exp(-βx)), which avoids overflow
  • Backward: ∂Softplus/∂x = σ(βx) = 1 / (1 + exp(-βx)), the sigmoid function. The gradient smoothly transitions from 0 to 1 as x increases.
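As a sketch of how these formulas fit together (standalone TypeScript, independent of the torch.js API; the names stableSoftplus and softplusGrad are illustrative, not part of the library):

```typescript
// Numerically stable softplus, mirroring the identity above:
// for βx > 0, log(1 + exp(βx)) = βx + log(1 + exp(-βx)), which never overflows.
function stableSoftplus(x: number, beta: number = 1, threshold: number = 20): number {
  const z = beta * x;
  if (z > threshold) return x;                      // linear regime: softplus(x) ≈ x
  if (z > 0) return (z + Math.log1p(Math.exp(-z))) / beta;
  return Math.log1p(Math.exp(z)) / beta;            // exp(z) is safe for z ≤ 0
}

// Backward: the derivative is the sigmoid of βx.
function softplusGrad(x: number, beta: number = 1): number {
  return 1 / (1 + Math.exp(-beta * x));
}

console.log(stableSoftplus(0));     // log(2) ≈ 0.6931
console.log(stableSoftplus(1000));  // 1000 — naive log(1 + exp(1000)) would overflow
console.log(softplusGrad(0));       // 0.5 — sigmoid(0)
```

The three branches cover the linear regime (βx above the threshold), the stable positive regime, and the non-positive regime where exp(βx) cannot overflow.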

Definition

export interface SoftplusOptions {
  /** Controls sharpness; larger β → sharper (like ReLU) (default: 1) */
  beta?: number;
  /** Threshold above which linear approximation is used for numerical stability (default: 20) */
  threshold?: number;
}
beta (number, optional)
– Controls sharpness; larger β → sharper (like ReLU) (default: 1)
threshold (number, optional)
– Threshold above which the linear approximation is used for numerical stability (default: 20)
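To see why a threshold on βx is needed, and why a default of 20 is a safe cutoff, a quick standalone check (plain TypeScript, no torch.js dependency):

```typescript
// At βx = 20, the term dropped by the linear approximation is
// log1p(exp(-20)) ≈ 2.1e-9 — negligible even in float32 — while the
// naive formula log(1 + exp(βx)) overflows once exp(βx) exceeds ~1.8e308.
console.log(Math.log1p(Math.exp(-20)));   // ≈ 2.06e-9: the approximation error
console.log(Math.log(1 + Math.exp(710))); // Infinity: naive formula overflows
```

So switching to the identity-plus-threshold form trades an error below 1e-8 for immunity to overflow.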

Examples

// Probabilistic model using Softplus for smoothness
class BayesianLinear extends torch.nn.Module {
  private fc: torch.nn.Linear;
  private softplus: torch.nn.Softplus;

  constructor() {
    super();
    this.fc = new torch.nn.Linear(10, 5);
    this.softplus = new torch.nn.Softplus();  // Smooth activation
  }

  forward(x: torch.Tensor): torch.Tensor {
    x = this.fc.forward(x);
    return this.softplus.forward(x);  // Smooth, always positive
  }
}

// Comparing ReLU vs Softplus smoothness
const x = torch.linspace(-5, 5, [1000]);
const relu = new torch.nn.ReLU();
const softplus = new torch.nn.Softplus();                    // beta = 1 (default)
const softplus_sharp = new torch.nn.Softplus({ beta: 2.0 }); // Sharper approximation

const y_relu = relu.forward(x);              // Kink at x=0
const y_softplus = softplus.forward(x);      // Smooth everywhere
const y_sharp = softplus_sharp.forward(x);   // Closer to ReLU