torch.nn.Softsign
Softsign activation function.
Softsign is a smooth activation function that maps inputs to (-1, 1) via the formula Softsign(x) = x / (1 + |x|). It's similar to Tanh in output range and smoothness, but with a simpler formula (no exponentials needed). Softsign is rarely used in modern deep learning (Tanh, ReLU, and smooth activations like GELU/SiLU are standard), but appears occasionally in specialized contexts.
Core idea: Softsign(x) = x / (1 + |x|) provides a smooth, bounded output in (-1, 1). Unlike Tanh, which requires exponential computation, Softsign needs only an absolute value and a division. It approaches its asymptotes polynomially (the gap to ±1 shrinks like 1/|x|), so it saturates more slowly than Tanh, whose gap to ±1 shrinks exponentially.
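The formula is simple enough to sketch in a few lines. The following is a minimal standalone illustration in plain TypeScript (no torch dependency; the `softsign` helper is defined here for demonstration, not part of any library):

```typescript
// Standalone sketch of the Softsign formula: x / (1 + |x|).
function softsign(x: number): number {
  return x / (1 + Math.abs(x)); // bounded in (-1, 1)
}

console.log(softsign(0));   // 0
console.log(softsign(1));   // 0.5
console.log(softsign(-1));  // -0.5
console.log(softsign(100)); // ≈ 0.9901 — approaches 1 only asymptotically
```

Note that even at x = 100 the output is still about 0.01 away from 1, reflecting the slow polynomial saturation.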
When to use Softsign:
- Alternative to Tanh when you want simpler computation (no exponentials)
- Output range (-1, 1) desired but don't need Tanh specifically
- Experimental/research comparing different smooth activations
- Rarely the best choice: Tanh or modern smooth activations (GELU, SiLU) usually work better
Trade-offs vs Tanh:
- Computation: No exponentials (just division) vs Tanh's exp-based computation
- Gradient: Softsign: ∂/∂x = 1/(1+|x|)² vs Tanh: ∂/∂x = 1 - tanh²(x)
- Saturation: Both saturate for large |x|, but Softsign saturates polynomially while Tanh saturates exponentially
- Output shape: Different curve but same range (-1, 1)
- Empirical quality: Very similar; minor differences in practice
- Popularity: Tanh much more common (more established)
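The saturation-rate difference above is easy to see numerically. A small sketch in plain TypeScript (the `softsign` helper is defined locally, not a library call):

```typescript
// Compare how fast Softsign and Tanh approach their asymptote at 1.
// Softsign's gap to 1 is exactly 1/(1 + x) for x > 0 (polynomial decay);
// Tanh's gap behaves like 2·e^(-2x) (exponential decay).
function softsign(x: number): number {
  return x / (1 + Math.abs(x));
}

for (const x of [2, 5, 10]) {
  const gapSoftsign = 1 - softsign(x);  // = 1 / (1 + x)
  const gapTanh = 1 - Math.tanh(x);
  console.log(x, gapSoftsign, gapTanh);
}
// At x = 10, Softsign is still ~0.09 away from 1,
// while Tanh is within ~4e-9 of 1.
```

This is why Softsign's gradients, while still decaying, stay nonzero over a wider input range than Tanh's.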
Trade-offs vs ReLU:
- Smoothness: Softsign smooth everywhere vs ReLU's kink at x=0
- Boundedness: Softsign outputs in (-1, 1) vs ReLU's [0, ∞)
- Gradient decay: Softsign gradients decay for large |x| (like Sigmoid saturation) vs ReLU's constant gradient of 1 for x > 0
- Empirical: ReLU usually better in deep networks (no saturation)
- Use case: ReLU for standard networks; Softsign rarely needed
Algorithm:
- Forward: Softsign(x) = x / (1 + |x|)
- Backward: ∂Softsign/∂x = 1 / (1 + |x|)² (always positive, symmetric around x = 0)
The gradient is always in (0, 1], decreasing as |x| increases (saturation effect).
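The backward formula can be sanity-checked against a finite-difference approximation. A minimal sketch in plain TypeScript (the `softsign` and `softsignGrad` helpers are defined here for illustration, not library functions):

```typescript
// Softsign forward and its analytic gradient, checked numerically.
function softsign(x: number): number {
  return x / (1 + Math.abs(x));
}

function softsignGrad(x: number): number {
  const d = 1 + Math.abs(x);
  return 1 / (d * d); // always in (0, 1], peaks at x = 0
}

// Central-difference check: (f(x+eps) - f(x-eps)) / (2*eps) ≈ f'(x)
const eps = 1e-6;
for (const x of [-3, -0.5, 0, 0.5, 3]) {
  const numeric = (softsign(x + eps) - softsign(x - eps)) / (2 * eps);
  console.log(x, softsignGrad(x), numeric); // analytic ≈ numeric
}
```

The gradient is largest (exactly 1) at x = 0 and decays toward 0 as |x| grows, which is the saturation effect described above.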
- Smooth everywhere: Continuously differentiable with smooth gradient.
- Bounded (-1, 1): Like Tanh; simpler to compute (division vs exponentials).
- Rarely used: Tanh is more established for (-1, 1) bounded activation.
- Zero-centered: Output zero-mean like Tanh; better than Sigmoid for training.
- Gradient decay: Gradients decay for large |x| (saturation effect like Sigmoid/Tanh).
- Legacy activation: Sometimes seen in older code; modern activations preferred.
Examples
// Network using Softsign (rare, mostly for comparison)
class MLPWithSoftsign extends torch.nn.Module {
  private fc1: torch.nn.Linear;
  private softsign: torch.nn.Softsign;
  private fc2: torch.nn.Linear;

  constructor() {
    super();
    this.fc1 = new torch.nn.Linear(10, 64);
    this.softsign = new torch.nn.Softsign(); // Smooth, bounded activation
    this.fc2 = new torch.nn.Linear(64, 1);
  }

  forward(x: torch.Tensor): torch.Tensor {
    x = this.fc1.forward(x);
    x = this.softsign.forward(x); // Smooth output in (-1, 1)
    return this.fc2.forward(x);
  }
}
// In practice, ReLU or Tanh would be more common choices

// Comparing smooth activations
const x = torch.linspace(-5, 5, [1000]);
const softsign = new torch.nn.Softsign();
const tanh = new torch.nn.Tanh();
const sigmoid = new torch.nn.Sigmoid();
const y_softsign = softsign.forward(x); // x / (1 + |x|), range (-1, 1)
const y_tanh = tanh.forward(x); // exp-based, range (-1, 1)
const y_sigmoid = sigmoid.forward(x); // exp-based, range (0, 1)
// All smooth but different curves; Softsign is mathematically simpler than Tanh