torch.Tensor.softsign
Softsign activation function.
Maps inputs into the (-1, 1) range using the softsign function. Similar to tanh, but computed with a division instead of exponentials, which gives it different saturation behavior and good numerical stability. Produces smooth, continuous gradients.
Definition: Softsign(x) = x / (1 + |x|), with derivative Softsign′(x) = 1 / (1 + |x|)². Continuously differentiable everywhere, with non-zero gradients across the entire input domain.
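As a minimal sketch of the definition (plain TypeScript on scalars, not this library's tensor API; the helper names are illustrative):

```typescript
// Standalone scalar sketch of Softsign(x) = x / (1 + |x|).
// These helper names are illustrative, not part of the library.
function softsign(x: number): number {
  return x / (1 + Math.abs(x));
}

// Derivative: 1 / (1 + |x|)^2 — strictly positive for every input.
function softsignGrad(x: number): number {
  const d = 1 + Math.abs(x);
  return 1 / (d * d);
}

console.log(softsign(2));     // 0.666... (2 / 3)
console.log(softsign(-1));    // -0.5
console.log(softsignGrad(0)); // 1 — the gradient peaks at the origin
```

Note that no exponential appears anywhere: one absolute value, one addition, one division per element.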
Advantages over tanh:
- No exponential computation (more stable numerically)
- Slower saturation (polynomial rather than exponential approach to ±1)
- Gradient doesn't vanish as quickly
Use Cases:
- Stable alternative to tanh in RNNs
- Hidden layers when tanh saturation is problematic
- Applications where numerical stability is critical
- Time series with large value ranges
Properties:
- Output bounds: strictly (-1, 1); the endpoints are never reached.
- Non-zero gradients: the gradient 1 / (1 + |x|)² stays well away from zero in floating point, whereas tanh's computed gradient underflows to zero for large |x|.
- Smooth transition: the gradient decays quadratically, as 1 / (1 + |x|)², as |x| → ∞ (far slower than tanh's exponential decay).
- Numerically stable: no exponential computation needed.
- Symmetric: f(-x) = -f(x), an odd function.
Caveats:
- Output never exactly reaches ±1 (approaches asymptotically).
- Less common than tanh, so less optimized in some libraries.
- Slower saturation than tanh (may be an advantage or a disadvantage).
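To make the decay-rate comparison concrete, a small sketch (plain TypeScript, illustrative helper names, not the library API) evaluating both gradients in the tails:

```typescript
// Compare gradient magnitudes of softsign vs tanh for large inputs.
// Helper names are illustrative, not part of the library API.
const softsignGrad = (x: number): number => 1 / (1 + Math.abs(x)) ** 2;
const tanhGrad = (x: number): number => 1 - Math.tanh(x) ** 2;

for (const x of [2, 5, 10]) {
  console.log(
    `x=${x}  softsign'=${softsignGrad(x).toExponential(2)}  tanh'=${tanhGrad(x).toExponential(2)}`
  );
}
// At x = 10 the softsign gradient (~8.3e-3) is roughly six orders of
// magnitude larger than tanh's (~8.2e-9).
```

This is the "gradient doesn't vanish as quickly" property in numbers: quadratic decay keeps a usable signal where exponential decay has effectively flatlined.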
Returns
Tensor<S, D, Dev> – Tensor with the same shape, values in (-1, 1)
Examples
// Basic usage - smooth normalization to (-1, 1)
const x = torch.tensor([-2, -1, 0, 1, 2]);
x.softsign(); // [-0.667, -0.5, 0, 0.5, 0.667]
// RNN hidden layer - numerically stable
const hidden = torch.randn(32, 128);
const activated = hidden.softsign(); // (-1, 1) range, no exp
// Compare with tanh
const tanh_out = hidden.tanh(); // exponential-based, saturates quickly in the tails
const softsign_out = hidden.softsign(); // polynomial saturation (gradients decay more slowly)
// Time series with outliers - softsign stays well-behaved
const time_series = torch.randn(100, 64);
const outliers = torch.randn(10, 64).mul(5); // Large values
const robust = outliers.softsign(); // Maps large values smoothly into (-1, 1)
// Attention mechanism with bounded output
const scores = torch.randn(8, 16, 16);
const attention = scores.softsign(); // Bounded attention weights
See Also
- PyTorch torch.nn.functional.softsign()
- tanh - Exponential-based version with faster saturation
- sigmoid - Related S-curve mapping to (0, 1) instead of (-1, 1)
- relu - Hard nonlinearity