torch.Tensor.hardsigmoid
Hard sigmoid activation function.
A piecewise linear approximation of sigmoid that is cheaper to compute. It replaces the smooth S-curve with a clipped linear ramp, avoiding the exponential entirely. Commonly used on mobile and edge devices where the exponential in sigmoid is too expensive.
Definition: HardSigmoid(x) = clip(x/6 + 0.5, 0, 1), computed efficiently without exp.
- For x < -3: output is 0
- For -3 ≤ x ≤ 3: output is x/6 + 0.5 (linear in x)
- For x > 3: output is 1
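The piecewise definition above can be sketched on plain numbers, independent of any tensor library (the `hardsigmoid` helper here is illustrative, not part of this API):

```typescript
// Hard sigmoid on plain numbers: clip(x/6 + 0.5, 0, 1).
function hardsigmoid(x: number): number {
  // Math.max/Math.min implement the clip to [0, 1].
  return Math.min(1, Math.max(0, x / 6 + 0.5));
}

// Saturates to exactly 0 below -3 and exactly 1 above 3;
// linear ramp x/6 + 0.5 in between.
const inputs = [-4, -3, -1.5, 0, 1.5, 3, 4];
const outputs = inputs.map(hardsigmoid);
// → [0, 0, 0.25, 0.5, 0.75, 1, 1]
```

Note that the saturation is exact: any input ≤ -3 maps to precisely 0, and any input ≥ 3 to precisely 1, unlike sigmoid, which only approaches those limits asymptotically.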
Use Cases:
- Mobile/edge inference where sigmoid is too expensive
- MobileNet and other efficient architectures
- Gating mechanisms with hard saturation (LSTM/GRU gates)
- Attention masks where exact 0/1 saturation at the extremes is desired
- Neural architecture search and lightweight models
- Efficiency: Faster than sigmoid on most hardware, since the piecewise linear form avoids the exponential.
- Output range: [0, 1]; saturates to exactly 0 and exactly 1 (sigmoid only approaches these asymptotically).
- Gradient: Piecewise constant gradient (0 outside [-3, 3], 1/6 inside).
- Mobile optimized: No exponential computation, suitable for low-power devices.
- Architecture: Common in MobileNet, EfficientNet, and other mobile-first models.
- Approximation: Linear inside [-3, 3]; approximates sigmoid there but matches it exactly only at x = 0 (both output 0.5).
- Gradient is zero outside [-3, 3], may cause dead neurons if input distribution shifts.
- Not differentiable at x = -3 and x = 3 (piecewise corners).
- If you need smooth gradients, consider the smooth sigmoid instead.
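The gradient behavior noted above can be made concrete with a small sketch. The derivative is piecewise constant: 1/6 strictly inside (-3, 3) and 0 outside; treating the non-differentiable corners at x = ±3 as 0 is one common subgradient convention, assumed here for illustration:

```typescript
// Subgradient of hardsigmoid: d/dx clip(x/6 + 0.5, 0, 1).
function hardsigmoidGrad(x: number): number {
  // 1/6 on the linear ramp, 0 in the saturated regions.
  // The corners x = ±3 are non-differentiable; 0 is chosen by convention.
  return x > -3 && x < 3 ? 1 / 6 : 0;
}

// Inputs pushed outside [-3, 3] receive zero gradient — the source of the
// "dead neuron" risk if the input distribution shifts into saturation.
const grads = [-5, -3, 0, 3, 5].map(hardsigmoidGrad);
// → [0, 0, 1/6, 0, 0]
```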
Returns
Tensor<S, D, Dev> – Tensor with the same shape as the input, values in [0, 1]
Examples
// Basic usage - piecewise linear approximation
const x = torch.tensor([-3, -1.5, 0, 1.5, 3]);
x.hardsigmoid(); // [0, 0.25, 0.5, 0.75, 1]
// Efficient gating in mobile models
const input = torch.randn(32, 128); // Mobile model input
const gate = input.hardsigmoid(); // Fast gate without exp
// Compare with sigmoid - hardsigmoid avoids the exponential entirely
const sigmoid_out = input.sigmoid(); // Expensive exponential
const hardsig_out = input.hardsigmoid(); // Simple clipping
// Attention with hard decisions
const scores = torch.randn(16, 16, 64);
const attention_weights = scores.hardsigmoid(); // Saturates to exact 0/1 at the extremes
// Edge device inference - maximize speed
const model_input = torch.randn(1, 224, 224, 3);
const output = model.forward(model_input); // Uses hardsigmoid internally
See Also
- PyTorch torch.nn.functional.hardsigmoid()
- sigmoid - Smooth approximation using exponential
- hardswish - Hard version of SiLU activation
- hard_tanh - Hard version of tanh activation
- relu - Simpler hard nonlinearity