torch.hardswish
function hardswish<S extends Shape, D extends DType = DType, Dev extends DeviceType = DeviceType>(input: Tensor<S, D, Dev>): Tensor<S, D, Dev>
Applies the Hardswish function element-wise.
Hardswish is the efficient counterpart to SiLU/Swish: it multiplies the input by a hard-sigmoid gate, Hardswish(x) = x * ReLU6(x + 3) / 6. Designed for mobile and edge devices, it provides the benefits of smooth self-gating without expensive exponential computations. It is the default activation in MobileNetV3.
- Efficiency: No exponential, purely polynomial computation - ideal for mobile
- Self-gating: Like SiLU, provides soft attention/gating by multiplying with bounded signal
- Approximation: Excellent approximation to SiLU in the -3 to 3 range
- Training dynamics: Simpler gradients than SiLU but maintains gating benefits
- Hardware friendly: Fast on CPUs, GPUs, TPUs, and embedded devices
- Piecewise smooth: Continuous everywhere, but the gradient jumps at the boundaries x = -3 and x = 3; this doesn't cause problems in practice
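The piecewise behavior above can be sketched as a plain scalar function. This is a hypothetical standalone sketch for illustration (`hardswishScalar` is an assumed name, not part of the library):

```typescript
// Hardswish formula: hardswish(x) = x * clamp(x + 3, 0, 6) / 6
// Scalar sketch only; the real tensor op applies this element-wise.
function hardswishScalar(x: number): number {
  if (x <= -3) return 0;      // gate fully closed
  if (x >= 3) return x;       // gate fully open: identity
  return (x * (x + 3)) / 6;   // quadratic transition between -3 and 3
}

console.log(hardswishScalar(-3)); // 0: left boundary meets the zero region
console.log(hardswishScalar(3));  // 3: right boundary meets the identity region
```

Note that both branches agree with the quadratic at the boundaries, which is why the function is continuous even though its gradient is not.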
Parameters
input: Tensor<S, D, Dev> - The input tensor
Returns
Tensor<S, D, Dev> – A new tensor with Hardswish applied element-wise
Examples
// Basic usage
const x = torch.tensor([-5, -3, -1, 0, 1, 3, 5]);
torch.hardswish(x); // [0, 0, -0.333, 0, 0.667, 3, 5]
// Comparison with SiLU (Swish) - similar purpose, different efficiency
const silu_out = torch.silu(x); // Smooth gating with exp()
const hardswish_out = torch.hardswish(x); // Linear approx, no exp
// Hardswish is much faster on mobile while maintaining similar expressiveness
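To make the "similar expressiveness" claim concrete, the two gates can be compared numerically with scalar stand-ins (hypothetical sketches, not library calls); in the transition region they differ by less than about 0.1:

```typescript
// Scalar stand-ins for the two activations (sketch only):
const silu = (x: number): number => x / (1 + Math.exp(-x));  // needs exp()
const hswish = (x: number): number =>
  (x * Math.min(Math.max(x + 3, 0), 6)) / 6;                 // polynomial only

// Print both across the transition region:
for (const x of [-2, -1, 0, 1, 2]) {
  console.log(x, silu(x).toFixed(3), hswish(x).toFixed(3));
}
```

The gap is largest near x = ±2 (roughly 0.095) and shrinks to zero at the origin, which is why swapping one for the other rarely changes model quality.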
// In MobileNetV3 block (the standard use case)
const x = torch.randn(batch_size, channels, height, width);
const conv1 = new torch.nn.Conv2d(in_ch, mid_ch, 3, 1, 1);
const conv2 = new torch.nn.Conv2d(mid_ch, out_ch, 1, 1, 0);
const se = new torch.nn.Sequential(
new torch.nn.AdaptiveAvgPool2d([1, 1]),
new torch.nn.Linear(mid_ch, Math.floor(mid_ch / 4)),
// SE branch is gated with torch.hardsigmoid; hardswish activates the trunk
);
const hidden = torch.hardswish(conv1(x)); // Efficient activation
const out = conv2(hidden);
// Self-gating property (like SiLU but efficient)
const features = torch.randn(batch, 512);
const gated = torch.hardswish(features); // Soft multiplicative gate
// Values near 0 are suppressed, values > 3 pass through unchanged
See Also
- PyTorch torch.nn.functional.hardswish()
- silu - Smooth gating alternative, better quality but slower
- hardsigmoid - The gating function used internally
- relu - Simpler unbounded alternative