torch.hardswish
function hardswish<S extends Shape, D extends DType = DType, Dev extends DeviceType = DeviceType>(input: Tensor<S, D, Dev>): Tensor<S, D, Dev>
Applies the Hardswish function element-wise.
Hardswish is the efficient counterpart to SiLU/Swish: it multiplies the input by a hard-sigmoid gate, Hardswish(x) = x * ReLU6(x + 3) / 6. Designed for mobile and edge devices, it provides the benefits of smooth self-gating without expensive exponential computations. It is the default activation in MobileNetV3.
- Efficiency: No exponential, purely polynomial computation - ideal for mobile
- Self-gating: Like SiLU, provides soft attention/gating by multiplying with bounded signal
- Approximation: Excellent approximation to SiLU in the -3 to 3 range
- Training dynamics: Simpler gradients than SiLU but maintains gating benefits
- Hardware friendly: Fast on CPUs, GPUs, TPUs, and embedded devices
- Piecewise smooth: Continuous everywhere, but the gradient jumps at the boundaries x = -3 and x = 3; this doesn't cause problems in practice
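The piecewise behavior above can be sketched as a plain scalar function. This is a hypothetical standalone sketch for illustration (`hardswishScalar` is an assumed name, not part of the library):

```typescript
// Hardswish formula: hardswish(x) = x * clamp(x + 3, 0, 6) / 6
// Scalar sketch only; the real tensor op applies this element-wise.
function hardswishScalar(x: number): number {
  if (x <= -3) return 0;      // gate fully closed
  if (x >= 3) return x;       // gate fully open: identity
  return (x * (x + 3)) / 6;   // quadratic transition between -3 and 3
}

console.log(hardswishScalar(-3)); // 0: left boundary meets the zero region
console.log(hardswishScalar(3));  // 3: right boundary meets the identity region
```

Note that both branches agree with the quadratic at the boundaries, which is why the function is continuous even though its gradient is not.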
Parameters
input: Tensor<S, D, Dev> - The input tensor
Returns
Tensor<S, D, Dev> – A new tensor with Hardswish applied element-wise
Examples
// Basic usage
const x = torch.tensor([-5, -3, -1, 0, 1, 3, 5]);
torch.hardswish(x); // [0, 0, -0.333, 0, 0.667, 3, 5]
// Comparison with SiLU (Swish) - similar purpose, different efficiency
const silu_out = torch.silu(x); // Smooth gating with exp()
const hardswish_out = torch.hardswish(x); // Linear approx, no exp
// Hardswish is much faster on mobile while maintaining similar expressiveness
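To make the "similar expressiveness" claim concrete, the two gates can be compared numerically with scalar stand-ins (hypothetical sketches, not library calls); in the transition region they differ by less than about 0.1:

```typescript
// Scalar stand-ins for the two activations (sketch only):
const silu = (x: number): number => x / (1 + Math.exp(-x));  // needs exp()
const hswish = (x: number): number =>
  (x * Math.min(Math.max(x + 3, 0), 6)) / 6;                 // polynomial only

// Print both across the transition region:
for (const x of [-2, -1, 0, 1, 2]) {
  console.log(x, silu(x).toFixed(3), hswish(x).toFixed(3));
}
```

The gap is largest near x = ±2 (roughly 0.095) and shrinks to zero at the origin, which is why swapping one for the other rarely changes model quality.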
// In MobileNetV3 block (the standard use case)
const x = torch.randn(batch_size, channels, height, width);
const conv1 = new torch.nn.Conv2d(in_ch, mid_ch, 3, 1, 1);
const conv2 = new torch.nn.Conv2d(mid_ch, out_ch, 1, 1, 0);
const se = new torch.nn.Sequential(
new torch.nn.AdaptiveAvgPool2d([1, 1]),
new torch.nn.Linear(mid_ch, Math.floor(mid_ch / 4)),
// SE branch is gated with torch.hardsigmoid; hardswish activates the trunk
);
const hidden = torch.hardswish(conv1(x)); // Efficient activation
const out = conv2(hidden);
// Self-gating property (like SiLU but efficient)
const features = torch.randn(batch, 512);
const gated = torch.hardswish(features); // Soft multiplicative gate
// Values near 0 are suppressed, values > 3 pass through unchanged
See Also
- PyTorch torch.nn.functional.hardswish()
- silu - Smooth gating alternative, better quality but slower
- hardsigmoid - The gating function used internally
- relu - Simpler unbounded alternative