torch.nn.Hardswish
new Hardswish(options?: ActivationOptions)
Hardswish activation function (Hardware-friendly Swish approximation).
Hardswish is a piecewise approximation of SiLU (Swish) designed for efficient computation on mobile/edge devices. Instead of computing x * sigmoid(x), which requires an expensive exponential, Hardswish uses ReLU6 to approximate the sigmoid gate with a piecewise-linear hard sigmoid, resulting in a simple piecewise computation. It was introduced in MobileNetV3 as the standard activation for efficient mobile neural networks.
Core idea: Hardswish(x) = x * ReLU6(x + 3) / 6 = x * clamp((x + 3) / 6, 0, 1). This replaces the smooth sigmoid gate in SiLU(x) = x * sigmoid(x) with the piecewise-linear hard sigmoid clamp((x + 3) / 6, 0, 1). The constants 3 and 6 come from this approximation: sigmoid(x) ≈ (x + 3) / 6 over [-3, 3], the range where sigmoid varies most.
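The core-idea formula can be checked with plain scalar implementations; these reference functions are a sketch for illustration, not this library's internals:

```typescript
// Scalar reference implementations (sketch for illustration only)
function siluRef(x: number): number {
  return x / (1 + Math.exp(-x)); // x * sigmoid(x)
}
function hardswishRef(x: number): number {
  const gate = Math.min(Math.max(x + 3, 0), 6); // ReLU6(x + 3)
  return (x * gate) / 6;
}

// Outside [-3, 3] the pieces are trivial:
console.log(hardswishRef(-4)); // 0 (gate clamps to 0)
console.log(hardswishRef(4));  // 4 (gate clamps to 6, so x * 6 / 6 = x)
// Inside [-3, 3] it tracks SiLU closely:
console.log(hardswishRef(1), siluRef(1)); // ~0.667 vs ~0.731
```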
When to use Hardswish:
- Mobile networks: MobileNetV3 uses Hardswish as default activation
- Edge/embedded devices: Integer-arithmetic friendly (no exp/sigmoid needed)
- Quantization: Like ReLU6, designed for low-precision quantized deployment
- Efficiency critical: When SiLU's sigmoid is too expensive
- Drop-in replacement: Can replace SiLU in mobile models with minimal quality loss
Trade-offs vs SiLU:
- Compute efficiency: Simple clamps and multiplies vs sigmoid's expensive exp()
- Approximation quality: Approximates SiLU well in [-3, 3]; nearly identical outside, where both saturate to 0 below and to x above
- Integer-friendly: Can be computed entirely with integer arithmetic (unlike SiLU's exp())
- Empirical quality: In full float precision, SiLU slightly better; with quantization, Hardswish comparable
- Quantization-friendly: Bounded output (similar to ReLU6), designed for int8 deployment
- Architectural need: Only really beneficial on hardware with integer-only arithmetic
Algorithm:
- Forward: Hardswish(x) = x * ReLU6(x + 3) / 6 = x * clamp((x + 3) / 6, 0, 1). Piecewise: 0 if x < -3; x * (x + 3) / 6 (quadratic) if -3 ≤ x ≤ 3; x if x > 3.
- Backward: 0 if x < -3; (2x + 3) / 6 if -3 ≤ x ≤ 3; 1 if x > 3.
The approximation to SiLU is closest in the middle range [-3, 3], where most activations fall.
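As a quick sanity check of the piecewise formulas, a scalar sketch (hypothetical helper names, not part of this library) can compare the stated backward formula against a finite difference:

```typescript
// Forward and derivative following the piecewise definitions above
function hardswishFwd(x: number): number {
  if (x < -3) return 0;
  if (x > 3) return x;
  return (x * (x + 3)) / 6;
}
function hardswishGrad(x: number): number {
  if (x < -3) return 0;
  if (x > 3) return 1;
  return (2 * x + 3) / 6;
}

// A central finite difference should agree with the analytic gradient
const h = 1e-5;
for (const x of [-2, 0, 1.5]) {
  const numeric = (hardswishFwd(x + h) - hardswishFwd(x - h)) / (2 * h);
  console.log(x, hardswishGrad(x), numeric); // last two columns should match
}
```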
- MobileNetV3 standard: Default activation in MobileNetV3 for mobile efficiency.
- Piecewise gate: Approximates SiLU's sigmoid gate with a simple piecewise-linear hard sigmoid.
- Integer-friendly: Can be implemented entirely with integer arithmetic (no exp).
- Approximation range: Closely tracks SiLU within [-3, 3]; outside it the two nearly coincide, as both saturate to 0 and to x.
- Quantization designed: Like ReLU6, outputs bounded for quantized deployment.
- Less common on servers: On server hardware with fast float ops, SiLU is usually preferred.
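To make the integer-friendliness concrete, here is a rough fixed-point sketch; the Q8.8 format and function name are assumptions for illustration, and real int8 deployments use a proper quantization scheme rather than this exact code:

```typescript
// Hardswish in Q8.8 fixed point: integer adds, clamps, multiply, divide
const FRAC = 8;         // 8 fractional bits
const ONE = 1 << FRAC;  // 1.0 in Q8.8 = 256
function hardswishFixed(x: number): number {
  // ReLU6(x + 3) in fixed point
  let gate = x + 3 * ONE;
  if (gate < 0) gate = 0;
  if (gate > 6 * ONE) gate = 6 * ONE;
  // x * gate is Q16.16; dividing by 6 * ONE returns to Q8.8
  return Math.trunc((x * gate) / (6 * ONE));
}

console.log(hardswishFixed(4 * ONE));  // 1024, i.e. 4.0 (identity above 3)
console.log(hardswishFixed(-4 * ONE)); // 0 (dead below -3)
```

On real hardware the division by the constant 6 * ONE would typically be replaced by a multiply with a precomputed reciprocal plus a shift.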
Examples
// MobileNetV3 block with Hardswish
class MobileNetV3Block extends torch.nn.Module {
  private conv1: torch.nn.Conv2d;
  private hardswish: torch.nn.Hardswish;
  private conv2: torch.nn.Conv2d;

  constructor() {
    super();
    this.conv1 = new torch.nn.Conv2d(32, 64, { kernel_size: 3, padding: 1 });
    this.hardswish = new torch.nn.Hardswish(); // MobileNetV3 standard
    this.conv2 = new torch.nn.Conv2d(64, 32, { kernel_size: 3, padding: 1 });
  }

  forward(x: torch.Tensor): torch.Tensor {
    let out = this.conv1.forward(x);
    out = this.hardswish.forward(out); // Hardware-friendly SiLU approximation
    out = this.conv2.forward(out);
    return x.add(out); // Residual connection
  }
}

// Comparing Hardswish vs SiLU
const x = torch.linspace(-5, 5, [1000]);
const silu = new torch.nn.SiLU();
const hardswish = new torch.nn.Hardswish();
const y_silu = silu.forward(x); // Smooth SiLU
const y_hardswish = hardswish.forward(x); // Piecewise linear approximation
// Hardswish ≈ SiLU in [-3, 3] range, but much faster to compute
// For mobile/edge, Hardswish is preferred; for server, SiLU is typical

// Hardswish for quantization-aware training
class QuantizedMobileModel extends torch.nn.Module {
  private conv1: torch.nn.Conv2d;
  private hardswish: torch.nn.Hardswish;
  private conv2: torch.nn.Conv2d;

  constructor() {
    super();
    this.conv1 = new torch.nn.Conv2d(3, 32, { kernel_size: 3 });
    this.hardswish = new torch.nn.Hardswish(); // Integer-friendly
    this.conv2 = new torch.nn.Conv2d(32, 64, { kernel_size: 3 });
  }

  forward(x: torch.Tensor): torch.Tensor {
    x = this.conv1.forward(x);
    x = this.hardswish.forward(x); // Can be computed with integer arithmetic
    return this.conv2.forward(x);
  }
}
// This model quantizes cleanly to int8 without complex activations