torch.nn.Hardsigmoid
class Hardsigmoid extends Module
new Hardsigmoid(options?: ActivationOptions)
Hardsigmoid activation function (Hardware-friendly sigmoid approximation).
Hardsigmoid is a piecewise linear approximation of Sigmoid designed for efficient computation on mobile/edge devices. Instead of computing sigmoid(x) = 1 / (1 + exp(-x)), which requires an expensive exp(), Hardsigmoid uses a simple piecewise linear function. The constants 3 and 6 are chosen so that the linear segment matches sigmoid closely in the active range [-3, 3]. It was popularized by MobileNetV3, which uses it for efficient gating in squeeze-and-excitation blocks.
Core idea: Hardsigmoid(x) = clamp((x + 3) / 6, 0, 1) = {0 if x ≤ -3, (x+3)/6 if -3 < x < 3, 1 if x ≥ 3}. This is a piecewise linear approximation of sigmoid: the slope 1/6 maps the active range [-3, 3] exactly onto the output range [0, 1]. Unlike Sigmoid's smooth exponential curve, Hardsigmoid needs only comparisons and arithmetic.
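The formula above can be checked with plain numbers, independent of the torch.nn API (the helper names here are illustrative):

```typescript
// Scalar sketch of Hardsigmoid vs. exact sigmoid (plain numbers only)
function hardsigmoid(x: number): number {
  return Math.min(Math.max((x + 3) / 6, 0), 1); // clamp((x + 3) / 6, 0, 1)
}

function sigmoid(x: number): number {
  return 1 / (1 + Math.exp(-x));
}

console.log(hardsigmoid(-4)); // 0   (saturated low)
console.log(hardsigmoid(0));  // 0.5 (matches sigmoid exactly at 0)
console.log(hardsigmoid(4));  // 1   (saturated high)
```

Note that the two functions agree exactly at x = 0 and differ by well under 0.1 everywhere in the active range.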
When to use Hardsigmoid:
- Mobile networks: MobileNetV3-style architectures for efficient gating in squeeze-excitation blocks
- Edge/embedded devices: Integer-arithmetic friendly (no exp needed)
- Quantization: Designed for low-precision quantized deployment
- Gate mechanisms: Where Sigmoid gating is too expensive
- Drop-in replacement: For Sigmoid in mobile models with minimal quality loss
Trade-offs vs Sigmoid:
- Compute efficiency: Piecewise linear (simple comparisons) vs sigmoid's expensive exp()
- Approximation quality: Tracks sigmoid closely in [-3, 3] (worst-case gap about 0.07); saturates exactly to 0 and 1 outside that range, whereas sigmoid only approaches these values asymptotically
- Integer-friendly: Can be computed entirely with integer arithmetic
- Empirical quality: In full precision, Sigmoid slightly better; with quantization, Hardsigmoid comparable
- Quantization-friendly: Bounded output [0, 1], designed for int8 deployment
- Hardware benefit: On mobile with integer-only arithmetic, major speedup; on GPU, negligible
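The integer-friendliness claim can be sketched in fixed point. The Q8.8 format below is an illustrative assumption, not a real int8 quantization scheme; the point is that Hardsigmoid reduces to an add, a clamp, and one integer divide, with no exp() anywhere:

```typescript
// Q8.8 fixed point: 256 represents 1.0 (illustrative format, an assumption)
const ONE = 256;

function hardsigmoidFixed(xQ: number): number {
  // clamp(x + 3, 0, 6) in Q8.8, then divide by 6 -- no exp() needed
  const t = Math.min(Math.max(xQ + 3 * ONE, 0), 6 * ONE);
  return Math.round(t / 6); // output in [0, ONE], i.e. [0.0, 1.0]
}

console.log(hardsigmoidFixed(0));        // 128 -> 0.5
console.log(hardsigmoidFixed(-4 * ONE)); // 0   -> 0.0
console.log(hardsigmoidFixed(4 * ONE));  // 256 -> 1.0
```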
Algorithm:
Forward: Hardsigmoid(x) = clamp((x + 3) / 6, 0, 1) = {0 if x < -3, (x+3)/6 if -3 ≤ x ≤ 3, 1 if x > 3}
Backward: dHardsigmoid/dx = {0 if x < -3, 1/6 if -3 ≤ x ≤ 3, 0 if x > 3}
The linear approximation makes gradient computation trivial: the derivative is the constant 1/6 in the active range and 0 elsewhere.
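The backward rule is piecewise constant, which a finite-difference check confirms (a plain-number sketch, not the library's autograd):

```typescript
function hardsigmoid(x: number): number {
  return Math.min(Math.max((x + 3) / 6, 0), 1);
}

// Analytic gradient from the piecewise rule: 1/6 inside [-3, 3], else 0
function hardsigmoidGrad(x: number): number {
  return x >= -3 && x <= 3 ? 1 / 6 : 0;
}

// Central finite difference, sampled away from the kinks at +/-3
const eps = 1e-6;
for (const x of [-5, -1, 0, 2, 5]) {
  const numeric = (hardsigmoid(x + eps) - hardsigmoid(x - eps)) / (2 * eps);
  console.log(x, hardsigmoidGrad(x), numeric);
}
```

At the kinks x = ±3 the derivative is undefined; frameworks typically pick one of the adjacent values, as the piecewise rule above does.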
- Mobile standard: Efficient gating in MobileNetV3 SE blocks.
- Piecewise linear: Approximates sigmoid with simple piecewise linear function.
- Integer-friendly: Can be computed with integer arithmetic (no exp).
- Approximation range: Close linear match to sigmoid in [-3, 3]; saturates exactly at the boundaries -3 and 3.
- Quantization designed: Bounded [0, 1] output for quantized int8 deployment.
- Server less common: On server hardware with fast float ops, Sigmoid usually preferred.
Examples
// Squeeze-Excitation block with Hardsigmoid (MobileNetV3 style)
class SEBlock extends torch.nn.Module {
  private avg_pool: torch.nn.AdaptiveAvgPool2d;
  private fc1: torch.nn.Conv2d;
  private hardsigmoid: torch.nn.Hardsigmoid; // Efficient gating
  private fc2: torch.nn.Conv2d;

  constructor(channels: number) {
    super();
    const reduced = Math.max(1, Math.floor(channels / 16)); // Reduction ratio 16
    this.avg_pool = new torch.nn.AdaptiveAvgPool2d([1, 1]);
    this.fc1 = new torch.nn.Conv2d(channels, reduced, { kernel_size: 1 });
    this.hardsigmoid = new torch.nn.Hardsigmoid(); // Hardware-friendly
    this.fc2 = new torch.nn.Conv2d(reduced, channels, { kernel_size: 1 });
  }

  forward(x: torch.Tensor): torch.Tensor {
    // Squeeze: global average pooling
    let se = this.avg_pool.forward(x);
    // Excitation: FC-ReLU-FC, then Hardsigmoid gate
    se = this.fc1.forward(se);
    se = torch.nn.functional.relu(se);
    se = this.fc2.forward(se);
    se = this.hardsigmoid.forward(se); // Efficient gating [0, 1]
    // Scale: reweight input channels
    return x.mul(se);
  }
}

// Comparing Hardsigmoid vs Sigmoid
const x = torch.linspace(-5, 5, [1000]);
const sigmoid = new torch.nn.Sigmoid();
const hardsigmoid = new torch.nn.Hardsigmoid();
const y_sigmoid = sigmoid.forward(x); // Smooth exponential curve
const y_hardsigmoid = hardsigmoid.forward(x); // Piecewise linear approximation
// Hardsigmoid ≈ Sigmoid in [-3, 3] range, but much faster to compute
// For mobile/edge, Hardsigmoid is preferred; for server GPUs, Sigmoid is typical
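The comparison can also be quantified without the tensor API; a quick scan over the same interval (plain numbers) measures the worst-case gap between the two curves:

```typescript
// Plain-number versions of the two activations
const hs = (x: number) => Math.min(Math.max((x + 3) / 6, 0), 1);
const sig = (x: number) => 1 / (1 + Math.exp(-x));

// Scan [-5, 5] for the largest absolute difference
let maxErr = 0;
for (let x = -5; x <= 5; x += 0.01) {
  maxErr = Math.max(maxErr, Math.abs(hs(x) - sig(x)));
}
console.log(maxErr); // worst-case gap, reached near |x| ≈ 1.3
```

The gap stays below 0.1 everywhere, which is why Hardsigmoid works as a drop-in replacement for Sigmoid gating.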