torch.nn.Hardsigmoid
class Hardsigmoid extends Module
new Hardsigmoid(options?: ActivationOptions)
Hardsigmoid activation function (Hardware-friendly sigmoid approximation).
Hardsigmoid is a piecewise linear approximation of Sigmoid designed for efficient computation on mobile/edge devices. Instead of computing sigmoid(x) = 1 / (1 + exp(-x)), which requires an expensive exp(), Hardsigmoid uses a simple piecewise linear function. The constants 3 and 6 are chosen so that the linear segment matches sigmoid closely in the active range [-3, 3]. It was popularized by MobileNetV3, which uses it for efficient gating in squeeze-and-excitation blocks.
Core idea: Hardsigmoid(x) = clamp((x + 3) / 6, 0, 1) = {0 if x ≤ -3, (x+3)/6 if -3 < x < 3, 1 if x ≥ 3}. This is a piecewise linear approximation of sigmoid: the slope 1/6 maps the active range [-3, 3] exactly onto the output range [0, 1]. Unlike Sigmoid's smooth exponential curve, Hardsigmoid needs only comparisons and arithmetic.
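The formula above can be checked with plain numbers, independent of the torch.nn API (the helper names here are illustrative):

```typescript
// Scalar sketch of Hardsigmoid vs. exact sigmoid (plain numbers only)
function hardsigmoid(x: number): number {
  return Math.min(Math.max((x + 3) / 6, 0), 1); // clamp((x + 3) / 6, 0, 1)
}

function sigmoid(x: number): number {
  return 1 / (1 + Math.exp(-x));
}

console.log(hardsigmoid(-4)); // 0   (saturated low)
console.log(hardsigmoid(0));  // 0.5 (matches sigmoid exactly at 0)
console.log(hardsigmoid(4));  // 1   (saturated high)
```

Note that the two functions agree exactly at x = 0 and differ by well under 0.1 everywhere in the active range.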
When to use Hardsigmoid:
- Mobile networks: MobileNetV3-style architectures for efficient gating in squeeze-excitation blocks
- Edge/embedded devices: Integer-arithmetic friendly (no exp needed)
- Quantization: Designed for low-precision quantized deployment
- Gate mechanisms: Where Sigmoid gating is too expensive
- Drop-in replacement: For Sigmoid in mobile models with minimal quality loss
Trade-offs vs Sigmoid:
- Compute efficiency: Piecewise linear (simple comparisons) vs sigmoid's expensive exp()
- Approximation quality: Tracks sigmoid closely in [-3, 3] (worst-case gap about 0.07); saturates exactly to 0 and 1 outside that range, whereas sigmoid only approaches these values asymptotically
- Integer-friendly: Can be computed entirely with integer arithmetic
- Empirical quality: In full precision, Sigmoid slightly better; with quantization, Hardsigmoid comparable
- Quantization-friendly: Bounded output [0, 1], designed for int8 deployment
- Hardware benefit: On mobile with integer-only arithmetic, major speedup; on GPU, negligible
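The integer-friendliness claim can be sketched in fixed point. The Q8.8 format below is an illustrative assumption, not a real int8 quantization scheme; the point is that Hardsigmoid reduces to an add, a clamp, and one integer divide, with no exp() anywhere:

```typescript
// Q8.8 fixed point: 256 represents 1.0 (illustrative format, an assumption)
const ONE = 256;

function hardsigmoidFixed(xQ: number): number {
  // clamp(x + 3, 0, 6) in Q8.8, then divide by 6 -- no exp() needed
  const t = Math.min(Math.max(xQ + 3 * ONE, 0), 6 * ONE);
  return Math.round(t / 6); // output in [0, ONE], i.e. [0.0, 1.0]
}

console.log(hardsigmoidFixed(0));        // 128 -> 0.5
console.log(hardsigmoidFixed(-4 * ONE)); // 0   -> 0.0
console.log(hardsigmoidFixed(4 * ONE));  // 256 -> 1.0
```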
Algorithm:
Forward: Hardsigmoid(x) = clamp((x + 3) / 6, 0, 1) = {0 if x < -3, (x+3)/6 if -3 ≤ x ≤ 3, 1 if x > 3}
Backward: dHardsigmoid/dx = {0 if x < -3, 1/6 if -3 ≤ x ≤ 3, 0 if x > 3}
The linear approximation makes gradient computation trivial: the derivative is the constant 1/6 in the active range and 0 elsewhere.
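The backward rule is piecewise constant, which a finite-difference check confirms (a plain-number sketch, not the library's autograd):

```typescript
function hardsigmoid(x: number): number {
  return Math.min(Math.max((x + 3) / 6, 0), 1);
}

// Analytic gradient from the piecewise rule: 1/6 inside [-3, 3], else 0
function hardsigmoidGrad(x: number): number {
  return x >= -3 && x <= 3 ? 1 / 6 : 0;
}

// Central finite difference, sampled away from the kinks at +/-3
const eps = 1e-6;
for (const x of [-5, -1, 0, 2, 5]) {
  const numeric = (hardsigmoid(x + eps) - hardsigmoid(x - eps)) / (2 * eps);
  console.log(x, hardsigmoidGrad(x), numeric);
}
```

At the kinks x = ±3 the derivative is undefined; frameworks typically pick one of the adjacent values, as the piecewise rule above does.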
- Mobile standard: Efficient gating in MobileNetV3 SE blocks.
- Piecewise linear: Approximates sigmoid with simple piecewise linear function.
- Integer-friendly: Can be computed with integer arithmetic (no exp).
- Approximation range: Close linear match to sigmoid in [-3, 3]; saturates exactly at the boundaries -3 and 3.
- Quantization designed: Bounded [0, 1] output for quantized int8 deployment.
- Server less common: On server hardware with fast float ops, Sigmoid usually preferred.
Examples
// Squeeze-Excitation block with Hardsigmoid (MobileNetV3 style)
class SEBlock extends torch.nn.Module {
  private avg_pool: torch.nn.AdaptiveAvgPool2d;
  private fc1: torch.nn.Conv2d;
  private hardsigmoid: torch.nn.Hardsigmoid; // Efficient gating
  private fc2: torch.nn.Conv2d;

  constructor(channels: number) {
    super();
    const reduced = Math.max(1, Math.floor(channels / 16)); // Reduction ratio 16
    this.avg_pool = new torch.nn.AdaptiveAvgPool2d([1, 1]);
    this.fc1 = new torch.nn.Conv2d(channels, reduced, { kernel_size: 1 });
    this.hardsigmoid = new torch.nn.Hardsigmoid(); // Hardware-friendly
    this.fc2 = new torch.nn.Conv2d(reduced, channels, { kernel_size: 1 });
  }

  forward(x: torch.Tensor): torch.Tensor {
    // Squeeze: global average pooling
    let se = this.avg_pool.forward(x);
    // Excitation: FC-ReLU-FC, then Hardsigmoid gate
    se = this.fc1.forward(se);
    se = torch.nn.functional.relu(se);
    se = this.fc2.forward(se);
    se = this.hardsigmoid.forward(se); // Efficient gating [0, 1]
    // Scale: reweight input channels
    return x.mul(se);
  }
}

// Comparing Hardsigmoid vs Sigmoid
const x = torch.linspace(-5, 5, [1000]);
const sigmoid = new torch.nn.Sigmoid();
const hardsigmoid = new torch.nn.Hardsigmoid();
const y_sigmoid = sigmoid.forward(x); // Smooth exponential curve
const y_hardsigmoid = hardsigmoid.forward(x); // Piecewise linear approximation
// Hardsigmoid ≈ Sigmoid in [-3, 3] range, but much faster to compute
// For mobile/edge, Hardsigmoid is preferred; for server GPUs, Sigmoid is typical
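The comparison can also be quantified without the tensor API; a quick scan over the same interval (plain numbers) measures the worst-case gap between the two curves:

```typescript
// Plain-number versions of the two activations
const hs = (x: number) => Math.min(Math.max((x + 3) / 6, 0), 1);
const sig = (x: number) => 1 / (1 + Math.exp(-x));

// Scan [-5, 5] for the largest absolute difference
let maxErr = 0;
for (let x = -5; x <= 5; x += 0.01) {
  maxErr = Math.max(maxErr, Math.abs(hs(x) - sig(x)));
}
console.log(maxErr); // worst-case gap, reached near |x| ≈ 1.3
```

The gap stays below 0.1 everywhere, which is why Hardsigmoid works as a drop-in replacement for Sigmoid gating.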