torch.nn.Hardtanh
new Hardtanh(options?: HardtanhOptions)
- min_val (number) - readonly
- max_val (number) - readonly
Hardtanh activation function.
Hardtanh applies a hard clipping operation that bounds outputs to a fixed range [min_val, max_val]. It is the hard (piecewise-linear) approximation of tanh and is very cheap to compute. Hardtanh is widely used in mobile and quantization-friendly models (e.g. MobileNets, whose ReLU6 activation is Hardtanh(0, 6)), where bounded activations help with low-precision quantization and reduce memory bandwidth. While uncommon in modern large models, it remains a standard choice for mobile neural networks.
Core idea: Hardtanh(x) clamps x to [min_val, max_val]. Unlike smooth activations (Tanh, GELU), this has sharp corners at the boundaries but is much faster to compute. The default range [-1, 1] approximates tanh's output range while using simple clamping.
When to use Hardtanh:
- Mobile networks: MobileNetV1/V2 use ReLU6, which is exactly Hardtanh(0, 6)
- Quantization: Bounded activations aid low-precision fixed-point arithmetic
- Embedded devices: Minimal computation (just clipping, no exp/div like tanh)
- Resource constraints: Low latency, low power consumption, small memory footprint
- NOT for large models: Use GELU/SiLU for modern transformers and vision models
Relationship to other bounded activations:
- Hardtanh(0, 6): Exactly equivalent to ReLU6 (mobile standard)
- Hardtanh(-1, 1): Bounded version of Tanh, simpler to compute
- Sigmoid/Tanh: Smooth versions (slower, less quantization-friendly)
- ReLU: Unbounded (no upper limit), one-sided clipping only
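The ReLU6 equivalence above can be checked directly. A minimal sketch in plain TypeScript (deliberately library-free, not the `torch.nn` API): both functions clamp to [0, 6] and agree on every input.

```typescript
// Hardtanh as a plain scalar function: clamp x into [minVal, maxVal].
const hardtanh = (x: number, minVal: number, maxVal: number): number =>
  Math.min(Math.max(x, minVal), maxVal);

// ReLU6 as usually defined: min(max(x, 0), 6).
const relu6 = (x: number): number => Math.min(Math.max(x, 0), 6);

// Hardtanh(0, 6) and ReLU6 produce identical outputs.
for (const x of [-2, 0, 3, 6, 9]) {
  console.log(x, hardtanh(x, 0, 6), relu6(x));
}
```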
Algorithm: Forward: Hardtanh(x) = min(max(x, min_val), max_val)
- Clamps input to the range [min_val, max_val]
- Very fast: just two comparisons per element
- Default range [-1, 1] approximates tanh's natural range
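The forward pass is just the element-wise clamp described above. A minimal sketch in plain TypeScript over a number array (the library's Tensor type is not assumed):

```typescript
// Element-wise Hardtanh forward: clamp each value into [minVal, maxVal].
function hardtanhForward(xs: number[], minVal = -1, maxVal = 1): number[] {
  return xs.map((x) => Math.min(Math.max(x, minVal), maxVal));
}

console.log(hardtanhForward([-2, -0.5, 0.5, 2])); // [-1, -0.5, 0.5, 1]
```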
Backward: ∂Hardtanh(x)/∂x = 1 if min_val < x < max_val, else 0
- Zero gradient outside the open interval (min_val, max_val)
- This means saturated values (those outside the range) receive no gradient and contribute no learning signal
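The backward rule can be sketched the same way: the upstream gradient passes through unchanged where the input was strictly inside the range, and is zeroed where the input saturated (plain TypeScript, not the library API):

```typescript
// Element-wise Hardtanh backward: gradient is gradOut[i] inside the open
// interval (minVal, maxVal), and 0 where the input saturated.
function hardtanhBackward(
  xs: number[],
  gradOut: number[],
  minVal = -1,
  maxVal = 1,
): number[] {
  return xs.map((x, i) => (x > minVal && x < maxVal ? gradOut[i] : 0));
}

console.log(hardtanhBackward([-2, 0, 2], [1, 1, 1])); // [0, 1, 0]
```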
- Mobile standard: Hardtanh(0, 6) is ReLU6, the standard in MobileNets.
- Fast computation: Just clipping, no transcendental functions (exp, div).
- Quantization-friendly: Bounded range aids low-precision fixed-point arithmetic.
- Sharp boundaries: Non-smooth corners at min_val and max_val (unlike Tanh).
- Flat gradients outside: Values outside [min_val, max_val] don't get gradients (dead zone).
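Why the bounded range helps quantization can be shown with a hypothetical symmetric int8 scheme (a sketch, not any library's quantizer): because Hardtanh guarantees outputs in [-1, 1], the quantization scale can be fixed up front instead of calibrated from data, and the worst-case rounding error is bounded by one quantization step.

```typescript
// Hypothetical symmetric int8 quantization for a known [-1, 1] range.
const QMIN = -127;
const QMAX = 127;
const scale = 1 / 127; // fixed ahead of time: covers exactly [-1, 1]

const quantize = (x: number): number =>
  Math.max(QMIN, Math.min(QMAX, Math.round(x / scale)));
const dequantize = (q: number): number => q * scale;

// Round-trip error is bounded by one quantization step.
const err = Math.abs(dequantize(quantize(0.5)) - 0.5);
console.log(err < scale); // true
```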
Examples
// Mobile network: Hardtanh(0, 6) is ReLU6
class MobileNetBlock extends torch.nn.Module {
  private conv: torch.nn.Conv2d;
  private hardtanh: torch.nn.Hardtanh; // Or use ReLU6 directly

  constructor(in_ch: number, out_ch: number) {
    super();
    this.conv = new torch.nn.Conv2d(in_ch, out_ch, 3, { padding: 1 });
    this.hardtanh = new torch.nn.Hardtanh(0, 6); // Standard mobile activation
  }

  forward(x: torch.Tensor): torch.Tensor {
    x = this.conv.forward(x);
    return this.hardtanh.forward(x); // Bounded to [0, 6]
  }
}

// Quantization-friendly network: Bounded activations aid low-precision math
const x = torch.randn([batch_size, channels, height, width]);
// With Hardtanh: values confined to [-1, 1], easier to quantize to int8
const hardtanh = new torch.nn.Hardtanh(-1, 1);
const bounded = hardtanh.forward(x); // All values in [-1, 1]
// With ReLU: values in [0, +∞), harder to represent with fixed-point int8
const relu = new torch.nn.ReLU();
const unbounded = relu.forward(x); // Can be arbitrarily large

// Comparison: Hardtanh vs Tanh (similar but different)
const x = torch.linspace(-3, 3, 100);
const hardtanh = new torch.nn.Hardtanh(-1, 1); // Sharp corners at ±1
const tanh = new torch.nn.Tanh(); // Smooth curve
const hard_output = hardtanh.forward(x); // Step-like: -1 for x<-1, x for -1<x<1, 1 for x>1
const tanh_output = tanh.forward(x); // Smooth: approaches ±1 asymptotically
// Hardtanh is significantly faster: simple clamping vs. exponential evaluation