torch.nn.Threshold
new Threshold(threshold: number, value: number, options?: ThresholdOptions)
threshold(number) - readonly
value(number) - readonly
Threshold activation function.
Threshold applies a conditional replacement: output x if x > threshold, else replace with a fixed value. This is a simple binary decision activation that selectively passes through values above a threshold while replacing below-threshold values. Threshold is rarely used in modern deep learning (ReLU with threshold=0 is more common), but appears in some legacy code and specialized applications like binarization or feature gating.
Core idea: Threshold(x) = x if x > threshold else value. This creates a hard decision boundary, passing through values above the threshold unchanged while replacing smaller values with a fixed constant. Unlike ReLU which zeros out below-threshold values, Threshold can replace them with any value.
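The element-wise rule can be sketched in plain TypeScript without a tensor library (`thresholdFn` is an illustrative helper, not part of the documented API):

```typescript
// Minimal sketch of the element-wise Threshold rule on a plain array.
// `thresholdFn` is a hypothetical helper, not part of the torch.nn API.
function thresholdFn(xs: number[], threshold: number, value: number): number[] {
  // Strictly greater values pass through; everything else is replaced
  return xs.map((x) => (x > threshold ? x : value));
}

const out = thresholdFn([-1, 0, 1, 2], 0.5, 0);
// out = [0, 0, 1, 2]: values at or below 0.5 are replaced with 0
```

Note the strict comparison: an input exactly equal to the threshold is replaced, not passed through.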
When to use Threshold:
- Legacy code: Pre-ReLU era networks (rarely used in new code)
- Feature selection: Binary gating: pass strong signals, suppress weak ones
- Binarization-style gating: value=1 collapses sub-threshold activations to a constant (the output itself is not strictly binary)
- Specialized signal processing: Signal detection and gating
- NOT recommended: Use ReLU or variants for standard deep learning
Relationship to similar functions:
- ReLU: Threshold(x, threshold=0, value=0) is exactly ReLU(x)
- Hardshrink: Threshold is one-sided; Hardshrink is two-sided
- If-else logic: More like a gating function than a smooth activation
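The ReLU identity above can be checked in a few lines of plain TypeScript (both helpers are illustrative sketches, not the documented API):

```typescript
// Threshold with threshold=0 and value=0 coincides with ReLU element-wise.
// Both helpers are hypothetical sketches, not part of the torch.nn API.
const thresholdFn = (xs: number[], t: number, v: number): number[] =>
  xs.map((x) => (x > t ? x : v));
const relu = (xs: number[]): number[] => xs.map((x) => Math.max(0, x));

const xs = [-2, -0.5, 0, 0.5, 2];
const a = thresholdFn(xs, 0, 0); // [0, 0, 0, 0.5, 2]
const b = relu(xs);              // [0, 0, 0, 0.5, 2]
```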
Algorithm: Forward: Threshold(x, t, v) = x if x > t else v
- Simple element-wise comparison and selection
- No computation, just conditional assignment
- Different from Hardshrink which replaces values close to zero (symmetric)
Backward: ∂Threshold(x)/∂x = 1 if x > threshold, else 0
- Gradient flows only for elements above threshold
- Below-threshold elements get zero gradient (dead zone)
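The backward rule is just a mask on the upstream gradient; a plain-TypeScript sketch (`thresholdBackward` is a hypothetical helper, not the documented API):

```typescript
// Sketch of the Threshold backward pass: the upstream gradient flows
// only where the input was strictly above the threshold.
// `thresholdBackward` is an illustrative helper, not part of torch.nn.
function thresholdBackward(xs: number[], gradOut: number[], t: number): number[] {
  return xs.map((x, i) => (x > t ? gradOut[i] : 0));
}

const grads = thresholdBackward([-1, 0.2, 0.8, 3], [1, 1, 1, 1], 0.5);
// grads = [0, 0, 1, 1]: the first two inputs sit in the dead zone
```

The replacement value plays no role in the gradient; only the comparison against the threshold matters.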
Key points:
- Rarely used: ReLU (Threshold with threshold=0, value=0) is standard instead.
- One-sided gating: replaces only values at or below the threshold (unlike Hardshrink, which thresholds symmetrically on both sides of zero).
- Replacement value: Can replace with any constant, not just 0 (unlike ReLU).
- No smoothing: Sharp decision boundary at the threshold (unlike smooth activations).
- Dead zone: Values ≤ threshold get zero gradient and cannot contribute to learning.
Examples
// Simple thresholding: pass strong signals, suppress weak ones
const threshold = new torch.nn.Threshold(0.5, 0); // Like ReLU but with threshold=0.5
const x = torch.randn([10, 20]);
const output = threshold.forward(x);
// output[i, j] = x[i, j] if x[i, j] > 0.5, else 0
// This passes through strong signals while suppressing weak ones

// Gating with a constant floor value
const threshold = new torch.nn.Threshold(0, 1); // x if x > 0, else 1
const [batch_size, channels, height, width] = [16, 64, 28, 28]; // example dimensions
const features = torch.randn([batch_size, channels, height, width]);
const gated_features = threshold.forward(features);
// Positive values pass through unchanged; values <= 0 become the constant 1
// Note: the output is not strictly binary; only the replaced entries share a value

// Thresholding with a custom replacement value
const threshold = new torch.nn.Threshold(2.0, -1.0);
const x = torch.tensor([-2, -1, 0, 1, 2, 3, 4]);
const output = threshold.forward(x);
// output = [-1, -1, -1, -1, -1, 3, 4]
// Only values > 2.0 pass through; others become -1.0