torch.nn.Threshold
new Threshold(threshold: number, value: number, options?: ThresholdOptions)
threshold(number) - readonly
value(number) - readonly
Threshold activation function.
Threshold applies a conditional replacement: output x if x > threshold, else replace with a fixed value. This is a simple binary decision activation that selectively passes through values above a threshold while replacing below-threshold values. Threshold is rarely used in modern deep learning (ReLU with threshold=0 is more common), but appears in some legacy code and specialized applications like binarization or feature gating.
Core idea: Threshold(x) = x if x > threshold else value. This creates a hard decision boundary, passing through values above the threshold unchanged while replacing smaller values with a fixed constant. Unlike ReLU which zeros out below-threshold values, Threshold can replace them with any value.
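The element-wise rule can be sketched in plain TypeScript without a tensor library (`thresholdFn` is an illustrative helper, not part of the documented API):

```typescript
// Minimal sketch of the element-wise Threshold rule on a plain array.
// `thresholdFn` is a hypothetical helper, not part of the torch.nn API.
function thresholdFn(xs: number[], threshold: number, value: number): number[] {
  // Strictly greater values pass through; everything else is replaced
  return xs.map((x) => (x > threshold ? x : value));
}

const out = thresholdFn([-1, 0, 1, 2], 0.5, 0);
// out = [0, 0, 1, 2]: values at or below 0.5 are replaced with 0
```

Note the strict comparison: an input exactly equal to the threshold is replaced, not passed through.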
When to use Threshold:
- Legacy code: Pre-ReLU era networks (rarely used in new code)
- Feature selection: Binary gating: pass strong signals, suppress weak ones
- Binarization-style gating: value=1 collapses sub-threshold activations to a constant (the output itself is not strictly binary)
- Specialized signal processing: Signal detection and gating
- NOT recommended: Use ReLU or variants for standard deep learning
Relationship to similar functions:
- ReLU: Threshold(x, threshold=0, value=0) is exactly ReLU(x)
- Hardshrink: Threshold is one-sided; Hardshrink is two-sided
- If-else logic: More like a gating function than a smooth activation
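The ReLU identity above can be checked in a few lines of plain TypeScript (both helpers are illustrative sketches, not the documented API):

```typescript
// Threshold with threshold=0 and value=0 coincides with ReLU element-wise.
// Both helpers are hypothetical sketches, not part of the torch.nn API.
const thresholdFn = (xs: number[], t: number, v: number): number[] =>
  xs.map((x) => (x > t ? x : v));
const relu = (xs: number[]): number[] => xs.map((x) => Math.max(0, x));

const xs = [-2, -0.5, 0, 0.5, 2];
const a = thresholdFn(xs, 0, 0); // [0, 0, 0, 0.5, 2]
const b = relu(xs);              // [0, 0, 0, 0.5, 2]
```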
Algorithm: Forward: Threshold(x, t, v) = x if x > t else v
- Simple element-wise comparison and selection
- No computation, just conditional assignment
- Different from Hardshrink which replaces values close to zero (symmetric)
Backward: ∂Threshold(x)/∂x = 1 if x > threshold, else 0
- Gradient flows only for elements above threshold
- Below-threshold elements get zero gradient (dead zone)
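The backward rule is just a mask on the upstream gradient; a plain-TypeScript sketch (`thresholdBackward` is a hypothetical helper, not the documented API):

```typescript
// Sketch of the Threshold backward pass: the upstream gradient flows
// only where the input was strictly above the threshold.
// `thresholdBackward` is an illustrative helper, not part of torch.nn.
function thresholdBackward(xs: number[], gradOut: number[], t: number): number[] {
  return xs.map((x, i) => (x > t ? gradOut[i] : 0));
}

const grads = thresholdBackward([-1, 0.2, 0.8, 3], [1, 1, 1, 1], 0.5);
// grads = [0, 0, 1, 1]: the first two inputs sit in the dead zone
```

The replacement value plays no role in the gradient; only the comparison against the threshold matters.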
Key points:
- Rarely used: ReLU (Threshold with threshold=0, value=0) is standard instead.
- One-sided gating: replaces only values at or below the threshold (unlike Hardshrink, which thresholds symmetrically on both sides of zero).
- Replacement value: Can replace with any constant, not just 0 (unlike ReLU).
- No smoothing: Sharp decision boundary at the threshold (unlike smooth activations).
- Dead zone: Values ≤ threshold get zero gradient and cannot contribute to learning.
Examples
// Simple thresholding: pass strong signals, suppress weak ones
const threshold = new torch.nn.Threshold(0.5, 0); // Like ReLU but with threshold=0.5
const x = torch.randn([10, 20]);
const output = threshold.forward(x);
// output[i, j] = x[i, j] if x[i, j] > 0.5, else 0
// This passes through strong signals while suppressing weak ones

// Gating with a constant floor value
const threshold = new torch.nn.Threshold(0, 1); // x if x > 0, else 1
const [batch_size, channels, height, width] = [16, 64, 28, 28]; // example dimensions
const features = torch.randn([batch_size, channels, height, width]);
const gated_features = threshold.forward(features);
// Positive values pass through unchanged; values <= 0 become the constant 1
// Note: the output is not strictly binary; only the replaced entries share a value

// Thresholding with a custom replacement value
const threshold = new torch.nn.Threshold(2.0, -1.0);
const x = torch.tensor([-2, -1, 0, 1, 2, 3, 4]);
const output = threshold.forward(x);
// output = [-1, -1, -1, -1, -1, 3, 4]
// Only values > 2.0 pass through; others become -1.0