torch.Tensor.threshold
Threshold activation function with hard cutoff.
Element-wise hard thresholding: returns the input where it exceeds the threshold, otherwise the replacement value. Unlike soft thresholding, which shrinks values toward zero, this is a hard cutoff: sub-threshold values are replaced outright, and no gradient flows through them.
Definition: Threshold(x) = x if x > threshold else value
- Creates a hard boundary at the threshold
- Completely removes (replaces) values below threshold
- Zero gradient for x ≤ threshold (the function is non-differentiable at the threshold point)
- Preserves gradient for x > threshold
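The definition above can be sketched in plain TypeScript on a number array (a hypothetical standalone helper for illustration, not the library's implementation):

```typescript
// Minimal sketch of element-wise hard thresholding,
// mirroring Threshold(x) = x if x > threshold else value.
function hardThreshold(xs: number[], t: number, value: number): number[] {
  return xs.map((x) => (x > t ? x : value));
}

// hardThreshold([-2, -1, 0, 1, 2, 3, 4], 1.5, 0)
// → [0, 0, 0, 0, 2, 3, 4]  (2, 3, and 4 exceed 1.5 and are kept)
```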
Use Cases:
- Hard gating/decision making (keep or remove)
- Extreme sparsification (binary keep/discard)
- Trimming small activations (removing noise)
- Hard attention mechanisms (all-or-nothing selection)
- Numerical stability (clipping extreme values)
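For the sparsification use case, the effect is easy to quantify: threshold a vector, then count the fraction of entries that became exact zeros. A small standalone sketch (hypothetical helper name, plain arrays rather than tensors):

```typescript
// Threshold a vector and report the resulting sparsity
// (fraction of entries that are exactly zero after the cutoff).
function sparsify(xs: number[], t: number): { out: number[]; sparsity: number } {
  const out = xs.map((x) => (x > t ? x : 0));
  const zeros = out.filter((x) => x === 0).length;
  return { out, sparsity: zeros / out.length };
}

// sparsify([0.05, -0.3, 0.8, 0.02], 0.1)
// → { out: [0, 0, 0.8, 0], sparsity: 0.75 }
```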
Key Behaviors:
- Hard cutoff: Unlike soft thresholding, completely replaces values below the threshold.
- Zero gradient region: No gradients flow for x ≤ threshold.
- Sparsity: Creates extremely sparse outputs (many exact zeros) when value = 0.
- NaN handling: NaN inputs are replaced, since NaN > threshold is always false.
- Replacement value: Typically 0, but can be any value.
Cautions:
- Zero gradient below the threshold may cause dead neurons.
- The hard cutoff is not differentiable at the threshold point.
- May cause gradient issues in backpropagation.
- Best suited to special-purpose architectures, not standard hidden layers.
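The dead-neuron caution follows directly from the backward rule: the upstream gradient passes through only where x > threshold and is zeroed everywhere else, so a unit stuck below the cutoff receives no learning signal. A sketch of that rule (assuming it matches PyTorch's threshold backward; hypothetical standalone helper):

```typescript
// Backward rule for hard thresholding: the upstream gradient passes
// through where x > t and is zeroed elsewhere. Units whose inputs
// stay below t get zero gradient and stop learning ("dead neurons").
function thresholdGrad(xs: number[], t: number, gradOut: number[]): number[] {
  return xs.map((x, i) => (x > t ? gradOut[i] : 0));
}

// thresholdGrad([-2, 0, 2], 0.5, [1, 1, 1]) → [0, 0, 1]
```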
Parameters
threshold (number) - The cutoff value. Input > threshold is kept.
value (number) - Replacement value for input ≤ threshold (typically 0).
Returns
Tensor<S, D, Dev> – Tensor with the same shape as the input
Examples
// Basic thresholding - keep or discard
const x = torch.tensor([-2, -1, 0, 1, 2, 3, 4]);
x.threshold(1.5, 0); // [0, 0, 0, 0, 2, 3, 4] (2 > 1.5, so it is kept)
// Hard gating - all or nothing
const scores = torch.randn(32, 10);
const gated = scores.threshold(0.5, 0); // Keep high scores, zero others
// Sparsification - remove small values
const weights = torch.randn(100, 100);
const sparse = weights.threshold(0.1, 0); // Prune small weights
// Different replacement values
const y = torch.tensor([-2, -1, 0, 1, 2]);
y.threshold(0, -1); // [-1, -1, -1, 1, 2] (values ≤ 0 → -1)
y.threshold(0, 0); // [0, 0, 0, 1, 2] (standard: zero below)
// Extreme sparsification in attention
const attention = torch.randn(8, 16, 16);
const sparse_attention = attention.threshold(0.9, 0); // Only strongest connections
// Feature selection - keep only important features
const features = model.extract_features(image);
const important = features.threshold(features.mean().item(), 0);
See Also
- PyTorch torch.nn.Threshold()
- relu - Threshold with threshold=0, value=0
- softshrink - Soft thresholding alternative
- clamp - Clipping to range instead of hard cutoff
- masked_select - Conditional selection