torch.nn.HardshrinkOptions
Hardshrink activation function (Hard thresholding).
Hardshrink is a hard-thresholding activation that zeros out small-magnitude activations and preserves large ones. It applies the function Hardshrink(x) = x if |x| > λ, else 0. This is a form of activation sparsity that can promote feature selection and interpretability. Hardshrink is rarely used in standard deep learning (ReLU and its variants are simpler and usually more effective), but it appears in sparse representation learning, autoencoders, and wavelet-based architectures.
Core idea: Hardshrink(x) = x if |x| > λ, else 0. All activations with magnitude at or below the threshold λ are zeroed out; all others pass through unchanged. This hard thresholding produces exactly sparse activations. Unlike soft thresholding (Softshrink), which shrinks large values toward zero by λ, hard thresholding preserves them unchanged.
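As a minimal sketch, the elementwise rule can be written as a plain TypeScript function (the hardshrink helper and sample values below are illustrative, not library calls):
// Elementwise hard thresholding: keep x unchanged if |x| > lambd, otherwise output 0.
function hardshrink(x: number, lambd: number = 0.5): number {
  return Math.abs(x) > lambd ? x : 0;
}

// With the default lambd = 0.5:
console.log([-1.2, -0.3, 0.4, 2.0].map((v) => hardshrink(v)));
// => [-1.2, 0, 0, 2]  (small magnitudes zeroed, large values preserved unchanged)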
When to use Hardshrink:
- Sparse representation learning (when you want sparse feature selection)
- Autoencoders with sparsity constraints
- Denoising networks (threshold for noise removal)
- Wavelet-based neural networks
- Rarely: standard deep networks use ReLU/variants; sparsity usually via regularization
Trade-offs vs Softshrink:
- Sparsity: both zero out |x| ≤ λ exactly; Hardshrink is all-or-nothing (zero or unchanged) vs Softshrink, which also shrinks the values it keeps
- Preservation: Hardshrink preserves large values unchanged vs Softshrink shrinks them toward zero by λ (see the comparison sketch after this list)
- Smoothness: Hardshrink has a jump discontinuity at ±λ (output jumps from 0 to ±λ) vs Softshrink's continuous but non-differentiable kink at ±λ
- Interpretability: Hardshrink's binary sparsity easier to interpret than soft shrinkage
- Training: Gradient zero at small values, normal at large (sparse updates)
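To make the preservation difference concrete, here is a small comparison on plain numbers (both helper functions are illustrative sketches of the standard definitions, not library calls):
// Hard thresholding: zero small values, pass large ones through unchanged.
function hardshrink(x: number, lambd: number = 0.5): number {
  return Math.abs(x) > lambd ? x : 0;
}

// Soft thresholding: zero small values, shrink large ones toward zero by lambd.
function softshrink(x: number, lambd: number = 0.5): number {
  if (x > lambd) return x - lambd;
  if (x < -lambd) return x + lambd;
  return 0;
}

console.log(hardshrink(2.0), softshrink(2.0)); // 2, 1.5  (preserved vs shrunk by lambd)
console.log(hardshrink(0.3), softshrink(0.3)); // 0, 0    (both zero small magnitudes)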
Trade-offs vs ReLU:
- Sparsity pattern: ReLU zeros negatives; Hardshrink zeros small magnitudes regardless of sign (see the sketch after this list)
- Interpretation: Hardshrink threshold λ vs ReLU's implicit zero threshold
- Efficiency: ReLU standard and highly optimized; Hardshrink rarely used
- Empirical: ReLU better for general deep learning; Hardshrink for specific sparse tasks
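A short sketch of the different sparsity patterns, again on plain numbers (the relu and hardshrink helpers are illustrative):
function relu(x: number): number {
  return Math.max(0, x);
}

function hardshrink(x: number, lambd: number = 0.5): number {
  return Math.abs(x) > lambd ? x : 0;
}

const xs = [-2.0, -0.3, 0.2, 1.5];
console.log(xs.map(relu));                 // [0, 0, 0.2, 1.5]  ReLU zeros all negatives
console.log(xs.map((v) => hardshrink(v))); // [-2, 0, 0, 1.5]   Hardshrink zeros small magnitudes, keeps large negatives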
Algorithm:
- Forward: Hardshrink(x) = x if |x| > λ, else 0 (hard thresholding at ±λ)
- Backward: ∂Hardshrink(x)/∂x = 1 if |x| > λ, else 0 (zero gradient for small values, normal for large)
The hard threshold creates sparse gradients; only large activations receive updates.
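A sketch of both passes on a plain array, with the backward pass written as the gradient mask applied to an upstream gradient (the array-based helpers are illustrative; in the library this is handled automatically during backpropagation):
// Forward: hard-threshold each element.
function hardshrinkForward(x: number[], lambd: number = 0.5): number[] {
  return x.map((v) => (Math.abs(v) > lambd ? v : 0));
}

// Backward: gradient is 1 where |x| > lambd and 0 elsewhere, so only
// elements that survived the threshold receive updates.
function hardshrinkBackward(x: number[], gradOutput: number[], lambd: number = 0.5): number[] {
  return x.map((v, i) => (Math.abs(v) > lambd ? gradOutput[i] : 0));
}

const xVals = [-1.2, -0.3, 0.4, 2.0];
const upstream = [1, 1, 1, 1];
console.log(hardshrinkForward(xVals));            // [-1.2, 0, 0, 2]
console.log(hardshrinkBackward(xVals, upstream)); // [1, 0, 0, 1]  (sparse gradient)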
Definition
export interface HardshrinkOptions {
  /** Threshold value for shrinkage (default: 0.5) */
  lambd?: number;
}

lambd (number, optional) – Threshold value for shrinkage (default: 0.5)
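A short construction sketch. The examples below pass the threshold as a plain number; whether the constructor also accepts a HardshrinkOptions object directly is an assumption not confirmed by this page, so the options shape is only shown as a typed value:
// Default threshold (lambd = 0.5); the no-argument form is assumed from the documented default.
const shrinkDefault = new torch.nn.Hardshrink();

// Explicit threshold, matching the examples below.
const shrinkCustom = new torch.nn.Hardshrink(0.1);

// The documented options shape (passing this object to the constructor is not
// confirmed here, so it is only declared as a value).
const opts: torch.nn.HardshrinkOptions = { lambd: 0.1 };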
Examples
// Sparse representation learning with hard thresholding
class SparseAutoencoder extends torch.nn.Module {
  private encode1: torch.nn.Linear;
  private hardshrink: torch.nn.Hardshrink;
  private encode2: torch.nn.Linear;
  private decode: torch.nn.Linear;

  constructor() {
    super();
    this.encode1 = new torch.nn.Linear(784, 256);
    this.hardshrink = new torch.nn.Hardshrink(0.1); // λ = 0.1, threshold small values
    this.encode2 = new torch.nn.Linear(256, 64); // Sparse features
    this.decode = new torch.nn.Linear(64, 784);
  }

  forward(x: torch.Tensor): torch.Tensor {
    x = this.encode1.forward(x);
    x = this.hardshrink.forward(x); // Hard threshold for sparsity
    x = this.encode2.forward(x);
    return this.decode.forward(x);
  }
}

// Denoising with hard thresholding
const noisy_data = torch.randn([100, 50]).add(torch.randn([100, 50]).mul(0.5));
const hardshrink = new torch.nn.Hardshrink(0.5); // Remove small values
const denoised = hardshrink.forward(noisy_data); // Hard threshold removes noise