torch.nn.HardshrinkOptions
Hardshrink activation function (Hard thresholding).
Hardshrink is a hard-thresholding activation that zeros out small-magnitude activations and preserves large ones. It applies the function Hardshrink(x) = x if |x| > λ, else 0. This is a form of activation sparsity that can promote feature selection and interpretability. Hardshrink is rarely used in standard deep learning (ReLU and its variants are simpler and usually more effective), but it appears in sparse representation learning, autoencoders, and wavelet-based architectures.
Core idea: Hardshrink(x) = x if |x| > λ, else 0. All activations with magnitude at or below the threshold λ are zeroed out; all others pass through unchanged. This hard thresholding produces exactly sparse activations. Unlike soft thresholding (Softshrink), which shrinks large values toward zero by λ, hard thresholding preserves them unchanged.
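As a minimal sketch, the elementwise rule can be written as a plain TypeScript function (the hardshrink helper and sample values below are illustrative, not library calls):
// Elementwise hard thresholding: keep x unchanged if |x| > lambd, otherwise output 0.
function hardshrink(x: number, lambd: number = 0.5): number {
  return Math.abs(x) > lambd ? x : 0;
}

// With the default lambd = 0.5:
console.log([-1.2, -0.3, 0.4, 2.0].map((v) => hardshrink(v)));
// => [-1.2, 0, 0, 2]  (small magnitudes zeroed, large values preserved unchanged)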
When to use Hardshrink:
- Sparse representation learning (when you want sparse feature selection)
- Autoencoders with sparsity constraints
- Denoising networks (threshold for noise removal)
- Wavelet-based neural networks
- Rarely: standard deep networks use ReLU/variants; sparsity usually via regularization
Trade-offs vs Softshrink:
- Sparsity: both zero out |x| ≤ λ exactly; Hardshrink is all-or-nothing (zero or unchanged) vs Softshrink, which also shrinks the values it keeps
- Preservation: Hardshrink preserves large values unchanged vs Softshrink shrinks them toward zero by λ (see the comparison sketch after this list)
- Smoothness: Hardshrink has a jump discontinuity at ±λ (output jumps from 0 to ±λ) vs Softshrink's continuous but non-differentiable kink at ±λ
- Interpretability: Hardshrink's binary sparsity easier to interpret than soft shrinkage
- Training: Gradient zero at small values, normal at large (sparse updates)
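To make the preservation difference concrete, here is a small comparison on plain numbers (both helper functions are illustrative sketches of the standard definitions, not library calls):
// Hard thresholding: zero small values, pass large ones through unchanged.
function hardshrink(x: number, lambd: number = 0.5): number {
  return Math.abs(x) > lambd ? x : 0;
}

// Soft thresholding: zero small values, shrink large ones toward zero by lambd.
function softshrink(x: number, lambd: number = 0.5): number {
  if (x > lambd) return x - lambd;
  if (x < -lambd) return x + lambd;
  return 0;
}

console.log(hardshrink(2.0), softshrink(2.0)); // 2, 1.5  (preserved vs shrunk by lambd)
console.log(hardshrink(0.3), softshrink(0.3)); // 0, 0    (both zero small magnitudes)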
Trade-offs vs ReLU:
- Sparsity pattern: ReLU zeros negatives; Hardshrink zeros small magnitudes regardless of sign (see the sketch after this list)
- Interpretation: Hardshrink threshold λ vs ReLU's implicit zero threshold
- Efficiency: ReLU standard and highly optimized; Hardshrink rarely used
- Empirical: ReLU better for general deep learning; Hardshrink for specific sparse tasks
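A short sketch of the different sparsity patterns, again on plain numbers (the relu and hardshrink helpers are illustrative):
function relu(x: number): number {
  return Math.max(0, x);
}

function hardshrink(x: number, lambd: number = 0.5): number {
  return Math.abs(x) > lambd ? x : 0;
}

const xs = [-2.0, -0.3, 0.2, 1.5];
console.log(xs.map(relu));                 // [0, 0, 0.2, 1.5]  ReLU zeros all negatives
console.log(xs.map((v) => hardshrink(v))); // [-2, 0, 0, 1.5]   Hardshrink zeros small magnitudes, keeps large negatives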
Algorithm:
- Forward: Hardshrink(x) = x if |x| > λ, else 0 (hard thresholding at ±λ)
- Backward: ∂Hardshrink(x)/∂x = 1 if |x| > λ, else 0 (zero gradient for small values, normal for large)
The hard threshold creates sparse gradients; only large activations receive updates.
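A sketch of both passes on a plain array, with the backward pass written as the gradient mask applied to an upstream gradient (the array-based helpers are illustrative; in the library this is handled automatically during backpropagation):
// Forward: hard-threshold each element.
function hardshrinkForward(x: number[], lambd: number = 0.5): number[] {
  return x.map((v) => (Math.abs(v) > lambd ? v : 0));
}

// Backward: gradient is 1 where |x| > lambd and 0 elsewhere, so only
// elements that survived the threshold receive updates.
function hardshrinkBackward(x: number[], gradOutput: number[], lambd: number = 0.5): number[] {
  return x.map((v, i) => (Math.abs(v) > lambd ? gradOutput[i] : 0));
}

const xVals = [-1.2, -0.3, 0.4, 2.0];
const upstream = [1, 1, 1, 1];
console.log(hardshrinkForward(xVals));            // [-1.2, 0, 0, 2]
console.log(hardshrinkBackward(xVals, upstream)); // [1, 0, 0, 1]  (sparse gradient)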
Definition
export interface HardshrinkOptions {
  /** Threshold value for shrinkage (default: 0.5) */
  lambd?: number;
}

lambd (number, optional) – Threshold value for shrinkage (default: 0.5)
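A short construction sketch. The examples below pass the threshold as a plain number; whether the constructor also accepts a HardshrinkOptions object directly is an assumption not confirmed by this page, so the options shape is only shown as a typed value:
// Default threshold (lambd = 0.5); the no-argument form is assumed from the documented default.
const shrinkDefault = new torch.nn.Hardshrink();

// Explicit threshold, matching the examples below.
const shrinkCustom = new torch.nn.Hardshrink(0.1);

// The documented options shape (passing this object to the constructor is not
// confirmed here, so it is only declared as a value).
const opts: torch.nn.HardshrinkOptions = { lambd: 0.1 };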
Examples
// Sparse representation learning with hard thresholding
class SparseAutoencoder extends torch.nn.Module {
  private encode1: torch.nn.Linear;
  private hardshrink: torch.nn.Hardshrink;
  private encode2: torch.nn.Linear;
  private decode: torch.nn.Linear;

  constructor() {
    super();
    this.encode1 = new torch.nn.Linear(784, 256);
    this.hardshrink = new torch.nn.Hardshrink(0.1); // λ = 0.1, threshold small values
    this.encode2 = new torch.nn.Linear(256, 64); // Sparse features
    this.decode = new torch.nn.Linear(64, 784);
  }

  forward(x: torch.Tensor): torch.Tensor {
    x = this.encode1.forward(x);
    x = this.hardshrink.forward(x); // Hard threshold for sparsity
    x = this.encode2.forward(x);
    return this.decode.forward(x);
  }
}

// Denoising with hard thresholding
const noisy_data = torch.randn([100, 50]).add(torch.randn([100, 50]).mul(0.5));
const hardshrink = new torch.nn.Hardshrink(0.5); // Remove small values
const denoised = hardshrink.forward(noisy_data); // Hard threshold removes noise