torch.nn.SoftminOptions
Softmin activation function.
Softmin is the inverse/dual of softmax: it converts logits into a probability distribution in which smaller values receive higher probabilities, the opposite of softmax. Mathematically, Softmin(x) = Softmax(-x). Softmin is rarely used in practice (softmax is the standard for classification), but it appears in optimization problems that emphasize minimum values, in computing a "soft" minimum, and in theoretical work on probability distributions. Its main practical use is in attention mechanisms or ranking scenarios where you want to attend to smaller/lower values instead of larger ones.
Core idea: Softmin converts values to probabilities with an inverse relationship: smaller inputs get higher probabilities. It's mathematically defined as Softmin(x_i) = exp(-x_i) / Σ_j exp(-x_j). This is equivalent to softmax of the negated inputs. Like softmax, outputs sum to 1 and are in (0, 1).
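A quick sanity check of the identity Softmin(x) = Softmax(-x), written against the same torch-style API used in the examples below (the negated input is spelled out literally rather than computed, to avoid assuming a negation method):
const x = torch.tensor([1.0, 2.0, 3.0]);
const negX = torch.tensor([-1.0, -2.0, -3.0]); // -x, written out by hand
const viaSoftmin = new torch.nn.Softmin(-1).forward(x);
const viaSoftmax = new torch.nn.Softmax(-1).forward(negX);
// Both ≈ [0.665, 0.245, 0.090]: entries lie in (0, 1), sum to 1,
// and the smallest input (1.0) receives the largest probability.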
When to use Softmin:
- Minimum selection: When you want to focus on smaller values (opposite of softmax)
- Cost/loss weighting: Emphasize smaller losses in a weighted average (e.g. in reinforcement learning); see the sketch after this list
- Inverse ranking: Probability proportional to how small/good (low-cost) something is
- NOT for classification: Softmax remains the de facto standard for classification
- Research/theory: Appears in theoretical work on distributions and optimization
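A sketch of the cost/loss-weighting idea mentioned above, using only the torch-style calls already shown in the examples below (the loss values are illustrative): per-sample losses go through Softmin so the smallest losses receive the largest weights.
const losses = torch.tensor([0.9, 0.2, 1.5, 0.4]); // illustrative per-sample losses
const weights = new torch.nn.Softmin(-1).forward(losses);
// weights ≈ [0.19, 0.39, 0.11, 0.32]; the smallest loss (0.2) gets the largest weight.
// These weights can then scale each sample's contribution to an aggregate loss.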
Softmin vs Softmax:
- Softmax: exp(x_i) / Σ_j exp(x_j) - larger values get higher probability
- Softmin: exp(-x_i) / Σ_j exp(-x_j) - smaller values get higher probability
- Both output normalized probabilities in (0, 1) that sum to 1
- Both are smooth and differentiable everywhere
Algorithm: Forward: Softmin(x)_i = exp(-x_i) / Σ_j exp(-x_j). Using the log-sum-exp trick, log(Σ_j exp(-x_j)) = -min(x) + log(Σ_j exp(-(x_j - min(x)))). For numerical stability, subtract min(x) from every input before exponentiating, so all exponents are ≤ 0.
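A minimal sketch of the stable forward pass in plain TypeScript (no tensor library; the helper name and the 1-D number-array input are assumptions for illustration). It shows why the min-subtraction matters for large-magnitude inputs:
function stableSoftmin(xs: number[]): number[] {
  const m = Math.min(...xs);
  const exps = xs.map(x => Math.exp(-(x - m))); // every exponent is -(x - min) <= 0
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / sum);
}

const big = [-1000, -1001, -999];
console.log(big.map(x => Math.exp(-x))); // naive exp(-x) overflows: [Infinity, Infinity, Infinity]
console.log(stableSoftmin(big));         // ≈ [0.245, 0.665, 0.090] (smallest input, -1001, gets the most mass)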
Backward: ∂Softmin_i/∂x_j = -Softmin_i * (δ_ij - Softmin_j), i.e. the negative of the softmax gradient. The negative sign reflects the inverse relationship with softmax.
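A matching sketch of the backward rule in plain TypeScript (the function name and array representation are illustrative): given the softmin output s and an upstream gradient g = dL/ds, the chain rule dL/dx_j = Σ_i g_i * ∂Softmin_i/∂x_j reduces to dL/dx_j = -s_j * (g_j - Σ_i g_i s_i).
function softminBackward(s: number[], g: number[]): number[] {
  const dot = s.reduce((acc, si, i) => acc + si * g[i], 0); // Σ_i g_i * s_i
  return s.map((sj, j) => -sj * (g[j] - dot));
}

// Example: s ≈ [0.665, 0.245, 0.090] (softmin of [1, 2, 3]) and g = [1, 0, 0]
// give ≈ [-0.223, 0.163, 0.060]; the components sum to 0, just as for softmax.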
Definition
export interface SoftminOptions {
/** Dimension along which softmin is computed (default: -1) */
dim?: number;
}
dim (number, optional) – Dimension along which softmin is computed (default: -1)
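A short sketch of how dim selects the normalization axis (hedged: it assumes the constructor accepts any valid dimension index, as suggested by the interface above, and uses only the torch-style calls shown in the examples below):
const m = torch.tensor([[1.0, 4.0], [2.0, 3.0]]);
const overRows = new torch.nn.Softmin(-1).forward(m); // normalize within each row
const overCols = new torch.nn.Softmin(0).forward(m);  // normalize within each column
// overRows, row [1, 4]    → ≈ [0.95, 0.05] (1.0 is smaller, so it gets more mass)
// overCols, column [1, 2] → ≈ [0.73, 0.27]
// Along the chosen dimension, the probabilities sum to 1.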
Examples
// Cost-based selection: lower cost = higher probability
const costs = torch.tensor([[5.0, 2.0, 8.0], [1.0, 3.0, 2.0]]);
const softmin = new torch.nn.Softmin(-1); // Along class dimension
const probabilities = softmin.forward(costs);
// probabilities[0] ≈ [small, large, tiny] (cost=2 gets highest prob)
// probabilities[1] ≈ [large, small, small] (cost=1 gets highest prob)
// Lower cost → higher probability (inverse of softmax)

// Comparison: Softmin vs Softmax on same input
const logits = torch.randn([5, 3]);
const softmax = new torch.nn.Softmax(-1);
const softmin = new torch.nn.Softmin(-1);
const probs_max = softmax.forward(logits); // High values → high probability
const probs_min = softmin.forward(logits); // Low values → high probability
// probs_min is NOT the inverse of probs_max; they're different distributions
// probs_max emphasizes large values, probs_min emphasizes small values

// Inverse relationship visualization
const x = torch.tensor([1.0, 2.0, 3.0]);
const softmax = new torch.nn.Softmax(-1);
const softmin = new torch.nn.Softmin(-1);
const probs_max = softmax.forward(x); // [small, medium, large]
const probs_min = softmin.forward(x); // [large, medium, small]
// Softmax: 1→0.09, 2→0.24, 3→0.67 (3 is highest)
// Softmin: 1→0.67, 2→0.24, 3→0.09 (1 is highest, 3 is lowest)