torch.nn.functional.soft_margin_loss
function soft_margin_loss(input: Tensor, target: Tensor, options?: SoftMarginLossFunctionalOptions): Tensor

Soft margin loss for binary classification with logistic regression.
Computes the logistic loss (cross-entropy for ±1 targets) for binary classification. Given raw scores x and labels y ∈ {+1, -1}, minimizes log(1 + exp(-y·x)). Smooth approximation to hinge loss that's differentiable everywhere. Essential for:
- Binary classification with margin-like objectives (smoother than hard margin)
- Logistic regression generalized to ±1 labels (vs standard 0/1)
- Smooth alternatives to two-class support vector machine (SVM) hinge objectives
- Learning with confidence margins (soft instead of hard margin)
- Smooth ranking objectives (alternative to hard hinge loss)
Core idea: Minimize log(1 + exp(-y·x)), a smooth version of max(0, 1 - y·x).
- When y·x is large and positive: loss ≈ 0 (correct prediction with high confidence)
- When y·x is near 0: loss ≈ log(2) (uncertain prediction)
- When y·x is large and negative: loss ≈ |y·x| (misclassified, penalty grows linearly)
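The three regimes above can be checked numerically. This is a standalone TypeScript sketch of the per-sample formula, not a call into the library:

```typescript
// Per-sample soft margin loss: log(1 + exp(-margin)), where margin = y * x
function softMarginLoss(x: number, y: number): number {
  return Math.log(1 + Math.exp(-y * x));
}

console.log(softMarginLoss(10, +1)); // margin = +10: ≈ 0 (confident, correct)
console.log(softMarginLoss(0, +1));  // margin =  0: log(2) ≈ 0.693 (uncertain)
console.log(softMarginLoss(10, -1)); // margin = -10: ≈ 10 (confidently wrong, ≈ linear penalty)
```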
Why "soft"? The loss is smooth and differentiable everywhere, unlike the hard-margin hinge loss max(0, 1 - y·x). For large negative margins it grows linearly, matching the hinge loss asymptotically, but it has no kink at the margin boundary.
Connection to logistic regression: With 0/1 labels, binary cross-entropy on σ(x) is L = -[y·log σ(x) + (1-y)·log(1-σ(x))]. Under the label mapping y± = 2y - 1 this collapses to L = log(1 + exp(-y±·x)) with y± ∈ {+1, -1}. Soft margin loss is therefore the same logistic loss, just written in the ±1 label convention.
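The equivalence can be verified numerically. This plain TypeScript sketch (standalone math, not the library API) compares the two formulas under the 0/1 ↔ ±1 label mapping:

```typescript
// Soft margin loss with y ∈ {+1, -1}
const softMargin = (x: number, y: number): number => Math.log(1 + Math.exp(-y * x));

// Binary cross-entropy on sigmoid(x) with y01 ∈ {0, 1}
const sigmoid = (x: number): number => 1 / (1 + Math.exp(-x));
const bceWithLogits = (x: number, y01: number): number =>
  -(y01 * Math.log(sigmoid(x)) + (1 - y01) * Math.log(1 - sigmoid(x)));

// Mapping y01 = (y + 1) / 2 makes the two losses identical
const x = 1.7;
console.log(softMargin(x, +1), bceWithLogits(x, 1)); // equal
console.log(softMargin(x, -1), bceWithLogits(x, 0)); // equal
```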
- Smooth approximation: Approximates hard-margin hinge loss with smooth differentiable function
- Logistic loss: Equivalent to binary cross-entropy with ±1 labels (vs 0/1)
- Always positive: Loss is strictly > 0 even for correct predictions; it approaches 0 only asymptotically as y·x → ∞, and equals log(2) ≈ 0.693 at y·x = 0
- Asymptotic: As y·x → -∞, loss behaves like |y·x| (linear, similar to hinge loss); as y·x → +∞, loss decays to 0 like exp(-y·x)
- Symmetric: Flipping a label from +1 to -1 is equivalent to negating the corresponding score x
- Label convention: Targets must be +1/-1, not 0/1; with 0/1 labels the loss is silently wrong (a 0 target makes y·x = 0, contributing a constant log(2) regardless of the score)
- Unbounded loss: For very incorrect predictions, loss grows without bound
- Scale sensitivity: Loss depends on absolute scale of logits; very large/small values matter
- Numerical stability: Very large negative y·x can cause exp overflow; usually handled internally
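The stability note above can be made concrete. A standard rewrite, log(1 + exp(-m)) = max(-m, 0) + log1p(exp(-|m|)), avoids the overflow; this standalone TypeScript sketch shows the failure mode and the fix:

```typescript
// Naive form overflows for large negative margins: exp(1000) → Infinity
const naive = (m: number): number => Math.log(1 + Math.exp(-m));

// Stable rewrite: log(1 + exp(-m)) = max(-m, 0) + log1p(exp(-|m|)).
// The exponent is always <= 0, so exp never overflows.
const stable = (m: number): number =>
  Math.max(-m, 0) + Math.log1p(Math.exp(-Math.abs(m)));

console.log(naive(-1000));  // Infinity (overflow)
console.log(stable(-1000)); // 1000
console.log(stable(2), naive(2)); // agree where both are representable
```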
Parameters
input Tensor - Raw classification scores (logits), shape [batch_size] or [...]. Typically the output of a linear classifier without sigmoid/tanh applied. Larger y·x → smaller loss (more confident correct prediction).
target Tensor - Binary classification labels, shape [...] matching input. Values must be +1 (positive class) or -1 (negative class).
options SoftMarginLossFunctionalOptions - optional.
Returns
Tensor - Loss tensor: shape [] (scalar) if reduction='mean' or 'sum', else shape [...] matching input.

Examples
// Binary classification: logits and ±1 labels
const batch_size = 32;
const logits = torch.randn([batch_size]); // Raw classification scores
const labels = torch.ones([batch_size]); // Binary labels: +1 or -1
labels.masked_fill_(torch.rand([batch_size]).lt(0.5), -1); // Random labels
const loss = torch.nn.functional.soft_margin_loss(logits, labels);

// Logistic regression for binary classification
const X = torch.randn([100, 20]); // Features [batch_size, feature_dim]
const W = torch.randn([20, 1]); // Weights [feature_dim, 1]
const y = torch.ones([100]); // Labels: +1 or -1
y.masked_fill_(torch.rand([100]).lt(0.3), -1);
const logits = X.matmul(W).squeeze(-1); // [100]
const loss = torch.nn.functional.soft_margin_loss(logits, y);

// Per-sample loss for custom weighting by difficulty
const logits = torch.randn([32]);
const labels = torch.ones([32]);
labels.masked_fill_(torch.rand([32]).lt(0.5), -1);
const per_sample_loss = torch.nn.functional.soft_margin_loss(logits, labels, { reduction: 'none' }); // [32]
const loss_weights = per_sample_loss.gt(Math.log(2)).float().mul(2).add(1); // Hard examples weighted higher
const weighted_loss = per_sample_loss.mul(loss_weights).mean();

See Also
- PyTorch torch.nn.functional.soft_margin_loss
- torch.nn.functional.binary_cross_entropy - Standard BCE loss (different target convention)
- torch.nn.functional.hinge_embedding_loss - Hard-margin variant
- torch.nn.functional.margin_ranking_loss - Margin loss for pairwise ranking
- torch.nn.functional.cross_entropy - Multi-class alternative