torch.nn.BCELoss
new BCELoss(options?: { weight?: Tensor; reduction?: Reduction })
weight (Tensor | null) - readonly
reduction (Reduction) - readonly
Binary Cross Entropy (BCE) Loss: standard loss for binary classification and multi-label classification.
Measures the divergence between predicted probabilities and binary targets (0 or 1). Expects input to already be sigmoid-transformed (values in [0, 1]). Essential for:
- Binary classification (predicting one output, two classes)
- Multi-label classification (multiple independent binary decisions per sample)
- Any task with sigmoid output layer
- Pixel-wise classification (semantic segmentation masks)
- Per-element binary decisions
Important: This expects sigmoid-transformed probabilities as input. If you have raw logits, use BCEWithLogitsLoss instead (more numerically stable).
When to use BCELoss:
- Your model outputs probabilities (after sigmoid)
- Binary or multi-label classification tasks
- Custom sigmoid application before loss
- When you explicitly want to separate sigmoid from loss computation
Trade-offs:
- vs BCEWithLogitsLoss: Raw logits are more numerically stable; use that if possible
- vs CrossEntropyLoss: BCE for multi-label (multiple ones per sample); CE for single-label
- Numerical stability: BCEWithLogitsLoss is preferred (avoids log(0) issues)
- Explicit sigmoid: BCE requires you to sigmoid first; BCEWithLogitsLoss does it internally
Algorithm: For each element in batch:
- loss_i = -(target_i * log(pred_i) + (1 - target_i) * log(1 - pred_i))
Numerically unstable with extreme probability values (0 or 1). Use BCEWithLogitsLoss for raw logits to avoid numerical issues.
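The per-element formula above can be sketched in plain TypeScript (plain number arrays standing in for Tensors, with the default mean reduction assumed):

```typescript
// Plain-TypeScript sketch of the per-element BCE formula above.
// Assumes mean reduction and valid probabilities in (0, 1).
function bceLoss(preds: number[], targets: number[]): number {
  let sum = 0;
  for (let i = 0; i < preds.length; i++) {
    const p = preds[i];
    const t = targets[i];
    // loss_i = -(t * log(p) + (1 - t) * log(1 - p))
    sum += -(t * Math.log(p) + (1 - t) * Math.log(1 - p));
  }
  return sum / preds.length; // mean reduction
}

// A confident, correct prediction gives a small loss...
const low = bceLoss([0.9], [1]); // -log(0.9) ≈ 0.105
// ...while a confident, wrong prediction gives a large one.
const high = bceLoss([0.9], [0]); // -log(0.1) ≈ 2.303
```

Note how a prediction of exactly 0 or 1 would hit log(0) here, which is the instability the note above warns about.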
- Input requirements: Input must be probabilities in (0, 1), NOT raw logits
- Use BCEWithLogitsLoss: If you have raw logits (more numerically stable)
- Multi-label vs multi-class: BCE for multiple binary decisions, CrossEntropy for single class
- Numerical stability: Avoid extreme values (0 or 1), use small epsilon if needed
- Gradient behavior: Loss (and gradient magnitude) grows sharply as predictions move far from the target, which drives learning
- Common pattern: FC layer → sigmoid → BCE loss
- Weight usage: Can weight individual samples for importance sampling
- Computational: O(batch_size × num_elements) - efficient
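The epsilon suggestion in the numerical-stability note above can be sketched as follows; `clampProb` is a hypothetical helper, not part of the library API:

```typescript
// Hedged sketch: clamp probabilities away from exactly 0 and 1 before
// they reach the log, so the loss stays finite. EPS is an assumed value.
const EPS = 1e-7;

function clampProb(p: number): number {
  return Math.min(Math.max(p, EPS), 1 - EPS);
}

// log(0) would be -Infinity; after clamping the loss is finite.
const safeLoss = -Math.log(clampProb(0)); // ≈ 16.1 instead of Infinity
```

Prefer BCEWithLogitsLoss over manual clamping when you have logits; clamping is a fallback for pipelines that genuinely produce probabilities.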
Examples
// Binary classification with sigmoid output
const bce = new torch.nn.BCELoss();
// Predicted probabilities (must be in [0, 1])
const predictions = torch.sigmoid(torch.randn([32, 1]));
// Binary targets (0 or 1)
const targets = torch.randint(0, 2, [32, 1]);
// Compute loss
const loss = bce.forward(predictions, targets);

// Multi-label classification: predicting multiple independent labels
class MultiLabelClassifier extends torch.nn.Module {
  fc1: torch.nn.Linear;
  fc2: torch.nn.Linear;
  sigmoid: torch.nn.Sigmoid;

  constructor(input_dim: number, num_labels: number) {
    super();
    this.fc1 = new torch.nn.Linear(input_dim, 128);
    this.fc2 = new torch.nn.Linear(128, num_labels);
    this.sigmoid = new torch.nn.Sigmoid();
  }

  forward(x: torch.Tensor): torch.Tensor {
    x = this.fc1.forward(x);
    x = torch.nn.functional.relu(x);
    x = this.fc2.forward(x); // Raw logits
    x = this.sigmoid.forward(x); // Convert to probabilities
    return x; // [batch, num_labels], values in (0, 1)
  }
}
const model = new MultiLabelClassifier(100, 5); // 5 independent labels
const bce = new torch.nn.BCELoss();
const batch_x = torch.randn([32, 100]);
const batch_y = torch.randint(0, 2, [32, 5]); // Multi-hot encoded
const probs = model.forward(batch_x);
const loss = bce.forward(probs, batch_y); // Each label is independent
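The `weight` option mentioned in the notes scales each element's loss before reduction. A plain-TypeScript sketch of that behavior (arrays standing in for Tensors; `weightedBce` is an illustrative helper, not the library API):

```typescript
// Sketch of per-element weighting, as done by the `weight` option:
// each element's BCE term is scaled by its weight before the mean.
function weightedBce(
  preds: number[],
  targets: number[],
  weights: number[],
): number {
  let sum = 0;
  for (let i = 0; i < preds.length; i++) {
    const perElem = -(targets[i] * Math.log(preds[i]) +
                      (1 - targets[i]) * Math.log(1 - preds[i]));
    sum += weights[i] * perElem;
  }
  return sum / preds.length;
}

// Doubling an element's weight doubles its contribution to the loss,
// which is how rare-but-important samples can be emphasized.
const emphasized = weightedBce([0.9], [1], [2]);
```

With all weights set to 1 this reduces to the unweighted mean loss.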