torch.nn.BCELoss
new BCELoss(options?: { weight?: Tensor; reduction?: Reduction })
weight (Tensor | null) - readonly
reduction (Reduction) - readonly
Binary Cross Entropy (BCE) Loss: standard loss for binary classification and multi-label classification.
Measures the divergence between predicted probabilities and binary targets (0 or 1). Expects input to already be sigmoid-transformed (values in [0, 1]). Essential for:
- Binary classification (predicting one output, two classes)
- Multi-label classification (multiple independent binary decisions per sample)
- Any task with sigmoid output layer
- Pixel-wise classification (semantic segmentation masks)
- Per-element binary decisions
Important: This expects sigmoid-transformed probabilities as input. If you have raw logits, use BCEWithLogitsLoss instead (more numerically stable).
When to use BCELoss:
- Your model outputs probabilities (after sigmoid)
- Binary or multi-label classification tasks
- Custom sigmoid application before loss
- When you explicitly want to separate sigmoid from loss computation
Trade-offs:
- vs BCEWithLogitsLoss: Raw logits are more numerically stable; use that if possible
- vs CrossEntropyLoss: BCE for multi-label (multiple ones per sample); CE for single-label
- Numerical stability: BCEWithLogitsLoss is preferred (avoids log(0) issues)
- Explicit sigmoid: BCE requires you to sigmoid first; BCEWithLogitsLoss does it internally
Algorithm: For each element in batch:
- loss_i = -(target_i * log(pred_i) + (1 - target_i) * log(1 - pred_i))
Numerically unstable with extreme probability values (0 or 1). Use BCEWithLogitsLoss for raw logits to avoid numerical issues.
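The per-element formula above can be sketched in plain TypeScript (plain number arrays standing in for Tensors, with the default mean reduction assumed):

```typescript
// Plain-TypeScript sketch of the per-element BCE formula above.
// Assumes mean reduction and valid probabilities in (0, 1).
function bceLoss(preds: number[], targets: number[]): number {
  let sum = 0;
  for (let i = 0; i < preds.length; i++) {
    const p = preds[i];
    const t = targets[i];
    // loss_i = -(t * log(p) + (1 - t) * log(1 - p))
    sum += -(t * Math.log(p) + (1 - t) * Math.log(1 - p));
  }
  return sum / preds.length; // mean reduction
}

// A confident, correct prediction gives a small loss...
const low = bceLoss([0.9], [1]); // -log(0.9) ≈ 0.105
// ...while a confident, wrong prediction gives a large one.
const high = bceLoss([0.9], [0]); // -log(0.1) ≈ 2.303
```

Note how a prediction of exactly 0 or 1 would hit log(0) here, which is the instability the note above warns about.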
- Input requirements: Input must be probabilities in (0, 1), NOT raw logits
- Use BCEWithLogitsLoss: If you have raw logits (more numerically stable)
- Multi-label vs multi-class: BCE for multiple binary decisions, CrossEntropy for single class
- Numerical stability: Avoid extreme values (0 or 1), use small epsilon if needed
- Gradient behavior: Loss (and gradient magnitude) grows sharply as predictions move far from the target, which drives learning
- Common pattern: FC layer → sigmoid → BCE loss
- Weight usage: Can weight individual samples for importance sampling
- Computational: O(batch_size × num_elements) - efficient
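The epsilon suggestion in the numerical-stability note above can be sketched as follows; `clampProb` is a hypothetical helper, not part of the library API:

```typescript
// Hedged sketch: clamp probabilities away from exactly 0 and 1 before
// they reach the log, so the loss stays finite. EPS is an assumed value.
const EPS = 1e-7;

function clampProb(p: number): number {
  return Math.min(Math.max(p, EPS), 1 - EPS);
}

// log(0) would be -Infinity; after clamping the loss is finite.
const safeLoss = -Math.log(clampProb(0)); // ≈ 16.1 instead of Infinity
```

Prefer BCEWithLogitsLoss over manual clamping when you have logits; clamping is a fallback for pipelines that genuinely produce probabilities.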
Examples
// Binary classification with sigmoid output
const bce = new torch.nn.BCELoss();
// Predicted probabilities (must be in [0, 1])
const predictions = torch.sigmoid(torch.randn([32, 1]));
// Binary targets (0 or 1)
const targets = torch.randint(0, 2, [32, 1]);
// Compute loss
const loss = bce.forward(predictions, targets);

// Multi-label classification: predicting multiple independent labels
class MultiLabelClassifier extends torch.nn.Module {
  fc1: torch.nn.Linear;
  fc2: torch.nn.Linear;
  sigmoid: torch.nn.Sigmoid;

  constructor(input_dim: number, num_labels: number) {
    super();
    this.fc1 = new torch.nn.Linear(input_dim, 128);
    this.fc2 = new torch.nn.Linear(128, num_labels);
    this.sigmoid = new torch.nn.Sigmoid();
  }

  forward(x: torch.Tensor): torch.Tensor {
    x = this.fc1.forward(x);
    x = torch.nn.functional.relu(x);
    x = this.fc2.forward(x); // Raw logits
    x = this.sigmoid.forward(x); // Convert to probabilities
    return x; // [batch, num_labels], values in (0, 1)
  }
}
const model = new MultiLabelClassifier(100, 5); // 5 independent labels
const bce = new torch.nn.BCELoss();
const batch_x = torch.randn([32, 100]);
const batch_y = torch.randint(0, 2, [32, 5]); // Multi-hot encoded
const probs = model.forward(batch_x);
const loss = bce.forward(probs, batch_y); // Each label is independent
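The `weight` option mentioned in the notes scales each element's loss before reduction. A plain-TypeScript sketch of that behavior (arrays standing in for Tensors; `weightedBce` is an illustrative helper, not the library API):

```typescript
// Sketch of per-element weighting, as done by the `weight` option:
// each element's BCE term is scaled by its weight before the mean.
function weightedBce(
  preds: number[],
  targets: number[],
  weights: number[],
): number {
  let sum = 0;
  for (let i = 0; i < preds.length; i++) {
    const perElem = -(targets[i] * Math.log(preds[i]) +
                      (1 - targets[i]) * Math.log(1 - preds[i]));
    sum += weights[i] * perElem;
  }
  return sum / preds.length;
}

// Doubling an element's weight doubles its contribution to the loss,
// which is how rare-but-important samples can be emphasized.
const emphasized = weightedBce([0.9], [1], [2]);
```

With all weights set to 1 this reduces to the unweighted mean loss.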