torch.nn.functional.cosine_embedding_loss
function cosine_embedding_loss(input1: Tensor, input2: Tensor, target: Tensor): Tensor
function cosine_embedding_loss(input1: Tensor, input2: Tensor, target: Tensor, margin: number, size_average: boolean | null, reduce: boolean | null, reduction: 'none' | 'mean' | 'sum', options: CosineEmbeddingLossFunctionalOptions): Tensor
Cosine Embedding Loss: learns similarity relationships using angular distance.
Trains embeddings so that pairs labeled similar have high cosine similarity and pairs labeled dissimilar do not, directly optimizing the angular distance between embedding pairs. Essential for:
- Siamese networks and metric learning (learn discriminative embeddings)
- Face recognition and verification (embeddings of same person vs different people)
- Person re-identification (matching pedestrians across camera views)
- Image retrieval and content-based similarity search
- One-shot/few-shot learning (learn from minimal examples)
- Contrastive learning and self-supervised pre-training
- Sentence embeddings and semantic similarity (NLP)
How cosine embedding loss works:
- For similar pairs (target=1): pulls embeddings closer by minimizing 1 - cos_sim
- For dissimilar pairs (target=-1): pushes embeddings apart by penalizing max(0, cos_sim - margin)
- The margin provides a safety region: a dissimilar pair incurs loss until its cosine similarity drops to the margin or below.
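The per-pair rule above can be sketched in plain TypeScript. This is a hypothetical scalar re-implementation of the formula for illustration, not this library's API:

```typescript
// Cosine similarity of two vectors, with a small eps guard on the norm product
function cosSim(a: number[], b: number[], eps = 1e-8): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / Math.max(Math.sqrt(na) * Math.sqrt(nb), eps);
}

// Per-pair cosine embedding loss, following the formula described above:
//   target = 1  -> 1 - cos_sim               (pull together)
//   target = -1 -> max(0, cos_sim - margin)  (push apart past the margin)
function pairLoss(a: number[], b: number[], target: 1 | -1, margin = 0): number {
  const s = cosSim(a, b);
  return target === 1 ? 1 - s : Math.max(0, s - margin);
}

console.log(pairLoss([1, 0], [1, 0], 1));       // identical, similar pair -> 0
console.log(pairLoss([1, 0], [0, 1], -1));      // orthogonal, dissimilar -> 0 (margin 0)
console.log(pairLoss([1, 0], [1, 0], -1, 0.5)); // identical but labeled dissimilar -> 0.5
```

A batched version would average (or sum) these per-pair values according to the reduction.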
Why cosine similarity:
- Angular distance is scale-invariant (only direction matters, not magnitude)
- Embeddings naturally lie on hypersphere (normalized)
- Computationally efficient (just dot products)
- Interpretable: similarity ∈ [-1, 1] (1=identical, -1=opposite, 0=orthogonal)
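The scale-invariance point above is easy to verify: rescaling either input leaves the cosine similarity, and hence the loss, unchanged. A standalone sketch (hypothetical `cosSim` helper, not this library's API):

```typescript
function cosSim(a: number[], b: number[]): number {
  const dot = a.reduce((s, x, i) => s + x * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
}

const a = [3, 4];
const b = [4, 3];
const a10 = a.map((x) => 10 * x); // same direction, 10x the magnitude

console.log(cosSim(a, b));   // 24/25 = 0.96
console.log(cosSim(a10, b)); // unchanged: only direction matters
```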
Applications:
- Face verification: Embedding of face A similar to face A' (same person), dissimilar to face B
- Semantic search: Document embeddings used to find similar documents
- Metric learning: Learning representations for distance-based classification
- Recommendation systems: User/item embeddings for similarity-based recommendations
Properties:
- Angle-based optimization: directly minimizes angular distance (normalized space)
- Scale invariant: loss is independent of embedding magnitude (only direction matters)
- Margin interpretation: margin=0.5 means a dissimilar pair incurs zero loss only once cos_sim ≤ 0.5 (an angle of at least 60°)
- Asymmetric: the loss term differs for similar vs dissimilar pairs (not symmetric)
- Temperature scaling: similarity can be multiplied by a temperature before the loss (not built-in)
Practical considerations:
- Normalization: embeddings are usually L2-normalized for numerical stability, even though the loss itself is scale-invariant
- Margin choice matters: too large (near 1) → dissimilar pairs are barely penalized, giving insufficient separation; too small (near -1) → overly strict, forcing dissimilar pairs to be nearly opposite
- Batch composition: needs balanced similar/dissimilar pairs; imbalance can hurt convergence
- Hard negatives: performance depends on negative pair quality (mining strategies help)
- Numerical stability: clamp the norm product in the cosine computation to avoid division by zero
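The margin's effect on a dissimilar pair can be seen by sweeping it. Standalone sketch; `dissimilarLoss` is a hypothetical helper implementing the max(0, cos_sim - margin) term, not this library's API:

```typescript
// Hinge term applied to dissimilar (target = -1) pairs
function dissimilarLoss(cosSim: number, margin: number): number {
  return Math.max(0, cosSim - margin);
}

// A moderately similar pair that is labeled dissimilar (cos_sim = 0.7):
for (const margin of [0.0, 0.25, 0.5, 0.75, 1.0]) {
  console.log(margin, dissimilarLoss(0.7, margin));
}
// Once margin >= 0.7 the loss is zero: the pair receives no gradient,
// which is why an overly large margin under-separates dissimilar pairs.
```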
Parameters
input1: Tensor - First embedding tensor of shape (batch, embedding_dim). Typically normalized. Example: face embedding from a CNN encoder [batch, 128]
input2: Tensor - Second embedding tensor of the same shape as input1 (batch, embedding_dim). Example: another face embedding [batch, 128]
target: Tensor - Label tensor of shape [batch] containing 1 or -1 (1=similar, -1=dissimilar). Example: [1, 1, -1, -1] means the first two pairs are similar, the last two dissimilar
margin: number - Margin applied to dissimilar pairs, typically in [-1, 1] (PyTorch's default is 0). Dissimilar pairs with cos_sim at or below the margin incur no loss
Returns
Tensor – Loss tensor (scalar if reduction='mean' or 'sum'; shape [batch] if reduction='none')
Examples
// Face verification: Siamese network
const face_a = model(image_a); // [batch, 128] embedding
const face_b = model(image_b); // [batch, 128] embedding
const labels = torch.tensor([1, 1, -1, -1]); // Pairs: similar, similar, dissimilar, dissimilar
const loss = torch.nn.functional.cosine_embedding_loss(face_a, face_b, labels, 0.5);
// Similar pairs pushed together, dissimilar pushed apart

// Metric learning for one-shot classification
const anchor_embed = encoder(anchor_image); // [1, 256] - reference embedding
const query_embed = encoder(query_image); // [6, 256] - images to classify
const target = torch.tensor([1, 1, -1, -1, -1, -1]); // Which queries share the anchor's class
const loss = torch.nn.functional.cosine_embedding_loss(
  anchor_embed.expand([6, 256]), // broadcast the anchor to match the query batch
  query_embed,
  target,
  0.5 // margin=0.5
);

// Semantic similarity: sentence embeddings
const sent_embed1 = bert_encoder(sent1); // [batch, 768]
const sent_embed2 = bert_encoder(sent2); // [batch, 768]
// target: 1 if sentences are paraphrases, -1 if unrelated
const sim_labels = torch.tensor([1, 1, 1, -1, -1, -1]);
const loss = torch.nn.functional.cosine_embedding_loss(
  sent_embed1, sent_embed2, sim_labels,
  0.3 // margin
);

// Contrastive learning: positive/negative pairs from data augmentation
const embed1 = model(augmented_image1); // [batch, 128]
const embed2 = model(augmented_image2); // [batch, 128]
// Positive pairs (same image with different augmentations): target=1
// Negative pairs (different images): target=-1
const target = torch.cat([
torch.ones(pos_batch_size), // Positive pairs
torch.neg(torch.ones(neg_batch_size)) // Negative pairs
]);
const loss = torch.nn.functional.cosine_embedding_loss(embed1, embed2, target, 0.5);

// Person re-identification: match pedestrians across views
const person_a_features = conv_model(image_a); // [batch, 2048]
const person_b_features = conv_model(image_b); // [batch, 2048]
// target: 1 if same person in different cameras, -1 if different person
const same_person = torch.tensor([1, 1, 1, -1, -1, -1]);
const reid_loss = torch.nn.functional.cosine_embedding_loss(
person_a_features,
person_b_features,
same_person,
0.5
);
See Also
- PyTorch torch.nn.functional.cosine_embedding_loss
- triplet_margin_loss - Loss for triplet (anchor, positive, negative) tuples
- contrastive_loss - Alternative contrastive learning loss (if available)
- margin_ranking_loss - Loss for ranking/ordering pairs