torch.nn.functional.cosine_similarity
function cosine_similarity(x1: Tensor, x2: Tensor, options?: CosineSimilarityFunctionalOptions): Tensor
function cosine_similarity(x1: Tensor, x2: Tensor, dim: number, eps: number, options?: CosineSimilarityFunctionalOptions): Tensor
Cosine Similarity: measures the angle between vectors, invariant to magnitude.
Computes cosine similarity between pairs of vectors along a specified dimension, defined as sim = (x1 · x2) / max(||x1||₂ · ||x2||₂, eps). Returns values in [-1, 1], where 1 = identical direction, 0 = orthogonal, and -1 = opposite direction. Unlike Euclidean distance, cosine similarity measures only the angle between vectors, ignoring magnitude. Essential for:
- Semantic similarity (word embeddings, text similarity)
- Similarity-based retrieval and clustering
- Metric learning and contrastive learning
- Recommendation systems (content similarity)
- Document similarity and topic modeling
- Direction-based matching (ignoring scale)
- Angular distance metrics for normalized embeddings
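The underlying computation, sim = (x1 · x2) / max(||x1||₂ · ||x2||₂, eps), can be sketched in plain TypeScript with no tensor library (helper names are illustrative, not part of the API):

```typescript
// Cosine similarity between two plain number arrays, mirroring
// sim = (x1 · x2) / max(||x1|| * ||x2||, eps)
function dot(a: number[], b: number[]): number {
  return a.reduce((s, ai, i) => s + ai * b[i], 0);
}

function norm(a: number[]): number {
  return Math.sqrt(dot(a, a));
}

function cosineSimilarity(a: number[], b: number[], eps = 1e-8): number {
  return dot(a, b) / Math.max(norm(a) * norm(b), eps);
}

console.log(cosineSimilarity([1, 0, 0], [1, 0, 0])); // 1  (identical direction)
console.log(cosineSimilarity([1, 0], [0, 1]));       // 0  (orthogonal)
console.log(cosineSimilarity([1, 0], [-1, 0]));      // -1 (opposite direction)
```

The `eps` clamp on the norm product is what keeps the division well-defined when either vector is (near) zero-length.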
Key properties:
- Range: [-1, 1] (1 = same direction, 0 = orthogonal, -1 = opposite)
- Scale invariant: cosine_similarity(x, y) = cosine_similarity(2x, y)
- Geometric: measures angle between vectors in vector space
- Symmetric: cosine_similarity(x, y) = cosine_similarity(y, x)
When to use Cosine Similarity:
- Comparing embeddings (text, images, documents)
- Similarity-based losses (contrastive, triplet loss)
- Recommendation systems (user-item, item-item similarity)
- Clustering with direction rather than magnitude
- Semantic search (finding similar sentences/documents)
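The semantic-search use case above can be sketched with plain arrays (the embeddings and document names below are made up for illustration; no tensor library is assumed):

```typescript
// Toy semantic search: rank documents by cosine similarity to a query.
const dotQ = (x: number[], y: number[]) => x.reduce((s, xi, i) => s + xi * y[i], 0);
const cosQ = (x: number[], y: number[]) =>
  dotQ(x, y) / (Math.sqrt(dotQ(x, x)) * Math.sqrt(dotQ(y, y)));

const queryVec = [1, 0.5, 0]; // hypothetical query embedding
const docs: Record<string, number[]> = {
  docA: [2, 1, 0],    // same direction as the query (different magnitude)
  docB: [0, 0, 1],    // orthogonal to the query
  docC: [-1, -0.5, 0] // opposite direction
};

// Score every document against the query, then sort by descending similarity.
const ranked = Object.entries(docs)
  .map(([name, emb]) => ({ name, score: cosQ(queryVec, emb) }))
  .sort((x, y) => y.score - x.score);

console.log(ranked.map(r => r.name)); // ["docA", "docB", "docC"]
```

Note that docA outranks the others despite having twice the query's magnitude: only direction matters.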
Comparison with alternatives:
- Euclidean distance: Magnitude-sensitive; cosine is scale-invariant
- Dot product: Magnitude-dependent; cosine normalizes by norms
- Pearson correlation: Equivalent to cosine similarity on mean-centered data; cosine applies to raw vectors
- Jaccard similarity: For sets; cosine for continuous vectors
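The contrast with dot product and Euclidean distance shows up numerically under scaling (a sketch with illustrative values, no tensor library assumed):

```typescript
// Two vectors pointing the same way, one twice the magnitude of the other:
// only cosine similarity is unaffected by the scale difference.
const a = [3, 4];
const b = [6, 8]; // = 2 * a

const dot2 = (x: number[], y: number[]) => x.reduce((s, xi, i) => s + xi * y[i], 0);
const norm2 = (x: number[]) => Math.sqrt(dot2(x, x));
const cos2 = (x: number[], y: number[]) => dot2(x, y) / (norm2(x) * norm2(y));
const euclid = (x: number[], y: number[]) =>
  Math.sqrt(x.reduce((s, xi, i) => s + (xi - y[i]) ** 2, 0));

console.log(cos2(a, b));   // 1  — same direction, magnitude ignored
console.log(dot2(a, b));   // 50 — grows with magnitude
console.log(euclid(a, b)); // 5  — nonzero despite identical direction
```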
Advantages:
- Scale invariant: similarity is independent of vector magnitudes (only directions matter)
- Range [-1, 1]: normalized output is easy to interpret and to use in loss functions
- Symmetric: cosine_similarity(x, y) = cosine_similarity(y, x)
- Geometric meaning: equals cos(θ), where θ is the angle between the vectors
- Efficient computation: one dot product plus two norms, comparable in cost to Euclidean distance
- Dimension handling: reduces the dimension along which similarity is computed
- Numerically stable: the eps parameter prevents division by zero
Caveats:
- Works best on L2-normalized inputs: pre-normalizing embeddings often improves downstream behavior
- Not a distance metric: 1 - cosine_similarity violates the triangle inequality
- Zero vectors: eps prevents NaN when a vector has zero length
- Sign interpretation: negative values indicate opposite directions (rare with learned embeddings)
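The zero-vector caveat can be demonstrated with plain numbers (a sketch assuming the max(norm product, eps) formulation given above):

```typescript
// Zero-vector handling: without an eps clamp the norm product is 0,
// so the division yields NaN; clamping gives a well-defined 0.
const zero = [0, 0, 0];
const v = [1, 2, 3];

const dot3 = (x: number[], y: number[]) => x.reduce((s, xi, i) => s + xi * y[i], 0);
const norm3 = (x: number[]) => Math.sqrt(dot3(x, x));

const naive = dot3(zero, v) / (norm3(zero) * norm3(v));              // 0 / 0
const clamped = dot3(zero, v) / Math.max(norm3(zero) * norm3(v), 1e-8);

console.log(naive);   // NaN
console.log(clamped); // 0
```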
Parameters
x1: Tensor – First tensor for comparison (any shape containing the dim dimension)
x2: Tensor – Second tensor for comparison (same shape as x1)
dim: number – Dimension along which cosine similarity is computed (default: 1)
eps: number – Small value clamping the norm product to avoid division by zero (default: 1e-8)
options: CosineSimilarityFunctionalOptions – optional
Returns
Tensor – Tensor of similarities; shape = x1.shape with the dim dimension removed; values in [-1, 1]
Examples
// Basic cosine similarity between vectors
const x1 = torch.tensor([[1, 0, 0], [1, 1, 1]]); // 2 vectors in 3D
const x2 = torch.tensor([[1, 0, 0], [0, 0, 1]]); // 2 vectors in 3D
const sim = torch.nn.functional.cosine_similarity(x1, x2, 1);
// sim ≈ [1, 0.577] (first pair identical; second pair at ~54.7°, cos = 1/√3)
// Semantic similarity: word embeddings
const batch = 32, embedding_dim = 128; // example sizes
const embedding1 = torch.randn(batch, embedding_dim); // "cat" embeddings
const embedding2 = torch.randn(batch, embedding_dim); // "dog" embeddings
const similarity = torch.nn.functional.cosine_similarity(embedding1, embedding2, 1);
// similarity[i] = how similar "cat" and "dog" embeddings are (scale-invariant)
// Contrastive-style objective: pull positives together, push negatives apart
const batch = 32, dim = 128, margin = 0.2; // example sizes and margin
const query = torch.randn(batch, dim); // Query embedding
const positive = torch.randn(batch, dim); // Similar item
const negative = torch.randn(batch, dim); // Dissimilar item
const pos_sim = torch.nn.functional.cosine_similarity(query, positive, 1); // Should be high
const neg_sim = torch.nn.functional.cosine_similarity(query, negative, 1); // Should be low
const loss = torch.nn.functional.relu(neg_sim - pos_sim + margin); // Triplet-style hinge loss
// Recommendation system: user-item similarity
const num_items = 1000, embedding_dim = 128; // example sizes
const user_embedding = torch.randn(1, embedding_dim); // One user
const item_embeddings = torch.randn(num_items, embedding_dim); // All items
const scores = torch.nn.functional.cosine_similarity(
user_embedding.expand(num_items, -1),
item_embeddings,
1
);
// scores[i] = similarity of user to item i (for ranking)
// Scale invariance demonstration
const x = torch.tensor([3, 4]); // Length = 5
const y = torch.tensor([1, 0]); // Length = 1
const sim1 = torch.nn.functional.cosine_similarity(x, y, 0);
const sim2 = torch.nn.functional.cosine_similarity(x.mul(10), y, 0); // Scale x by 10
// sim1 == sim2 (scale doesn't matter for cosine similarity)
See Also
- PyTorch torch.nn.functional.cosine_similarity
- torch.nn.functional.pdist - All-pairs distances for one batch
- torch.nn.functional.cdist - Distances between two batches
- torch.nn.CosineSimilarity - Module wrapper for cosine similarity
- torch.norm - Compute norms (basis for similarity calculation)