torch.nn.functional.normalize
function normalize(input: Tensor, options?: NormalizeFunctionalOptions): Tensor
function normalize(input: Tensor, p: number, dim: number, eps: number, options?: NormalizeFunctionalOptions): Tensor

Lp Normalization: scales vectors to unit Lp norm along a dimension.
Normalizes input tensors along a specified dimension to have unit Lp norm. Divides vectors by their Lp norm to scale them to magnitude 1 while preserving direction. Essential for:
- Embedding normalization (L2 normalization for cosine similarity)
- Batch normalization alternatives (deterministic normalization)
- Feature scaling (unit-norm features without batch statistics)
- Regularization (constrains weight/parameter magnitudes)
- Metric learning (normalizing embeddings for angular distances)
- Model interpretability (analyzing normalized feature representations)
- Neural network stability (prevents activation explosion)
Lp Norm Variants:
- p=1: L1 normalization (Manhattan norm, sum of absolute values)
- p=2: L2 normalization (Euclidean norm, most common)
- p=∞: L∞ normalization (max absolute value)
- p∈ℝ⁺: General Lp norm for any p > 0
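The variants above are easy to check numerically. A minimal sketch in Python/NumPy (not the TS API documented here; `np.linalg.norm` covers all three norms):

```python
import numpy as np

def lp_normalize(x: np.ndarray, p, eps: float = 1e-12) -> np.ndarray:
    # np.linalg.norm computes sum(|x|) for p=1, the Euclidean norm for p=2,
    # and max(|x|) for p=np.inf; eps guards against division by zero.
    norm = np.linalg.norm(x, ord=p)
    return x / max(norm, eps)

v = np.array([1.0, 2.0, 3.0])
l1 = lp_normalize(v, 1)         # [1/6, 2/6, 3/6] -> absolute values sum to 1
l2 = lp_normalize(v, 2)         # v / sqrt(14)    -> Euclidean norm is 1
linf = lp_normalize(v, np.inf)  # [1/3, 2/3, 1]   -> max |entry| is 1
```

Each result has unit norm under its own p, but the three outputs differ, which is why the choice of p matters.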
When to use normalize:
- Embedding spaces (L2 normalization for cosine similarity)
- As an alternative to batch normalization (explicit, deterministic scaling)
- Feature scaling (constraining weights/activations to unit norm)
- Metric learning and embedding-based models
- Stabilizing neural network training
- Regularization via magnitude constraints
Comparison with alternatives:
- Batch Norm: Learns affine transformation; normalize is deterministic scaling
- Layer Norm: Normalizes per sample; normalize is dimension-specific
- Weight Norm: Constrains weight matrices; normalize can apply to any tensor
- Clamp: Bounds values; normalize preserves direction, scales to unit norm
Key properties:
- Preserves direction: Normalization only changes magnitude, not direction
- Unit norm result: Output has norm close to 1 (up to eps for numerical stability)
- L2 most common: Default p=2 gives Euclidean normalization, the most common case
- Dimension-specific: Applies independently to vectors along specified dimension
- Deterministic: No learnable parameters, purely geometric scaling
- Efficient: Simple computation (norm + division)
Notes and caveats:
- Numerically stable: eps prevents division by zero
- Zero vectors: eps guards the division, but a (near-)zero vector stays near zero rather than becoming unit-norm
- Different from batch norm: Deterministic, not learned; no affine parameters
- Dimension matters: Choice of dim significantly changes result
- Not standardization: Doesn't subtract mean, only normalizes norm
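The eps and zero-vector behavior noted above follows directly from the computation: divide by the norm along `dim`, clamped from below by eps. A sketch in Python/NumPy (illustrative, not the TS API):

```python
import numpy as np

def normalize(x: np.ndarray, p: float = 2, dim: int = 1, eps: float = 1e-12) -> np.ndarray:
    # Norm along `dim`, kept as a size-1 axis so it broadcasts back over x;
    # clamping by eps is what prevents division by zero.
    norm = np.linalg.norm(x, ord=p, axis=dim, keepdims=True)
    return x / np.maximum(norm, eps)

x = np.array([[3.0, 4.0],
              [0.0, 0.0]])
out = normalize(x)
# out[0] is [0.6, 0.8] (unit norm); out[1] stays [0, 0]:
# eps only guards the division, it does not lift a zero vector to unit norm.
```

Note that no mean is subtracted anywhere, which is the concrete sense in which this is normalization, not standardization.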
Parameters
input: Tensor - Tensor to normalize (any shape)
p: number - Norm exponent (2 for the common Euclidean case)
dim: number - Dimension along which each vector is normalized
eps: number - Lower bound on the norm, to avoid division by zero
options: NormalizeFunctionalOptions (optional)
Returns
Tensor – Normalized tensor, same shape as input, with unit Lp norm along dim

Examples
// Basic L2 normalization - scale to unit Euclidean norm
const x = torch.tensor([[3, 4], [1, 0]]); // Vectors [3,4] and [1,0]
const normalized = torch.nn.functional.normalize(x, 2, 1);
// normalized ≈ [[0.6, 0.8], [1, 0]] (unit L2 norm)
// Embedding normalization for cosine similarity
const embeddings = torch.randn(batch_size, embedding_dim); // Raw embeddings
const normalized_emb = torch.nn.functional.normalize(embeddings, 2, 1); // L2 normalize
// Now cosine_similarity = dot product (since ||x|| = ||y|| = 1)
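The identity used above (cosine similarity reduces to a dot product for unit vectors) can be verified with a quick Python/NumPy check, independent of the TS API:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(16)
b = rng.standard_normal(16)

# Cosine similarity from the raw vectors...
cos = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# ...equals a plain dot product once both sides are L2-normalized.
a_n = a / np.linalg.norm(a)
b_n = b / np.linalg.norm(b)
dot = a_n @ b_n
# cos and dot agree up to floating-point error
```

This is why retrieval systems often normalize embeddings once at index time and then use cheap dot products at query time.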
// Feature scaling without batch norm
const features = torch.randn(batch, num_features);
const scaled = torch.nn.functional.normalize(features, 2, 1); // L2 normalize per sample
// Deterministic scaling (no batch statistics needed)
// Different norm types
const x = torch.tensor([[1, 2, 3]]);
const l1_norm = torch.nn.functional.normalize(x, 1, 1); // L1: sum=6
const l2_norm = torch.nn.functional.normalize(x, 2, 1); // L2: sqrt(14)
const linf_norm = torch.nn.functional.normalize(x, Number.POSITIVE_INFINITY, 1); // L∞: max=3
// Different normalizations, each with unit norm
// Regularization: constrain network weights
const weights = torch.randn(out_features, in_features);
const normalized_w = torch.nn.functional.normalize(weights, 2, 1);
// Each output feature's weights have unit norm (weight regularization)
// Normalize along different dimensions
const x = torch.randn(batch, channels, height, width); // 4D image data
const norm_channel = torch.nn.functional.normalize(x, 2, 1); // Unit norm across channels at each spatial location
const norm_sample = torch.nn.functional.normalize(x, 2, [1, 2, 3]); // Unit norm over all non-batch dims (if array dims are supported)
// Different semantic meanings depending on dimension

See Also
- PyTorch torch.nn.functional.normalize
- torch.nn.functional.batch_norm - Learnable normalization with batch statistics
- torch.nn.functional.layer_norm - Per-sample normalization
- torch.nn.functional.cosine_similarity - Uses normalized vectors for similarity
- torch.norm - Compute norm without normalization