torch.nn.functional.normalize
function normalize(input: Tensor, options?: NormalizeFunctionalOptions): Tensor
function normalize(input: Tensor, p: number, dim: number, eps: number, options?: NormalizeFunctionalOptions): Tensor

Lp Normalization: scales vectors to unit Lp norm along a dimension.
Normalizes input tensors along a specified dimension to have unit Lp norm. Divides vectors by their Lp norm to scale them to magnitude 1 while preserving direction. Essential for:
- Embedding normalization (L2 normalization for cosine similarity)
- Batch normalization alternatives (deterministic normalization)
- Feature scaling (unit-norm features without batch statistics)
- Regularization (constrains weight/parameter magnitudes)
- Metric learning (normalizing embeddings for angular distances)
- Model interpretability (analyzing normalized feature representations)
- Neural network stability (prevents activation explosion)
Lp Norm Variants:
- p=1: L1 normalization (Manhattan norm, sum of absolute values)
- p=2: L2 normalization (Euclidean norm, most common)
- p=∞: L∞ normalization (max absolute value)
- p∈ℝ⁺: General Lp norm for any p > 0
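The variants above are easy to check numerically. A minimal sketch in Python/NumPy (not the TS API documented here; `np.linalg.norm` covers all three norms):

```python
import numpy as np

def lp_normalize(x: np.ndarray, p, eps: float = 1e-12) -> np.ndarray:
    # np.linalg.norm computes sum(|x|) for p=1, the Euclidean norm for p=2,
    # and max(|x|) for p=np.inf; eps guards against division by zero.
    norm = np.linalg.norm(x, ord=p)
    return x / max(norm, eps)

v = np.array([1.0, 2.0, 3.0])
l1 = lp_normalize(v, 1)         # [1/6, 2/6, 3/6] -> absolute values sum to 1
l2 = lp_normalize(v, 2)         # v / sqrt(14)    -> Euclidean norm is 1
linf = lp_normalize(v, np.inf)  # [1/3, 2/3, 1]   -> max |entry| is 1
```

Each result has unit norm under its own p, but the three outputs differ, which is why the choice of p matters.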
When to use normalize:
- Embedding spaces (L2 normalization for cosine similarity)
- As an alternative to batch normalization (explicit, deterministic scaling)
- Feature scaling (constraining weights/activations to unit norm)
- Metric learning and embedding-based models
- Stabilizing neural network training
- Regularization via magnitude constraints
Comparison with alternatives:
- Batch Norm: Learns affine transformation; normalize is deterministic scaling
- Layer Norm: Normalizes per sample; normalize is dimension-specific
- Weight Norm: Constrains weight matrices; normalize can apply to any tensor
- Clamp: Bounds values; normalize preserves direction, scales to unit norm
Key properties:
- Preserves direction: Normalization only changes magnitude, not direction
- Unit norm result: Output has norm close to 1 (up to eps for numerical stability)
- L2 most common: Default p=2 gives Euclidean normalization, the most common case
- Dimension-specific: Applies independently to vectors along specified dimension
- Deterministic: No learnable parameters, purely geometric scaling
- Efficient: Simple computation (norm + division)
Notes and caveats:
- Numerically stable: eps prevents division by zero
- Zero vectors: eps guards the division, but a (near-)zero vector stays near zero rather than becoming unit-norm
- Different from batch norm: Deterministic, not learned; no affine parameters
- Dimension matters: Choice of dim significantly changes result
- Not standardization: Doesn't subtract mean, only normalizes norm
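The eps and zero-vector behavior noted above follows directly from the computation: divide by the norm along `dim`, clamped from below by eps. A sketch in Python/NumPy (illustrative, not the TS API):

```python
import numpy as np

def normalize(x: np.ndarray, p: float = 2, dim: int = 1, eps: float = 1e-12) -> np.ndarray:
    # Norm along `dim`, kept as a size-1 axis so it broadcasts back over x;
    # clamping by eps is what prevents division by zero.
    norm = np.linalg.norm(x, ord=p, axis=dim, keepdims=True)
    return x / np.maximum(norm, eps)

x = np.array([[3.0, 4.0],
              [0.0, 0.0]])
out = normalize(x)
# out[0] is [0.6, 0.8] (unit norm); out[1] stays [0, 0]:
# eps only guards the division, it does not lift a zero vector to unit norm.
```

Note that no mean is subtracted anywhere, which is the concrete sense in which this is normalization, not standardization.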
Parameters
input: Tensor - Tensor to normalize (any shape)
p: number - Norm exponent (2 for the common Euclidean case)
dim: number - Dimension along which each vector is normalized
eps: number - Lower bound on the norm, to avoid division by zero
options: NormalizeFunctionalOptions (optional)
Returns
Tensor – Normalized tensor, same shape as input, with unit Lp norm along dim

Examples
// Basic L2 normalization - scale to unit Euclidean norm
const x = torch.tensor([[3, 4], [1, 0]]); // Vectors [3,4] and [1,0]
const normalized = torch.nn.functional.normalize(x, 2, 1);
// normalized ≈ [[0.6, 0.8], [1, 0]] (unit L2 norm)
// Embedding normalization for cosine similarity
const embeddings = torch.randn(batch_size, embedding_dim); // Raw embeddings
const normalized_emb = torch.nn.functional.normalize(embeddings, 2, 1); // L2 normalize
// Now cosine_similarity = dot product (since ||x|| = ||y|| = 1)
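The identity used above (cosine similarity reduces to a dot product for unit vectors) can be verified with a quick Python/NumPy check, independent of the TS API:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(16)
b = rng.standard_normal(16)

# Cosine similarity from the raw vectors...
cos = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# ...equals a plain dot product once both sides are L2-normalized.
a_n = a / np.linalg.norm(a)
b_n = b / np.linalg.norm(b)
dot = a_n @ b_n
# cos and dot agree up to floating-point error
```

This is why retrieval systems often normalize embeddings once at index time and then use cheap dot products at query time.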
// Feature scaling without batch norm
const features = torch.randn(batch, num_features);
const scaled = torch.nn.functional.normalize(features, 2, 1); // L2 normalize per sample
// Deterministic scaling (no batch statistics needed)
// Different norm types
const x = torch.tensor([[1, 2, 3]]);
const l1_norm = torch.nn.functional.normalize(x, 1, 1); // L1: sum=6
const l2_norm = torch.nn.functional.normalize(x, 2, 1); // L2: sqrt(14)
const linf_norm = torch.nn.functional.normalize(x, Number.POSITIVE_INFINITY, 1); // L∞: max=3
// Different normalizations, each with unit norm
// Regularization: constrain network weights
const weights = torch.randn(out_features, in_features);
const normalized_w = torch.nn.functional.normalize(weights, 2, 1);
// Each output feature's weights have unit norm (weight regularization)
// Normalize along different dimensions
const x = torch.randn(batch, channels, height, width); // 4D image data
const norm_channel = torch.nn.functional.normalize(x, 2, 1); // Unit norm across channels at each spatial location
const norm_sample = torch.nn.functional.normalize(x, 2, [1, 2, 3]); // Unit norm over all non-batch dims (if array dims are supported)
// Different semantic meanings depending on dimension

See Also
- PyTorch torch.nn.functional.normalize
- torch.nn.functional.batch_norm - Learnable normalization with batch statistics
- torch.nn.functional.layer_norm - Per-sample normalization
- torch.nn.functional.cosine_similarity - Uses normalized vectors for similarity
- torch.norm - Compute norm without normalization