torch.nn.LocalResponseNorm
class LocalResponseNorm extends Module

new LocalResponseNorm(size: number, options?: LocalResponseNormOptions)

Properties:
- size (number) - readonly
- alpha (number) - readonly
- beta (number) - readonly
- k (number) - readonly
Local Response Normalization: normalizes values based on locally surrounding channels.
Normalizes each value using the activations of nearby channels within a sliding channel window. A legacy normalization scheme from the AlexNet era, largely superseded by BatchNorm and GroupNorm. Still useful for:
- Lateral inhibition (competing neurons suppress neighbors)
- Biological plausibility (mimics lateral inhibition in visual cortex)
- Brightness normalization in vision tasks
- Image processing pipelines requiring local contrast normalization
- Historical/reproducibility reasons (matching AlexNet results)
Why Local Response Normalization Fell Out of Favor: Batch normalization achieves better results with fewer hyperparameters. Modern architectures use BatchNorm, GroupNorm, or LayerNorm instead. LocalResponseNorm is primarily for historical compatibility or specific applications.
When to use LocalResponseNorm:
- Matching historical AlexNet/VGG results
- Computer vision tasks requiring local contrast normalization
- Biological neural network simulations
- Specific image processing pipelines
- Rarely: when other norms fail and you need local response properties
- Not recommended for new models: use BatchNorm, GroupNorm, or LayerNorm instead
Trade-offs:
- vs BatchNorm: Local statistics (not batch); works on single samples
- vs GroupNorm: Different statistics computation (channel-wise window vs groups)
- Spatial window: Requires specifying neighborhood size (tuning needed)
- Computational cost: Higher than simple normalization (sliding window computation)
- Biological interpretation: Lateral inhibition between competing channels
- Modern alternatives: Usually BatchNorm/GroupNorm work better
Algorithm: For each position (batch, channel, spatial_position):
- Define local window: [channel - size//2, ..., channel + size//2]
- Compute sum of squares of activations in window: Σ (x[c]²) for c in window
- Normalize: output = x / (k + alpha * sum_of_squares / size) ^ beta
Where k, alpha, beta are hyperparameters controlling response strength. The denominator grows as nearby channels have larger activations (lateral inhibition effect).
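The algorithm above can be sketched as a standalone function over the channel dimension at a single spatial position. Note that `localResponseNorm` here is an illustrative helper, not part of the torch API; it assumes the window is clamped (not wrapped) at the channel boundaries.

```typescript
// Illustrative sketch of the LRN formula (NOT the torch API):
// output[c] = x[c] / (k + alpha * sumOfSquares(window around c) / size) ^ beta
function localResponseNorm(
  x: number[], // activations across channels at one (batch, spatial) position
  size: number, // channel window size
  alpha = 0.0001,
  beta = 0.75,
  k = 1.0,
): number[] {
  const half = Math.floor(size / 2);
  return x.map((v, c) => {
    let sumSq = 0;
    // window [c - half, c + half], clamped at the channel boundaries
    for (let i = Math.max(0, c - half); i <= Math.min(x.length - 1, c + half); i++) {
      sumSq += x[i] * x[i];
    }
    return v / Math.pow(k + (alpha * sumSq) / size, beta);
  });
}

const out = localResponseNorm([1, 2, 3, 4], 3);
// With alpha = 0 the denominator is k^beta = 1, so the input passes through unchanged
const passthrough = localResponseNorm([1, 2, 3], 3, 0);
```

Because `k = 1` and `alpha > 0`, the denominator is always at least 1, so positive activations can only shrink; larger neighbors shrink them more (the lateral inhibition effect).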
- Historical: Primarily for AlexNet/VGG reproducibility
- Lateral inhibition: Implements competing channels suppressing each other
- Channel window: Applies across channels, not spatial dimensions
- Biological motivation: Inspired by lateral inhibition in visual cortex
- Window size: Spans size//2 channels on each side of the center, so the full window covers 2*(size//2) + 1 channels (equal to size when size is odd), clamped at channel boundaries
- Modern alternatives: BatchNorm, GroupNorm, LayerNorm usually work better
- Computational efficiency: Sliding window computation can be optimized
- No learnable parameters: Fixed computation (unlike BatchNorm, GroupNorm)
- Train/eval mode: Behaves identically in train() and eval() (no running statistics)
- Per-sample: Statistics computed per-sample, not batch statistics
- Not recommended for new models - use BatchNorm, GroupNorm, or LayerNorm instead
- Size parameter must be positive integer
- Window behavior at channel boundaries is implementation-dependent (PyTorch clamps the window rather than wrapping)
- Large size values increase computational cost
- Hyperparameters (alpha, beta, k) require tuning for different tasks
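The window arithmetic described in the notes above (half-width size//2, clamping at the channel boundaries) can be sketched with a hypothetical helper; this assumes clamping rather than wrap-around:

```typescript
// Hypothetical helper: which channel indices fall inside the LRN window
// centered on channel c, assuming the window is clamped at the boundaries.
function windowChannels(c: number, size: number, numChannels: number): number[] {
  const half = Math.floor(size / 2);
  const lo = Math.max(0, c - half);
  const hi = Math.min(numChannels - 1, c + half);
  const channels: number[] = [];
  for (let i = lo; i <= hi; i++) channels.push(i);
  return channels;
}

const atEdge = windowChannels(0, 5, 10); // clamped: only [0, 1, 2]
const interior = windowChannels(5, 5, 10); // full window: [3, 4, 5, 6, 7]
```

Edge channels therefore normalize over fewer neighbors than interior channels, which slightly weakens suppression at the boundaries.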
Examples
// Legacy AlexNet-style local response normalization
const lrn = new torch.nn.LocalResponseNorm(5); // Window size 5
const x = torch.randn([32, 64, 28, 28]); // [batch=32, channels=64, spatial 28x28]
const normalized = lrn.forward(x); // Same shape
// Each channel's activation suppressed by activations of nearby channels

// AlexNet-like configuration with explicit hyperparameters
const lrn = new torch.nn.LocalResponseNorm(5, {
  alpha: 0.0001, // scaling factor
  beta: 0.75, // exponent
  k: 1.0, // additive constant
});
// This matches the original AlexNet paper configuration
const batch_size = 128; // AlexNet's original batch size
const feature_maps = torch.randn([batch_size, 96, 55, 55]);
const output = lrn.forward(feature_maps);

// Custom strength tuning for different datasets
// Weak normalization (minimal suppression)
const weak_lrn = new torch.nn.LocalResponseNorm(3, { alpha: 0.00001, beta: 0.5 });
// Strong normalization (aggressive suppression)
const strong_lrn = new torch.nn.LocalResponseNorm(7, { alpha: 0.001, beta: 1.0 });
const x = torch.randn([16, 128, 32, 32]);
const weak_norm = weak_lrn.forward(x); // Less suppression
const strong_norm = strong_lrn.forward(x); // More suppression
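The weak/strong contrast above can be checked numerically by applying the LRN formula directly to a fixed channel vector. The `lrn1d` function below is an illustrative re-implementation of the formula (not the torch module), using the same hyperparameter pairs as the example:

```typescript
// Illustrative 1-D LRN over the channel dimension (NOT the torch API),
// used to compare the "weak" and "strong" hyperparameter settings above.
function lrn1d(x: number[], size: number, alpha: number, beta: number, k = 1.0): number[] {
  const half = Math.floor(size / 2);
  return x.map((v, c) => {
    let sumSq = 0;
    for (let i = Math.max(0, c - half); i <= Math.min(x.length - 1, c + half); i++) {
      sumSq += x[i] * x[i];
    }
    return v / Math.pow(k + (alpha * sumSq) / size, beta);
  });
}

const channels = [2, 2, 2, 2, 2, 2, 2];
const weak = lrn1d(channels, 3, 0.00001, 0.5); // barely below the input
const strong = lrn1d(channels, 7, 0.001, 1.0); // noticeably more suppressed
```

The strong setting uses a wider window, a larger alpha, and a larger beta, so its denominator grows faster and every channel ends up further below its input value.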