torch.nn.LocalResponseNorm
class LocalResponseNorm extends Module

new LocalResponseNorm(size: number, options?: LocalResponseNormOptions)

Properties:
- size (number) - readonly
- alpha (number) - readonly
- beta (number) - readonly
- k (number) - readonly
Local Response Normalization: normalizes values based on locally surrounding channels.
Normalizes each value using the activations of nearby channels within a sliding channel window. A legacy normalization scheme from the AlexNet era, largely superseded by BatchNorm and GroupNorm. Still useful for:
- Lateral inhibition (competing neurons suppress neighbors)
- Biological plausibility (mimics lateral inhibition in visual cortex)
- Brightness normalization in vision tasks
- Image processing pipelines requiring local contrast normalization
- Historical/reproducibility reasons (matching AlexNet results)
Why Local Response Normalization Fell Out of Favor: Batch normalization achieves better results with fewer hyperparameters. Modern architectures use BatchNorm, GroupNorm, or LayerNorm instead. LocalResponseNorm is primarily for historical compatibility or specific applications.
When to use LocalResponseNorm:
- Matching historical AlexNet/VGG results
- Computer vision tasks requiring local contrast normalization
- Biological neural network simulations
- Specific image processing pipelines
- Rarely: when other norms fail and you need local response properties
- Not recommended for new models: use BatchNorm, GroupNorm, or LayerNorm instead
Trade-offs:
- vs BatchNorm: Local statistics (not batch); works on single samples
- vs GroupNorm: Different statistics computation (channel-wise window vs groups)
- Spatial window: Requires specifying neighborhood size (tuning needed)
- Computational cost: Higher than simple normalization (sliding window computation)
- Biological interpretation: Lateral inhibition between competing channels
- Modern alternatives: Usually BatchNorm/GroupNorm work better
Algorithm: For each position (batch, channel, spatial_position):
- Define local window: [channel - size//2, ..., channel + size//2]
- Compute sum of squares of activations in window: Σ (x[c]²) for c in window
- Normalize: output = x / (k + alpha * sum_of_squares / size) ^ beta
Where k, alpha, beta are hyperparameters controlling response strength. The denominator grows as nearby channels have larger activations (lateral inhibition effect).
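The algorithm above can be sketched as a standalone function over the channel dimension at a single spatial position. Note that `localResponseNorm` here is an illustrative helper, not part of the torch API; it assumes the window is clamped (not wrapped) at the channel boundaries.

```typescript
// Illustrative sketch of the LRN formula (NOT the torch API):
// output[c] = x[c] / (k + alpha * sumOfSquares(window around c) / size) ^ beta
function localResponseNorm(
  x: number[], // activations across channels at one (batch, spatial) position
  size: number, // channel window size
  alpha = 0.0001,
  beta = 0.75,
  k = 1.0,
): number[] {
  const half = Math.floor(size / 2);
  return x.map((v, c) => {
    let sumSq = 0;
    // window [c - half, c + half], clamped at the channel boundaries
    for (let i = Math.max(0, c - half); i <= Math.min(x.length - 1, c + half); i++) {
      sumSq += x[i] * x[i];
    }
    return v / Math.pow(k + (alpha * sumSq) / size, beta);
  });
}

const out = localResponseNorm([1, 2, 3, 4], 3);
// With alpha = 0 the denominator is k^beta = 1, so the input passes through unchanged
const passthrough = localResponseNorm([1, 2, 3], 3, 0);
```

Because `k = 1` and `alpha > 0`, the denominator is always at least 1, so positive activations can only shrink; larger neighbors shrink them more (the lateral inhibition effect).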
- Historical: Primarily for AlexNet/VGG reproducibility
- Lateral inhibition: Implements competing channels suppressing each other
- Channel window: Applies across channels, not spatial dimensions
- Biological motivation: Inspired by lateral inhibition in visual cortex
- Window size: Spans size//2 channels on each side of the center, so the full window covers 2*(size//2) + 1 channels (equal to size when size is odd), clamped at channel boundaries
- Modern alternatives: BatchNorm, GroupNorm, LayerNorm usually work better
- Computational efficiency: Sliding window computation can be optimized
- No learnable parameters: Fixed computation (unlike BatchNorm, GroupNorm)
- Train/eval mode: Behaves identically in train() and eval() (no running statistics)
- Per-sample: Statistics computed per-sample, not batch statistics
- Not recommended for new models - use BatchNorm, GroupNorm, or LayerNorm instead
- Size parameter must be positive integer
- Window behavior at channel boundaries is implementation-dependent (PyTorch clamps the window rather than wrapping)
- Large size values increase computational cost
- Hyperparameters (alpha, beta, k) require tuning for different tasks
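The window arithmetic described in the notes above (half-width size//2, clamping at the channel boundaries) can be sketched with a hypothetical helper; this assumes clamping rather than wrap-around:

```typescript
// Hypothetical helper: which channel indices fall inside the LRN window
// centered on channel c, assuming the window is clamped at the boundaries.
function windowChannels(c: number, size: number, numChannels: number): number[] {
  const half = Math.floor(size / 2);
  const lo = Math.max(0, c - half);
  const hi = Math.min(numChannels - 1, c + half);
  const channels: number[] = [];
  for (let i = lo; i <= hi; i++) channels.push(i);
  return channels;
}

const atEdge = windowChannels(0, 5, 10); // clamped: only [0, 1, 2]
const interior = windowChannels(5, 5, 10); // full window: [3, 4, 5, 6, 7]
```

Edge channels therefore normalize over fewer neighbors than interior channels, which slightly weakens suppression at the boundaries.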
Examples
// Legacy AlexNet-style local response normalization
const lrn = new torch.nn.LocalResponseNorm(5); // Window size 5
const x = torch.randn([32, 64, 28, 28]); // [batch=32, channels=64, spatial 28x28]
const normalized = lrn.forward(x); // Same shape
// Each channel's activation suppressed by activations of nearby channels

// AlexNet-like configuration with explicit hyperparameters
const lrn = new torch.nn.LocalResponseNorm(5, {
  alpha: 0.0001, // scaling factor
  beta: 0.75, // exponent
  k: 1.0, // additive constant
});
// This matches the original AlexNet paper configuration
const batch_size = 128; // AlexNet's original batch size
const feature_maps = torch.randn([batch_size, 96, 55, 55]);
const output = lrn.forward(feature_maps);

// Custom strength tuning for different datasets
// Weak normalization (minimal suppression)
const weak_lrn = new torch.nn.LocalResponseNorm(3, { alpha: 0.00001, beta: 0.5 });
// Strong normalization (aggressive suppression)
const strong_lrn = new torch.nn.LocalResponseNorm(7, { alpha: 0.001, beta: 1.0 });
const x = torch.randn([16, 128, 32, 32]);
const weak_norm = weak_lrn.forward(x); // Less suppression
const strong_norm = strong_lrn.forward(x); // More suppression
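The weak/strong contrast above can be checked numerically by applying the LRN formula directly to a fixed channel vector. The `lrn1d` function below is an illustrative re-implementation of the formula (not the torch module), using the same hyperparameter pairs as the example:

```typescript
// Illustrative 1-D LRN over the channel dimension (NOT the torch API),
// used to compare the "weak" and "strong" hyperparameter settings above.
function lrn1d(x: number[], size: number, alpha: number, beta: number, k = 1.0): number[] {
  const half = Math.floor(size / 2);
  return x.map((v, c) => {
    let sumSq = 0;
    for (let i = Math.max(0, c - half); i <= Math.min(x.length - 1, c + half); i++) {
      sumSq += x[i] * x[i];
    }
    return v / Math.pow(k + (alpha * sumSq) / size, beta);
  });
}

const channels = [2, 2, 2, 2, 2, 2, 2];
const weak = lrn1d(channels, 3, 0.00001, 0.5); // barely below the input
const strong = lrn1d(channels, 7, 0.001, 1.0); // noticeably more suppressed
```

The strong setting uses a wider window, a larger alpha, and a larger beta, so its denominator grows faster and every channel ends up further below its input value.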