torch.nn.init.kaiming_normal_
function kaiming_normal_(tensor: Tensor, options?: KaimingOptions): Tensor
function kaiming_normal_(tensor: Tensor, a: number, mode: FanMode, nonlinearity: Nonlinearity, options?: KaimingOptions): Tensor

Fill the tensor with Kaiming (He) normal initialization for ReLU-based networks.
Normal-distribution variant of Kaiming initialization. Samples from N(0, std²), where std scales with the tensor's fan and the activation function. Matches the variance of kaiming_uniform_, but draws from a Gaussian rather than a uniform distribution. Well suited for:
- Deep ReLU networks preferring normal distribution
- Networks trained with batch normalization (works well together)
- Theoretical analysis of initialization scales
- When normal distribution is explicitly required or preferred
Also called He initialization (normal variant).
The method is described in "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification" - He, K. et al. (2015).
- Normal vs Uniform: Kaiming normal and kaiming_uniform_ have the same variance but different distributions
- Batch Normalization: Kaiming normal works especially well with batch norm
- Distribution shape: Normal distribution has heavier tails than uniform
- Leaky ReLU slope: The a parameter must match the negative slope α used during the forward pass
- In-place operation: Modifies tensor in-place; returns the same tensor
- Comparison to uniform: Both equally valid; choose based on downstream assumptions
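The variance equivalence noted above can be checked numerically. A minimal sketch (the variable names are illustrative, not library API): kaiming_uniform_ samples U(-bound, bound) with bound = gain·√(3/fan), and Var(U(-b, b)) = b²/3, which equals the normal variant's std² = gain²/fan.

```typescript
// Sketch: both variants share the variance gain²/fan (names illustrative).
const gain = Math.sqrt(2);               // gain for relu
const fan = 512;                         // e.g. fan_in of a Linear(512, 256)
const std = gain / Math.sqrt(fan);       // kaiming_normal_: N(0, std²)
const bound = gain * Math.sqrt(3 / fan); // kaiming_uniform_: U(-bound, bound)

// Var(U(-b, b)) = b²/3, so the two variances coincide
console.log(std * std);           // ≈ 0.00390625 (gain²/fan)
console.log((bound * bound) / 3); // ≈ 0.00390625 (same value)
```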
Parameters
tensor (Tensor) – An n-dimensional Tensor (typically a weight matrix from a layer)
options (KaimingOptions, optional) – Optional settings for Kaiming initialization
Returns
Tensor – The input tensor, filled in-place with Kaiming normal initialization

Algorithm:
- Values are sampled from the normal distribution N(0, std²)
- std = gain / √fan
- gain = √(2 / (1 + α²)) for leaky_relu with negative slope α
- gain = √2 for relu
- fan = fan_in or fan_out (chosen by the mode parameter)

Examples
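The std described above can also be computed by hand. A minimal sketch, assuming a hypothetical helper kaimingStd (not part of the library API):

```typescript
// Illustrative helper mirroring the std computation (not library API).
function kaimingStd(fan: number, a: number, nonlinearity: "relu" | "leaky_relu"): number {
  // gain = √2 for relu; √(2 / (1 + α²)) for leaky_relu with negative slope α
  const gain = nonlinearity === "relu" ? Math.sqrt(2) : Math.sqrt(2 / (1 + a * a));
  // std = gain / √fan
  return gain / Math.sqrt(fan);
}

// Linear(512, 256) with mode 'fan_in' → fan = 512
console.log(kaimingStd(512, 0, "relu")); // ≈ 0.0625
```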
// Basic He initialization with normal distribution
const layer = torch.nn.Linear(512, 256);
torch.nn.init.kaiming_normal_(layer.weight, { a: 0, mode: 'fan_in', nonlinearity: 'relu' });
torch.nn.init.zeros_(layer.bias);

// With batch normalization (common combination)
const conv = torch.nn.Conv2d(3, 64, 3, { padding: 1 });
const bn = torch.nn.BatchNorm2d(64);
torch.nn.init.kaiming_normal_(conv.weight, { a: 0, mode: 'fan_out', nonlinearity: 'relu' });
torch.nn.init.zeros_(conv.bias);
torch.nn.init.ones_(bn.weight);
torch.nn.init.zeros_(bn.bias);

// Leaky ReLU with custom negative slope
const layer = torch.nn.Linear(1024, 512);
const alpha = 0.2;
torch.nn.init.kaiming_normal_(layer.weight, { a: alpha, mode: 'fan_in', nonlinearity: 'leaky_relu' });
torch.nn.init.zeros_(layer.bias);

See Also
- PyTorch torch.nn.init.kaiming_normal_()
- torch.nn.init.kaiming_uniform_ - Kaiming with uniform distribution
- torch.nn.init.xavier_normal_ - Xavier initialization with normal distribution
- torch.nn.init.calculate_gain - Get gain for specific activation function