torch.nn.init.xavier_uniform_
function xavier_uniform_(tensor: Tensor, options?: XavierOptions): Tensor
function xavier_uniform_(tensor: Tensor, gain: number, options?: XavierOptions): Tensor

Fills the input tensor with values drawn from Xavier (Glorot) uniform initialization, for stable deep-network training.
Xavier initialization keeps signal magnitudes stable through layers by scaling the weights according to layer size (fan_in and fan_out), preserving activation variance in both the forward and backward passes. Essential for:
- Initializing deep networks (2+ layers) for stable training
- Networks with sigmoid/tanh activations (especially important)
- Preventing vanishing/exploding gradients during training
- Ensuring similar scale of activations across layers
- Standard choice for fully connected and convolutional layers
Named after Xavier Glorot. Also called Glorot initialization.
The method is described in "Understanding the difficulty of training deep feedforward neural networks" - Glorot, X. & Bengio, Y. (2010).
- Uniform vs normal: xavier_uniform_ and xavier_normal_ target the same variance but draw from different distributions
- Activation-aware: Always pair with calculate_gain() for your activation function
- Layer dependency: Scales weights based on fan_in and fan_out, automatically adapts to layer size
- Sigmoid/tanh: especially important for these activations - helps keep pre-activations out of the saturated regions
- ReLU networks: ReLU networks often benefit from He initialization instead
- Comparison to He: Xavier assumes a roughly linear activation; He initialization compensates for ReLU zeroing half of its inputs
- In-place operation: Modifies tensor in-place; returns the same tensor
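The variance matching between the uniform and normal variants can be checked with plain arithmetic, no torch required. This is a hedged sketch (the fan sizes below are illustrative, and the helper names are not part of the API): for U(-a, a) the variance is a²/3, and substituting a = gain·√(6 / (fan_in + fan_out)) gives 2·gain²/(fan_in + fan_out), the same variance xavier_normal_ targets.

```typescript
// Sketch: xavier_uniform_ and xavier_normal_ target the same variance.
// Helper names here are illustrative, not part of the torch API.

function xavierUniformBound(fanIn: number, fanOut: number, gain = 1): number {
  // Bound a of U(-a, a): a = gain * sqrt(6 / (fan_in + fan_out))
  return gain * Math.sqrt(6 / (fanIn + fanOut));
}

function xavierTargetVariance(fanIn: number, fanOut: number, gain = 1): number {
  // Variance shared by the uniform and normal variants
  return (2 * gain * gain) / (fanIn + fanOut);
}

const a = xavierUniformBound(512, 256); // bound for a 512 -> 256 layer
const varUniform = (a * a) / 3;         // variance of U(-a, a)
console.log(a.toFixed(4));              // "0.0884"
console.log(Math.abs(varUniform - xavierTargetVariance(512, 256)) < 1e-12); // true
```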
Parameters
tensor: Tensor - An n-dimensional Tensor (typically the weight matrix of a layer)
gain: number (optional) - Scaling factor for the bound; defaults to 1 (see calculate_gain)
options: XavierOptions (optional) - Optional settings for Xavier initialization
Returns
Tensor – The input tensor with Xavier uniform initialization applied in-place

Algorithm:
- Values sampled from the uniform distribution U(-a, a)
- a = gain × √(6 / (fan_in + fan_out))
- fan_in = input size; fan_out = output size
- For conv layers, the fans include the kernel size

Examples
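As a torch-free first illustration of the fan arithmetic above, here is a sketch assuming PyTorch's weight layout convention (an assumption: a Conv2d weight of shape [out_ch, in_ch, kH, kW] has fan_in = in_ch × kH × kW and fan_out = out_ch × kH × kW; helper names are illustrative):

```typescript
// Sketch: computing fans and the Xavier uniform bound by hand.
// Assumed weight layouts: Linear -> [out, in], Conv2d -> [out_ch, in_ch, kH, kW].

function fans(shape: number[]): { fanIn: number; fanOut: number } {
  const receptive = shape.slice(2).reduce((p, k) => p * k, 1); // kH * kW (1 for Linear)
  return { fanIn: shape[1] * receptive, fanOut: shape[0] * receptive };
}

function xavierBound(shape: number[], gain = 1): number {
  const { fanIn, fanOut } = fans(shape);
  return gain * Math.sqrt(6 / (fanIn + fanOut));
}

// Linear(512, 256): weight shape [256, 512] -> fan_in 512, fan_out 256
console.log(fans([256, 512]));
// Conv2d(3, 64, kernel 7): weight shape [64, 3, 7, 7] -> fan_in 147, fan_out 3136
console.log(fans([64, 3, 7, 7]));
console.log(xavierBound([64, 3, 7, 7]).toFixed(3)); // "0.043"
```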
// Basic Xavier initialization for linear layer
const layer = torch.nn.Linear(512, 256);
torch.nn.init.xavier_uniform_(layer.weight); // Use default gain=1
torch.nn.init.zeros_(layer.bias);
const x = torch.randn([32, 512]);
const y = layer.forward(x); // Activations should be nicely scaled

// Xavier with activation-specific gain
const layer = torch.nn.Linear(1024, 512);
const gain = torch.nn.init.calculate_gain('relu'); // Get ReLU-specific gain
torch.nn.init.xavier_uniform_(layer.weight, { gain });
torch.nn.init.zeros_(layer.bias);

// Convolutional layer initialization
const conv = torch.nn.Conv2d(3, 64, { kernel_size: 7, stride: 2, padding: 3 });
torch.nn.init.xavier_uniform_(conv.weight); // Works for conv too
torch.nn.init.zeros_(conv.bias);

// Deep network initialization
class DeepMLPwithXavier extends torch.nn.Module {
  fc1: torch.nn.Linear;
  fc2: torch.nn.Linear;
  fc3: torch.nn.Linear;

  constructor() {
    super();
    this.fc1 = new torch.nn.Linear(784, 512);
    this.fc2 = new torch.nn.Linear(512, 256);
    this.fc3 = new torch.nn.Linear(256, 10);

    // Initialize all layers with Xavier
    for (const [name, param] of this.named_parameters()) {
      if (name.includes('weight')) {
        torch.nn.init.xavier_uniform_(param);
      } else if (name.includes('bias')) {
        torch.nn.init.zeros_(param);
      }
    }
  }
}
}

See Also
- PyTorch torch.nn.init.xavier_uniform_()
- torch.nn.init.xavier_normal_ - Xavier with normal distribution
- torch.nn.init.kaiming_uniform_ - He initialization (better for ReLU)
- torch.nn.init.kaiming_normal_ - He initialization with normal distribution
- torch.nn.init.calculate_gain - Get gain for specific activation function