torch.nn.init.orthogonal_
function orthogonal_(tensor: Tensor, options?: OrthogonalOptions): Tensor
function orthogonal_(tensor: Tensor, gain: number, options?: OrthogonalOptions): Tensor

Fills the input tensor with a (semi-)orthogonal matrix, promoting stable gradient flow and fast convergence.
Orthogonal initialization creates weight matrices with orthogonal rows (or columns). It preserves signal magnitude through layers, which enables fast convergence. Particularly useful for:
- RNNs and LSTMs (orthogonal weight matrices preserve gradient magnitudes)
- Deep networks sensitive to vanishing/exploding gradients
- Networks where orthogonality helps training dynamics
- Sequence models where signal preservation through time is important
- Theoretical analysis of gradient flow
Provides excellent gradient flow properties: ∥Wx∥ ≈ ∥x∥ for orthogonal W.
Described in "Exact solutions to the nonlinear dynamics of learning in deep linear neural networks" - Saxe, A. et al. (2013).
- Gradient flow: Orthogonal matrices preserve vector norms through multiplication
- RNNs: Especially beneficial for RNNs/LSTMs to prevent vanishing gradients
- Flattening: For multidimensional tensors (conv), rows are first dim, rest flattened
- Gain scaling: gain = √2 for ReLU networks, gain = 1 for tanh/sigmoid
- Approximation: Implementation uses simplified approach, not full QR decomposition
- Computational cost: More expensive than Xavier/He due to orthogonal computation
- In-place operation: Modifies tensor in-place; returns the same tensor
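The norm-preservation and gain-scaling properties in the notes above can be checked numerically. The sketch below uses NumPy rather than this library's API, building an orthogonal matrix via QR decomposition of a random Gaussian matrix (one standard construction; not necessarily this library's exact method):

```python
import numpy as np

rng = np.random.default_rng(0)

# Build an orthogonal matrix from a random Gaussian via QR decomposition.
a = rng.standard_normal((128, 128))
q, r = np.linalg.qr(a)

# Orthogonality: Q^T Q = I.
assert np.allclose(q.T @ q, np.eye(128), atol=1e-8)

# Norm preservation: ||Qx|| == ||x|| for any x.
x = rng.standard_normal(128)
assert np.isclose(np.linalg.norm(q @ x), np.linalg.norm(x))

# With gain g, norms scale by exactly g: ||(g*Q)x|| == g * ||x||.
gain = np.sqrt(2.0)  # the recommended gain for ReLU networks
assert np.isclose(np.linalg.norm(gain * q @ x), gain * np.linalg.norm(x))
```

This is why ReLU networks use gain = √2: ReLU zeroes roughly half of each activation vector, and the gain compensates so variance is preserved across layers.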
Parameters
tensorTensor- An n-dimensional Tensor where n ≥ 2. For n > 2, trailing dimensions are flattened into columns: the tensor is treated as shape (n_rows, n_cols) where n_cols = product of the remaining dims
optionsOrthogonalOptionsoptional- Optional settings for orthogonal initialization
Returns
Tensor– The input tensor, filled in place with an orthogonal initialization

Algorithm:
- Generate a random matrix with entries from N(0, 1)
- Flatten the tensor to 2D: (rows, cols)
- Compute an (approximate) orthogonal matrix
- Scale by the gain factor
- Reshape back to the original shape

Examples
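The algorithm steps above are language-agnostic; a runnable NumPy sketch (a hypothetical `orthogonal_` helper using QR with sign correction, one common construction rather than this library's exact implementation):

```python
import numpy as np

def orthogonal_(w: np.ndarray, gain: float = 1.0, seed: int = 0) -> np.ndarray:
    """Fill `w` in place with a (semi-)orthogonal matrix scaled by `gain`."""
    rng = np.random.default_rng(seed)
    rows = w.shape[0]
    cols = w.size // rows  # flatten trailing dims into columns
    # Random Gaussian matrix; take QR of the "tall" orientation.
    a = rng.standard_normal((rows, cols) if rows >= cols else (cols, rows))
    q, r = np.linalg.qr(a)
    # Sign correction keeps the distribution uniform over orthogonal matrices.
    q *= np.sign(np.diag(r))
    if rows < cols:
        q = q.T
    # Scale by gain and reshape back to the original shape.
    w[...] = gain * q.reshape(w.shape)
    return w

# Rows of a wide matrix (or columns of a tall one) come out orthonormal.
w = np.empty((3, 8))
orthogonal_(w)
print(np.allclose(w @ w.T, np.eye(3)))  # True
```

Note how a conv-shaped tensor such as (64, 3, 3, 3) is handled: rows = 64 and cols = 3·3·3 = 27, matching the flattening rule described in the notes.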
// RNN weight initialization
const rnn = torch.nn.RNN({ input_size: 64, hidden_size: 128, num_layers: 2 });
for (const [name, param] of rnn.named_parameters()) {
if (name.includes('weight_hh') || name.includes('recurrent')) {
// Use orthogonal for recurrent weights
torch.nn.init.orthogonal_(param);
} else if (name.includes('weight_ih') || name.includes('input')) {
// Xavier for input-to-hidden
const gain = torch.nn.init.calculate_gain('relu');
torch.nn.init.xavier_uniform_(param, { gain });
}
}

// LSTM with orthogonal initialization
const lstm = torch.nn.LSTM({ input_size: 128, hidden_size: 256, batch_first: true });
for (const [name, param] of lstm.named_parameters()) {
if (name.includes('weight_hh')) {
// Orthogonal for hidden-to-hidden (recurrent)
torch.nn.init.orthogonal_(param, { gain: 1.0 });
} else if (name.includes('weight_ih')) {
// Xavier for input-to-hidden
torch.nn.init.xavier_uniform_(param);
}
}

// Deep fully-connected network with orthogonal init
const layer1 = torch.nn.Linear(512, 512);
const layer2 = torch.nn.Linear(512, 512);
const layer3 = torch.nn.Linear(512, 10);
// Orthogonal init for hidden layers
torch.nn.init.orthogonal_(layer1.weight, { gain: torch.nn.init.calculate_gain('relu') });
torch.nn.init.orthogonal_(layer2.weight, { gain: torch.nn.init.calculate_gain('relu') });
// Xavier for output layer
torch.nn.init.xavier_uniform_(layer3.weight);
torch.nn.init.zeros_(layer1.bias);
torch.nn.init.zeros_(layer2.bias);
torch.nn.init.zeros_(layer3.bias);

// Convolutional layer with orthogonal initialization
const conv = torch.nn.Conv2d(3, 64, { kernel_size: 3, padding: 1 });
torch.nn.init.orthogonal_(conv.weight, { gain: 1.0 });
torch.nn.init.zeros_(conv.bias);

See Also
- PyTorch torch.nn.init.orthogonal_()
- torch.nn.init.xavier_uniform_ - Xavier initialization (cheaper, but does not guarantee orthogonality)
- torch.nn.init.kaiming_uniform_ - He initialization for ReLU
- torch.nn.init.calculate_gain - Get gain for specific activation function