torch.nn.LazyLinear
class LazyLinear extends Module
new LazyLinear(out_features: number, options?: LinearOptions)
weight(Parameter | UninitializedParameter) - readonly
bias(Parameter | UninitializedParameter | null) - readonly
out_features(number) - readonly
use_bias(boolean) - readonly
in_features(number)
Lazy fully connected layer: defers weight initialization until first forward pass.
Computes y = xW^T + b like Linear, but avoids specifying in_features at creation time. Automatically infers in_features from the input's last dimension on first forward pass, then initializes weights with Kaiming uniform. Essential for:
- Models with dynamically determined input shapes
- Generic architectures where input size varies
- Sequential models built dynamically
- Avoiding manual calculation of intermediate dimensions
- APIs where input shape is only known at runtime
Unlike Linear which requires knowing in_features upfront, LazyLinear defers this until the first forward pass. After materialization, it behaves identically to Linear with the same performance and learned parameters.
When to use LazyLinear:
- Input size is unknown at module creation (only known after previous layer)
- Building sequential models programmatically with dynamic architecture
- Simplifying model definition code (skip manual dimension calculations)
- Generic architectures that work with variable input dimensions
- Prototyping when you don't want to track tensor shapes manually
Trade-offs:
- vs Linear: LazyLinear doesn't need in_features; Linear requires it upfront
- Initialization: Same Kaiming uniform initialization as Linear after materialization
- First forward: Slightly slower due to materialization on first call
- Parameters: Same parameters as Linear once initialized
- Code clarity: May be less clear than explicit Linear dimensions in some cases
- Debugging: Uninitialized state can be confusing without checking
has_uninitialized_params(): boolean
Returns true while the parameters have not yet been materialized, i.e. before the first forward pass.
Lazy Initialization Process:
- Create LazyLinear(out_features) without specifying in_features
- On first forward(input), extract in_features from input.shape[-1]
- Initialize weight as [out_features, in_features] with Kaiming uniform
- Initialize bias as [out_features] with zeros (if bias=true)
- Subsequent forwards use materialized parameters like regular Linear
- Lazy initialization: Parameters remain uninitialized until the first forward pass
- Materialization: in_features is automatically determined from the input's last dimension
- Same initialization: Uses Kaiming uniform like Linear after materialization
- No overhead: After first forward, performance identical to Linear
- Serialization: Must handle uninitialized state when saving/loading models
- Type checking: Since in_features is inferred, some type safety is deferred to runtime
- Batch dimension: Input can have any batch dimensions; only last dim used for in_features
- Input rank: Supports both [features] and [batch, features] shapes
- Uninitialized parameters: Before first forward, weight and bias are uninitialized
- First forward slower: Materialization adds overhead to first forward call
- Serialization issues: Saving a module before its first forward pass serializes uninitialized parameters and may not restore correctly; run a dummy forward first to materialize them
- Debugging difficulty: Uninitialized state can cause unexpected behavior if not checked
- Dimension mismatch: If input's last dimension changes between forwards, it causes errors
Examples
// Simple lazy initialization - input size unknown
const lazy_layer = new torch.nn.LazyLinear(128);
console.log(lazy_layer.in_features); // 0 (not yet initialized)
const x = torch.randn([32, 256]); // 256 input features
const output = lazy_layer.forward(x); // [32, 128]
console.log(lazy_layer.in_features); // 256 (now materialized)

// Sequential model with unknown intermediate dimensions
class SimpleNet extends torch.nn.Module {
conv: torch.nn.Conv2d;
fc: torch.nn.LazyLinear; // Don't know flatten output size!
constructor() {
super();
this.conv = new torch.nn.Conv2d(3, 64, 5); // RGB -> 64 channels
this.fc = new torch.nn.LazyLinear(10); // Output: 10 classes, input unknown
}
forward(x: torch.Tensor): torch.Tensor {
x = this.conv.forward(x); // [B, 64, ...]
x = x.reshape([x.shape[0], -1]); // Flatten to [B, 64*...]
x = this.fc.forward(x); // [B, 10] (in_features auto-inferred)
return x;
}
}
// Usage - no manual dimension tracking needed!
const model = new SimpleNet();
const input = torch.randn([4, 3, 32, 32]); // 4 RGB images, 32x32
const output = model.forward(input); // [4, 10]

// Dynamic architecture: number of layers determined at runtime
class DynamicMLP extends torch.nn.Module {
layers: torch.nn.LazyLinear[];
constructor(num_layers: number, output_size: number) {
super();
this.layers = [];
for (let i = 0; i < num_layers; i++) {
const layer = new torch.nn.LazyLinear(256); // Hidden size: 256
this.layers.push(layer);
this.register_module(`layer_${i}`, layer);
}
const out_layer = new torch.nn.LazyLinear(output_size); // Output layer
this.layers.push(out_layer);
this.register_module(`layer_${num_layers}`, out_layer);
}
forward(x: torch.Tensor): torch.Tensor {
for (const layer of this.layers.slice(0, -1)) {
x = layer.forward(x);
x = torch.relu(x);
}
x = this.layers[this.layers.length - 1].forward(x); // Final layer
return x;
}
}
// Create an MLP with 5 hidden layers plus an output layer, without knowing input size
const model = new DynamicMLP(5, 10);
const unknown_input_dim = 128; // e.g. discovered at runtime
const x = torch.randn([32, unknown_input_dim]); // Input size determined at runtime
const output = model.forward(x);

// Checking initialization state
const lazy = new torch.nn.LazyLinear(64);
if (lazy.has_uninitialized_params()) {
console.log('Parameters not initialized yet');
}
const x = torch.randn([8, 32]);
lazy.forward(x); // First forward - materializes parameters
if (!lazy.has_uninitialized_params()) {
console.log(`Parameters initialized: in_features=${lazy.in_features}`);
// Now lazy behaves like Linear(32, 64)
}

// Lazy layer in feature extraction pipeline
class FeatureExtractor extends torch.nn.Module {
feature_layers: torch.nn.Module[];
classifier: torch.nn.LazyLinear;
constructor() {
super();
// Build feature extraction layers
this.feature_layers = [
new torch.nn.Conv2d(3, 32, 3),
new torch.nn.ReLU(),
new torch.nn.Conv2d(32, 64, 3),
new torch.nn.ReLU(),
];
// Register array members so their parameters are tracked
this.feature_layers.forEach((layer, i) => this.register_module(`feature_${i}`, layer));
// Classifier: don't know flatten size until first forward
this.classifier = new torch.nn.LazyLinear(10);
}
forward(x: torch.Tensor): torch.Tensor {
for (const layer of this.feature_layers) {
x = layer.forward(x);
}
x = x.reshape([x.shape[0], -1]); // Flatten
return this.classifier.forward(x);
}
}
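The "no manual dimension tracking" benefit can be made concrete by showing the hand calculation that LazyLinear spares you in SimpleNet above. This sketch assumes the usual stride-1, zero-padding Conv2d defaults; `convOutputSize` is a hypothetical helper, not part of the library.

```typescript
// Manual calculation of the flattened feature size that LazyLinear
// would otherwise infer automatically (assumes stride 1, no padding).
function convOutputSize(inputSize: number, kernelSize: number, stride = 1, padding = 0): number {
  return Math.floor((inputSize + 2 * padding - kernelSize) / stride) + 1;
}

// SimpleNet above: Conv2d(3, 64, 5) applied to a 32x32 image
const h = convOutputSize(32, 5); // 28
const w = convOutputSize(32, 5); // 28
const flattenSize = 64 * h * w;  // 50176: the in_features LazyLinear infers
```

Every change to a kernel size, stride, or input resolution invalidates this arithmetic, which is exactly why deferring `in_features` to the first forward pass is convenient.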