torch.nn.LazyLinear
class LazyLinear extends Module
new LazyLinear(out_features: number, options?: LinearOptions)
weight(Parameter | UninitializedParameter) - readonly
bias(Parameter | UninitializedParameter | null) - readonly
out_features(number) - readonly
use_bias(boolean) - readonly
in_features(number)
Lazy fully connected layer: defers weight initialization until first forward pass.
Computes y = xW^T + b like Linear, but avoids specifying in_features at creation time. Automatically infers in_features from the input's last dimension on first forward pass, then initializes weights with Kaiming uniform. Essential for:
- Models with dynamically determined input shapes
- Generic architectures where input size varies
- Sequential models built dynamically
- Avoiding manual calculation of intermediate dimensions
- APIs where input shape is only known at runtime
Unlike Linear which requires knowing in_features upfront, LazyLinear defers this until the first forward pass. After materialization, it behaves identically to Linear with the same performance and learned parameters.
When to use LazyLinear:
- Input size is unknown at module creation (only known after previous layer)
- Building sequential models programmatically with dynamic architecture
- Simplifying model definition code (skip manual dimension calculations)
- Generic architectures that work with variable input dimensions
- Prototyping when you don't want to track tensor shapes manually
Trade-offs:
- vs Linear: LazyLinear doesn't need in_features; Linear requires it upfront
- Initialization: Same Kaiming uniform initialization as Linear after materialization
- First forward: Slightly slower due to materialization on first call
- Parameters: Same parameters as Linear once initialized
- Code clarity: May be less clear than explicit Linear dimensions in some cases
- Debugging: Uninitialized state can be confusing without checking
has_uninitialized_params(): boolean
Returns true while the parameters have not yet been materialized, i.e. before the first forward pass.
Lazy Initialization Process:
- Create LazyLinear(out_features) without specifying in_features
- On first forward(input), extract in_features from input.shape[-1]
- Initialize weight as [out_features, in_features] with Kaiming uniform
- Initialize bias as [out_features] with zeros (if bias=true)
- Subsequent forwards use materialized parameters like regular Linear
- Lazy initialization: Parameters remain uninitialized until the first forward pass
- Materialization: in_features is automatically determined from the input's last dimension
- Same initialization: Uses Kaiming uniform like Linear after materialization
- No overhead: After first forward, performance identical to Linear
- Serialization: Must handle uninitialized state when saving/loading models
- Type checking: Since in_features is inferred, some type safety is deferred to runtime
- Batch dimension: Input can have any batch dimensions; only last dim used for in_features
- Input rank: Supports both [features] and [batch, features] shapes
- Uninitialized parameters: Before first forward, weight and bias are uninitialized
- First forward slower: Materialization adds overhead to first forward call
- Serialization issues: Saving a module before its first forward pass serializes uninitialized parameters and may not restore correctly; run a dummy forward first to materialize them
- Debugging difficulty: Uninitialized state can cause unexpected behavior if not checked
- Dimension mismatch: If input's last dimension changes between forwards, it causes errors
Examples
// Simple lazy initialization - input size unknown
const lazy_layer = new torch.nn.LazyLinear(128);
console.log(lazy_layer.in_features); // 0 (not yet initialized)
const x = torch.randn([32, 256]); // 256 input features
const output = lazy_layer.forward(x); // [32, 128]
console.log(lazy_layer.in_features); // 256 (now materialized)

// Sequential model with unknown intermediate dimensions
class SimpleNet extends torch.nn.Module {
conv: torch.nn.Conv2d;
fc: torch.nn.LazyLinear; // Don't know flatten output size!
constructor() {
super();
this.conv = new torch.nn.Conv2d(3, 64, 5); // RGB -> 64 channels
this.fc = new torch.nn.LazyLinear(10); // Output: 10 classes, input unknown
}
forward(x: torch.Tensor): torch.Tensor {
x = this.conv.forward(x); // [B, 64, ...]
x = x.reshape([x.shape[0], -1]); // Flatten to [B, 64*...]
x = this.fc.forward(x); // [B, 10] (in_features auto-inferred)
return x;
}
}
// Usage - no manual dimension tracking needed!
const model = new SimpleNet();
const input = torch.randn([4, 3, 32, 32]); // 4 RGB images, 32x32
const output = model.forward(input); // [4, 10]

// Dynamic architecture: number of layers determined at runtime
class DynamicMLP extends torch.nn.Module {
layers: torch.nn.LazyLinear[];
constructor(num_layers: number, output_size: number) {
super();
this.layers = [];
for (let i = 0; i < num_layers; i++) {
const layer = new torch.nn.LazyLinear(256); // Hidden size: 256
this.layers.push(layer);
this.register_module(`layer_${i}`, layer);
}
const out_layer = new torch.nn.LazyLinear(output_size); // Output layer
this.layers.push(out_layer);
this.register_module(`layer_${num_layers}`, out_layer);
}
forward(x: torch.Tensor): torch.Tensor {
for (const layer of this.layers.slice(0, -1)) {
x = layer.forward(x);
x = torch.relu(x);
}
x = this.layers[this.layers.length - 1].forward(x); // Final layer
return x;
}
}
// Create an MLP with 5 hidden layers plus an output layer, without knowing input size
const model = new DynamicMLP(5, 10);
const unknown_input_dim = 128; // e.g. discovered at runtime
const x = torch.randn([32, unknown_input_dim]); // Input size determined at runtime
const output = model.forward(x);

// Checking initialization state
const lazy = new torch.nn.LazyLinear(64);
if (lazy.has_uninitialized_params()) {
console.log('Parameters not initialized yet');
}
const x = torch.randn([8, 32]);
lazy.forward(x); // First forward - materializes parameters
if (!lazy.has_uninitialized_params()) {
console.log(`Parameters initialized: in_features=${lazy.in_features}`);
// Now lazy behaves like Linear(32, 64)
}

// Lazy layer in feature extraction pipeline
class FeatureExtractor extends torch.nn.Module {
feature_layers: torch.nn.Module[];
classifier: torch.nn.LazyLinear;
constructor() {
super();
// Build feature extraction layers
this.feature_layers = [
new torch.nn.Conv2d(3, 32, 3),
new torch.nn.ReLU(),
new torch.nn.Conv2d(32, 64, 3),
new torch.nn.ReLU(),
];
// Register array members so their parameters are tracked
this.feature_layers.forEach((layer, i) => this.register_module(`feature_${i}`, layer));
// Classifier: don't know flatten size until first forward
this.classifier = new torch.nn.LazyLinear(10);
}
forward(x: torch.Tensor): torch.Tensor {
for (const layer of this.feature_layers) {
x = layer.forward(x);
}
x = x.reshape([x.shape[0], -1]); // Flatten
return this.classifier.forward(x);
}
}
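The "no manual dimension tracking" benefit can be made concrete by showing the hand calculation that LazyLinear spares you in SimpleNet above. This sketch assumes the usual stride-1, zero-padding Conv2d defaults; `convOutputSize` is a hypothetical helper, not part of the library.

```typescript
// Manual calculation of the flattened feature size that LazyLinear
// would otherwise infer automatically (assumes stride 1, no padding).
function convOutputSize(inputSize: number, kernelSize: number, stride = 1, padding = 0): number {
  return Math.floor((inputSize + 2 * padding - kernelSize) / stride) + 1;
}

// SimpleNet above: Conv2d(3, 64, 5) applied to a 32x32 image
const h = convOutputSize(32, 5); // 28
const w = convOutputSize(32, 5); // 28
const flattenSize = 64 * h * w;  // 50176: the in_features LazyLinear infers
```

Every change to a kernel size, stride, or input resolution invalidates this arithmetic, which is exactly why deferring `in_features` to the first forward pass is convenient.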