torch.nn.Module.load_state_dict

Module.load_state_dict(state_dict: Record<string, Tensor>): void

Loads parameters and buffers from a state_dict into this module and its descendants.

Copies each tensor in state_dict into the parameter or buffer of the same name, in this module and recursively in all of its submodules. Every entry that is present must match the shape of its target exactly. This is the inverse of state_dict() and is typically called after reading a saved checkpoint. Essential for:

Model loading: restore saved training checkpoints
Transfer learning: load pretrained weights into a new model
Ensemble inference: load multiple models with the same architecture
Model updates: roll back to the best checkpoint after overfitting
Production deployment: initialize a model with trained weights

Validation: Each tensor's shape is checked against the corresponding parameter or buffer in the current model. A mismatch raises an error instead of silently loading incompatible weights.

Partial Loading: If state_dict omits some parameters (e.g., it covers only some layers), only the parameters it contains are updated; the rest keep their current values. This allows loading a partial pretrained model.

Data Copying: The operation modifies parameter values in-place. Gradients are not copied (they're typically recomputed during training). Only the underlying tensor data is transferred.
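The three behaviors above (shape validation, partial loading, in-place copying) can be illustrated with a minimal sketch. The Tensor class and the loadInto function below are hypothetical stand-ins written for this illustration, not the library's implementation. Two details are assumptions that this page does not confirm: that keys unknown to the module are rejected, and that all entries are validated before any data is copied.

```typescript
// Hypothetical stand-in for the library's Tensor (illustration only).
class Tensor {
  grad: Float32Array | null = null;
  constructor(public data: Float32Array, public shape: number[]) {}
}

type StateDict = Record<string, Tensor>;

function sameShape(a: number[], b: number[]): boolean {
  return a.length === b.length && a.every((dim, i) => dim === b[i]);
}

// Copies every entry of `source` into the same-named entry of `target`.
// Entries of `target` that `source` does not mention are left alone.
function loadInto(target: StateDict, source: StateDict): void {
  const names = Object.keys(source);
  // Assumption: validate everything first so a failure changes nothing.
  for (const name of names) {
    const dst = target[name];
    if (dst === undefined) {
      throw new Error(`Unexpected key in state_dict: ${name}`);
    }
    if (!sameShape(dst.shape, source[name].shape)) {
      throw new Error(
        `Shape mismatch for ${name}: expected [${dst.shape}], got [${source[name].shape}]`
      );
    }
  }
  for (const name of names) {
    // In-place copy: the Tensor object and its `grad` are not replaced.
    target[name].data.set(source[name].data);
  }
}

// Usage: load only the encoder; the decoder keeps its current values.
const model: StateDict = {
  "encoder.weight": new Tensor(new Float32Array([0, 0, 0, 0]), [2, 2]),
  "decoder.weight": new Tensor(new Float32Array([9, 9]), [2]),
};
const encoderRef = model["encoder.weight"];
encoderRef.grad = new Float32Array([7, 7, 7, 7]);

loadInto(model, {
  "encoder.weight": new Tensor(new Float32Array([1, 2, 3, 4]), [2, 2]),
});
```

After the call, model["encoder.weight"] is still the same object as encoderRef, its data holds the loaded values, and its grad is unchanged.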

Typical Usage:

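import * as fs from 'fs';

// Save during training
let best_loss = Infinity;  // let, not const: reassigned below
for (const batch of train_loader) {
  // ... training step that computes current_loss ...
  if (current_loss < best_loss) {
    best_loss = current_loss;
    fs.writeFileSync('best_model.json', JSON.stringify(model.state_dict()));
  }
}

// Load best model later
// JSON.parse returns plain objects; if values must be Tensor objects,
// convert them back before calling load_state_dict.
const best_state = JSON.parse(fs.readFileSync('best_model.json', 'utf8'));
model.load_state_dict(best_state);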
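One caveat with the JSON round trip: in JavaScript generally, class instances come back from JSON.parse as plain objects, and typed arrays are serialized as index-keyed objects rather than arrays. Whether the library's Tensor handles this itself is not stated on this page. The sketch below shows one way to convert explicitly; the Tensor class, its data and shape fields, and the toPlain and fromPlain helpers are all hypothetical.

```typescript
// Hypothetical stand-in for the library's Tensor (illustration only).
class Tensor {
  constructor(public data: Float32Array, public shape: number[]) {}
}

type StateDict = Record<string, Tensor>;
type PlainEntry = { data: number[]; shape: number[] };

// Convert tensors to plain arrays that JSON can represent faithfully.
function toPlain(state: StateDict): Record<string, PlainEntry> {
  const out: Record<string, PlainEntry> = {};
  for (const key of Object.keys(state)) {
    out[key] = { data: Array.from(state[key].data), shape: state[key].shape.slice() };
  }
  return out;
}

// Rebuild Tensor objects from the parsed JSON.
function fromPlain(plain: Record<string, PlainEntry>): StateDict {
  const out: StateDict = {};
  for (const key of Object.keys(plain)) {
    out[key] = new Tensor(Float32Array.from(plain[key].data), plain[key].shape.slice());
  }
  return out;
}

const state: StateDict = {
  "fc.weight": new Tensor(new Float32Array([0.5, -1.5]), [1, 2]),
};

const json = JSON.stringify(toPlain(state));
const restored = fromPlain(JSON.parse(json));

// Without the conversion, the class identity is lost:
const naive = JSON.parse(JSON.stringify(state));
```

Here restored["fc.weight"] is a Tensor again, while naive["fc.weight"] is a plain object. For large models a binary format is usually a better fit than JSON, but that choice is outside the scope of this sketch.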

In-place modification: Modifies parameter values directly
Recursive: Automatically loads into all submodules
Strict shape checking: Raises error if shapes don't match
No gradient copying: Only tensor data, not gradients
Inverse of state_dict: state_dict() saves, load_state_dict() restores

Architecture must match: Keys in state_dict must correspond to parameter/buffer names in the model
Shape mismatch error: Loading into a different architecture raises an error
Overwrites current weights: Current parameter values are replaced
Gradients not loaded: Gradients are not part of the state_dict; they are recomputed when training resumes
Version compatibility: Old checkpoints may not load into new code
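Because a failed load is a hard error, it can help to compare a checkpoint against the model before loading, for example when checking whether an old checkpoint still fits newer code. The helper below is a hypothetical sketch, not part of the library; it assumes only that each value exposes a shape array.

```typescript
// Assumption: each entry exposes `shape`; the real Tensor type may differ.
type ShapeOnly = { shape: number[] };

interface CompatReport {
  missing: string[];    // in the model, absent from the checkpoint
  unexpected: string[]; // in the checkpoint, absent from the model
  mismatched: string[]; // present in both, but shapes differ
}

function has(obj: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

function checkCompat(
  modelState: Record<string, ShapeOnly>,
  checkpoint: Record<string, ShapeOnly>
): CompatReport {
  const result: CompatReport = { missing: [], unexpected: [], mismatched: [] };
  for (const key of Object.keys(modelState)) {
    if (!has(checkpoint, key)) {
      result.missing.push(key);
    } else if (modelState[key].shape.join(",") !== checkpoint[key].shape.join(",")) {
      result.mismatched.push(key);
    }
  }
  for (const key of Object.keys(checkpoint)) {
    if (!has(modelState, key)) result.unexpected.push(key);
  }
  return result;
}

// Usage with made-up layer names and shapes.
const compat = checkCompat(
  {
    "fc1.weight": { shape: [4, 3] },
    "fc1.bias": { shape: [4] },
    "fc2.weight": { shape: [2, 4] },
  },
  {
    "fc1.weight": { shape: [4, 3] },
    "fc1.bias": { shape: [8] },
    "head.weight": { shape: [2, 4] },
  }
);
// compat.missing    is ["fc2.weight"]
// compat.unexpected is ["head.weight"]
// compat.mismatched is ["fc1.bias"]
```

In practice the first argument would be model.state_dict() and the second the loaded checkpoint; an empty mismatched list means every shared entry has a matching shape.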

Parameters

state_dict (Record<string, Tensor>): Dictionary of parameters and buffers to load. Keys must match parameter/buffer names in the module. Values must be Tensor objects.

Returns

void (modifies module in-place)

Examples

// Simple save and load
const model = new MyModel();

// ... train model ...

// Save
const state = model.state_dict();

// Create new model and load
const new_model = new MyModel();
new_model.load_state_dict(state);
// new_model now has same weights as trained model

// Load pretrained model
const pretrained_state = load_pretrained_weights();

const model = new MyModel();
model.load_state_dict(pretrained_state);
// Model now has pretrained weights, ready for fine-tuning

// Checkpoint during training
let best_val_loss = Infinity;

for (let epoch = 0; epoch < num_epochs; epoch++) {
  // Training...
  const val_loss = evaluate(model, val_loader);

  if (val_loss < best_val_loss) {
    best_val_loss = val_loss;
    // Save best model
    const best_state = model.state_dict();
    save_checkpoint({
      epoch: epoch,
      model_state: best_state,
      val_loss: val_loss
    });
  }
}

// Later: load best model
const checkpoint = load_checkpoint();
model.load_state_dict(checkpoint.model_state);

// Partial loading: load some layers but not others
const pretrained_state = load_pretrained_weights();
// pretrained_state might only have encoder weights, not decoder

const model = new FullModel();
// This works fine - loads encoder, leaves decoder unchanged
model.load_state_dict(pretrained_state);
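When a checkpoint contains more than the part you want, the state dict can be filtered before loading. The sketch below assumes that keys are dotted paths that start with the submodule name (for example "encoder.conv.weight"); this page does not specify the key format, so treat the prefix convention as an assumption. selectByPrefix is a hypothetical helper, and plain number arrays stand in for tensors.

```typescript
// Keeps only the entries whose key starts with `prefix`.
function selectByPrefix<T>(state: Record<string, T>, prefix: string): Record<string, T> {
  const out: Record<string, T> = {};
  for (const key of Object.keys(state)) {
    if (key.startsWith(prefix)) out[key] = state[key];
  }
  return out;
}

// Usage with placeholder values instead of real tensors.
const full: Record<string, number[]> = {
  "encoder.conv.weight": [1, 2],
  "encoder.conv.bias": [3],
  "decoder.fc.weight": [4],
};

const encoderOnly = selectByPrefix(full, "encoder.");
// encoderOnly has keys "encoder.conv.weight" and "encoder.conv.bias"
// It could then be passed to model.load_state_dict(encoderOnly).
```

The values are passed through by reference, not copied, so the filtered dict is cheap to build.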
