torch.autograd.inference_mode
function inference_mode<T>(fn: () => T): T

Context manager for inference mode.
Disables gradient tracking AND marks the resulting tensors as incompatible with autograd, so they can never participate in a later backward pass. This is stricter and more optimized than no_grad(): it is intended for pure inference workloads where you are certain the outputs will not be used for training, and the stricter constraint enables more aggressive memory and computation optimizations. Essential for:
- Pure inference: model serving where outputs definitely won't be trained
- Maximum performance: inference_mode is fastest since it can skip autograd setup
- Memory efficiency: stricter mode allows more aggressive optimizations
- Correctness assertion: enforces that inference branch won't be used for training
- Model serving: production endpoints where training is impossible
- Batch inference: processing large datasets without any training intent
Key Difference from no_grad():
- no_grad: disables gradients but allows later autograd (tensors can be used for training)
- inference_mode: disables gradients AND prevents later autograd (safer for pure inference)
Use inference_mode when you're certain your code path won't be used for training. Use no_grad when you want to disable gradients temporarily but might use tensors later.
Performance: Inference mode is slightly faster than no_grad because it can skip certain autograd setup that might be needed for later gradient computation.
Cannot use enable_grad inside: Unlike no_grad, which allows enable_grad nesting, inference_mode is terminal - gradients cannot be re-enabled inside it, because tensors created there would be incompatible with autograd.
- Terminal mode: enable_grad cannot be nested inside inference_mode
- Stricter than no_grad: prevents any later autograd use, not just gradient tracking
- Auto-restore: saves and restores the inference state even if the function throws
- Performance: slightly faster than no_grad due to additional optimizations
- Memory: allows more aggressive memory optimizations due to the stricter constraints
- Semantics: declares "this code path is inference only, never training"
- Incompatible with training: tensors created inside cannot be used with backward()
- Prevents accidental training: use for code paths that must never train
Parameters
fn: () => T - Function to execute in inference mode. Can be sync or async. All tensors created inside are marked as inference-only and cannot be used for training.
Returns
T - The result of the function

Examples
// Pure inference without any gradient possibility
const prediction = torch.inference_mode(() => {
return model.forward(input);
});
// prediction cannot be used for backward()

// Model serving: inference endpoint handler
async function handle_prediction_request(request) {
return torch.inference_mode(async () => {
const input = preprocess(request.data);
const output = model.forward(input);
return postprocess(output);
});
}
// Output is guaranteed not to be trainable - safe for production

// Batch inference on dataset
function evaluate_on_dataset(model, dataset) {
const predictions = [];
torch.inference_mode(() => {
for (const batch of dataset) {
const output = model.forward(batch);
predictions.push(output);
}
});
return predictions;
}

// Comparison: no_grad vs inference_mode
const input = torch.randn(1, 3);
// Using no_grad: tensors can be used for training later
const pred1 = torch.no_grad(() => {
return model.forward(input);
});
// pred1 can still participate in later autograd operations (though unusual)
// Using inference_mode: tensors cannot be used for training
const pred2 = torch.inference_mode(() => {
return model.forward(input);
});
// Cannot do: pred2.backward() - incompatible with autograd

// Correctness check: ensure training branch is separate
function forward(model, batch, training = false) {
if (training) {
// Training branch - gradients enabled
return model.forward(batch);
} else {
// Inference branch - cannot be accidentally used for training
return torch.inference_mode(() => {
return model.forward(batch);
});
}
}

See Also
- PyTorch torch.inference_mode()
- torch.no_grad - Disable gradients but allow later autograd (less strict)
- torch.enable_grad - Re-enable gradients (not usable inside inference_mode)
- torch.is_inference_mode_enabled - Check if in inference mode
- torch.is_grad_enabled - Check if gradients enabled (also false in inference_mode)