torch.autograd.inference_mode
function inference_mode<T>(fn: () => T): T

Context manager for inference mode.
Disables gradient tracking AND marks the resulting tensors as incompatible with autograd, so they can never participate in a later backward pass. This is stricter and more optimized than no_grad(): it is intended for pure inference workloads where you are certain the outputs will not be used for training, and the stricter constraint enables more aggressive memory and computation optimizations. Essential for:
- Pure inference: model serving where outputs definitely won't be trained
- Maximum performance: inference_mode is fastest since it can skip autograd setup
- Memory efficiency: stricter mode allows more aggressive optimizations
- Correctness assertion: enforces that inference branch won't be used for training
- Model serving: production endpoints where training is impossible
- Batch inference: processing large datasets without any training intent
Key Difference from no_grad():
- no_grad: disables gradients but allows later autograd (tensors can be used for training)
- inference_mode: disables gradients AND prevents later autograd (safer for pure inference)
Use inference_mode when you're certain your code path won't be used for training. Use no_grad when you want to disable gradients temporarily but might use tensors later.
Performance: Inference mode is slightly faster than no_grad because it can skip certain autograd setup that might be needed for later gradient computation.
Cannot use enable_grad inside: Unlike no_grad, which allows enable_grad nesting, inference_mode is terminal - gradients cannot be re-enabled inside it, because tensors created there would be incompatible with autograd.
- Terminal mode: enable_grad cannot be nested inside inference_mode
- Stricter than no_grad: prevents any later autograd use, not just gradient tracking
- Auto-restore: saves and restores the inference state even if the function throws
- Performance: slightly faster than no_grad due to additional optimizations
- Memory: allows more aggressive memory optimizations due to the stricter constraints
- Semantics: declares "this code path is inference only, never training"
- Incompatible with training: tensors created inside cannot be used with backward()
- Prevents accidental training: use for code paths that must never train
Parameters
fn: () => T - Function to execute in inference mode. Can be sync or async. All tensors created inside are marked as inference-only and cannot be used for training.
Returns
T - The result of the function

Examples
// Pure inference without any gradient possibility
const prediction = torch.inference_mode(() => {
return model.forward(input);
});
// prediction cannot be used for backward()

// Model serving: inference endpoint handler
async function handle_prediction_request(request) {
return torch.inference_mode(async () => {
const input = preprocess(request.data);
const output = model.forward(input);
return postprocess(output);
});
}
// Output is guaranteed not to be trainable - safe for production

// Batch inference on dataset
function evaluate_on_dataset(model, dataset) {
const predictions = [];
torch.inference_mode(() => {
for (const batch of dataset) {
const output = model.forward(batch);
predictions.push(output);
}
});
return predictions;
}

// Comparison: no_grad vs inference_mode
const input = torch.randn(1, 3);
// Using no_grad: tensors can be used for training later
const pred1 = torch.no_grad(() => {
return model.forward(input);
});
// pred1 can still participate in later autograd operations (though unusual)
// Using inference_mode: tensors cannot be used for training
const pred2 = torch.inference_mode(() => {
return model.forward(input);
});
// Cannot do: pred2.backward() - incompatible with autograd

// Correctness check: ensure training branch is separate
function forward(model, batch, training = false) {
if (training) {
// Training branch - gradients enabled
return model.forward(batch);
} else {
// Inference branch - cannot be accidentally used for training
return torch.inference_mode(() => {
return model.forward(batch);
});
}
}

See Also
- PyTorch torch.inference_mode()
- torch.no_grad - Disable gradients but allow later autograd (less strict)
- torch.enable_grad - Re-enable gradients (not usable inside inference_mode)
- torch.is_inference_mode_enabled - Check if in inference mode
- torch.is_grad_enabled - Check if gradients enabled (also false in inference_mode)