torch.profiler.profile
function profile(options: ProfilerOptions = {}): ProfilerContext

Creates a profiler context for measuring neural network performance.
Enables profiling of tensor operations to identify performance bottlenecks. Records timing information for forward/backward passes and tracks kernel execution. Useful for:
- Performance debugging: Finding slow layers and operations
- Training optimization: Identifying which operations consume most time
- Memory analysis: Tracking memory usage by operation
- Profiling models: Understanding computational bottlenecks
- Performance tuning: Comparing different implementations
The profiler captures timing for all tensor operations within the context, including GPU kernel execution, memory allocation, and data transfers. Use as a context manager to enable/disable profiling automatically.
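The enable/record/disable pattern described above can be sketched in plain JavaScript. This is an illustration of the pattern only, not the library's implementation; the `MiniProfiler` class and its methods are hypothetical:

```javascript
// Minimal sketch of the enable/record/report cycle a profiler context follows.
// Illustrative only -- the real profiler hooks into tensor operations automatically.
class MiniProfiler {
  constructor() {
    this.records = [];
    this.enabled = false;
  }
  enable() { this.enabled = true; }
  disable() { this.enabled = false; }
  // Run fn, recording its wall-clock time only while profiling is enabled.
  record(name, fn) {
    if (!this.enabled) return fn();
    const t0 = performance.now();
    const result = fn();
    this.records.push({ name, ms: performance.now() - t0 });
    return result;
  }
  // Format recorded timings, one operation per line.
  table() {
    return this.records
      .map(r => `${r.name}: ${r.ms.toFixed(3)} ms`)
      .join("\n");
  }
}

const prof = new MiniProfiler();
prof.enable();
prof.record("matmul", () => {
  let s = 0;
  for (let i = 0; i < 1e6; i++) s += i;
  return s;
});
prof.disable();
console.log(prof.table());
```

The real profiler records operations implicitly within the context rather than requiring an explicit wrapper call, but the lifecycle (enable, capture timings, disable, report) is the same.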
Notes
- GPU profiling: WebGPU profiling is limited; use it for relative comparisons
- Overhead: Profiling adds overhead; disable for production training
- Context manager: Can be used with enable()/disable() or start()/stop()
- Performance impact: Profiling slows down execution significantly
- Memory usage: Stores timing data for all operations (can be large)
- GPU limitations: Some WebGPU metrics may not be available
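To gauge the overhead noted above for your own workload, time the same function with and without per-call instrumentation. This plain-JavaScript sketch simulates the bookkeeping a profiler adds; the workload and repetition count are arbitrary:

```javascript
// Rough overhead estimate: compare a bare workload against the same
// workload wrapped in timestamp bookkeeping (illustrative only).
function workload() {
  let s = 0;
  for (let i = 0; i < 1e5; i++) s += Math.sqrt(i);
  return s;
}

function timeIt(fn, reps) {
  const t0 = performance.now();
  for (let i = 0; i < reps; i++) fn();
  return performance.now() - t0;
}

const plain = timeIt(workload, 50);

// Simulated instrumentation: record a timestamp pair around every call.
const records = [];
const instrumented = timeIt(() => {
  const t0 = performance.now();
  workload();
  records.push(performance.now() - t0);
}, 50);

console.log(`overhead: ${(instrumented / plain).toFixed(2)}x`);
```

If the ratio is far above 1x for tiny operations, prefer profiling larger batches or fewer iterations.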
Parameters
options ProfilerOptions optional – Profiler configuration:
- use_cuda: Enable CUDA-specific profiling (no-op for WebGPU)
- use_cpu: Enable CPU profiling
- use_kineto: Enable the Kineto backend (no-op for WebGPU)
- record_shapes: Record tensor shapes for each operation
- with_stack: Include stack traces (limited in JavaScript)
- with_flops: Estimate FLOPs for operations (approximate)
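For reference, the documented options can be collected in a single literal. The values shown are illustrative choices for CPU-side profiling, not the library's defaults:

```javascript
// Illustrative ProfilerOptions object; field names are from the table above.
const options = {
  use_cpu: true,        // time CPU-side tensor operations
  use_cuda: false,      // no-op on WebGPU backends
  use_kineto: false,    // no-op on WebGPU backends
  record_shapes: true,  // attach input shapes to each recorded op
  with_stack: false,    // stack traces are limited in JavaScript
  with_flops: false,    // FLOP estimates are approximate
};
```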
Returns
ProfilerContext – ProfilerContext object to use with the context-manager pattern
Examples
// Profile a forward pass
const profiler = torch.profiler.profile({ use_cpu: true });
profiler.enable();
model.forward(x).backward();
profiler.disable();
console.log(profiler.table()); // Print timing summary

// Context manager pattern (recommended)
const profiler = torch.profiler.profile({ record_shapes: true });
profiler.start();
// Operations here are profiled
const output = model.forward(x);
output.sum().backward();
profiler.stop();
console.log(profiler.key_averages());

// Compare layer performance
for (const layer of model.layers) {
  const profiler = torch.profiler.profile({ use_cpu: true });
  profiler.start();
  layer.forward(x);
  profiler.stop();
  console.log(`Layer ${layer.name}: ${profiler.total_time()}ms`);
}
See Also
- PyTorch torch.autograd.profiler.profile()
- emit_nvtx - NVIDIA-specific profiling markers
- ProfilerContext - The returned profiler object