torch.nn.functional.max_pool3d
function max_pool3d(input: Tensor, kernel_size: number | [number, number, number], options?: MaxPool3dFunctionalOptions): Tensor
function max_pool3d(input: Tensor, kernel_size: number | [number, number, number], stride: number | [number, number, number], padding: number | [number, number, number], dilation: number | [number, number, number], ceil_mode: boolean, options?: MaxPool3dFunctionalOptions): Tensor

3D Max Pooling: downsamples volumetric data by taking maximum values.
Applies max pooling over 3D spatial dimensions (depth, height, width) using sliding windows. Takes the maximum value in each window, useful for:
- Medical imaging: extracting features from CT/MRI scans (3D volumetric data)
- Video processing: downsampling video frames (depth = temporal dimension)
- 3D object recognition: feature extraction from volumetric point clouds
- Dimensionality reduction: reduces spatial resolution while preserving salient features
- 3D CNNs: key component of volumetric convolutional neural networks
- Sparsity enhancement: keeps only most prominent activations in 3D space
Operates on 5D inputs: (batch, channels, depth, height, width). Max pooling preserves the strongest signal in each 3D window, useful for detecting significant patterns.
- 3D locality: Captures patterns in 3D neighborhood (important for volumetric data)
- Computational cost: 3D operations are more expensive than 2D
- Kernel variants: Single kernel applies to all dims, or specify [D, H, W] for asymmetric
- Temporal use: Depth dimension often represents time (in video/sequential 3D data)
- Gradient flow: Only max elements in window receive backward gradients
- Memory intensive: 3D pooling on large volumes can be memory-heavy
- Asymmetric kernels: [1, k, k] for temporal preservation, [k, k, k] for isotropic
- Output shape calculation: Must account for all three spatial dimensions
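Because the output shape must be tracked across all three spatial dimensions, a small helper is handy for sanity-checking layer dimensions. The sketch below is a hypothetical standalone utility (not part of the library) that applies the standard pooling shape formula, with dilation = 1 assumed and an optional ceil_mode:

```typescript
// Hypothetical helper (not a library API): output size of one pooled dimension.
// ceil_mode = true rounds up instead of down, matching the usual pooling semantics.
function poolOutDim(
  size: number,
  kernel: number,
  stride: number,
  pad: number,
  ceilMode = false,
): number {
  const raw = (size + 2 * pad - kernel) / stride + 1;
  return ceilMode ? Math.ceil(raw) : Math.floor(raw);
}

// Apply the formula independently to depth, height, and width.
function maxPool3dShape(
  [d, h, w]: [number, number, number],
  kernel: [number, number, number],
  stride: [number, number, number],
  pad: [number, number, number] = [0, 0, 0],
): [number, number, number] {
  return [
    poolOutDim(d, kernel[0], stride[0], pad[0]),
    poolOutDim(h, kernel[1], stride[1], pad[1]),
    poolOutDim(w, kernel[2], stride[2], pad[2]),
  ];
}

// A 64x128x128 volume with a 2x2x2 window and stride 2 halves every dimension:
maxPool3dShape([64, 128, 128], [2, 2, 2], [2, 2, 2]); // → [32, 64, 64]
```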
Parameters
- input: Tensor - 5D input tensor of shape (batch, channels, depth, height, width)
- kernel_size: number | [number, number, number] - Size of pooling window: single value or [depth, height, width]
- options: MaxPool3dFunctionalOptions (optional)
Returns
Tensor - Tensor with shape (batch, channels, out_depth, out_height, out_width) where (for dilation = 1; ceil_mode replaces floor with ceil):
  out_depth  = floor((depth  + 2*pad_d - kernel_d) / stride_d) + 1
  out_height = floor((height + 2*pad_h - kernel_h) / stride_h) + 1
  out_width  = floor((width  + 2*pad_w - kernel_w) / stride_w) + 1
Examples
// Medical imaging: downsample CT scan
const ct_scan = torch.randn(2, 64, 64, 128, 128); // 2 scans, 64 channels, 64x128x128 volume
const pooled = torch.nn.functional.max_pool3d(ct_scan, 2); // 2x2x2 pooling
// Output: (2, 64, 32, 64, 64) - half resolution in each dimension
// Video feature extraction: temporal + spatial pooling
const video = torch.randn(8, 128, 16, 224, 224); // 8 videos, 128 filters, 16 frames, 224x224 resolution
const downsampled = torch.nn.functional.max_pool3d(video, [2, 2, 2], [2, 2, 2]);
// Reduce frames from 16→8, spatial 224→112. Key for 3D CNNs like C3D
// 3D object recognition: isotropic pooling
const volume = torch.randn(16, 256, 14, 14, 14); // 16 batches, 256 feature maps, 14³ spatial
const reduced = torch.nn.functional.max_pool3d(volume, 2, 2);
// Output: (16, 256, 7, 7, 7) - half resolution
// Asymmetric pooling: different kernels for spatial vs temporal
const frames = torch.randn(4, 512, 8, 7, 7); // 4 clips, 512 features, 8 frames, 7x7 spatial
const features = torch.nn.functional.max_pool3d(frames, [1, 2, 2], [1, 2, 2]);
// No temporal pooling (kernel_d=1), but spatial 2x pooling. Good for temporal sequence models
See Also
- PyTorch torch.nn.functional.max_pool3d
- avg_pool3d - Averaging alternative for smoother reduction
- max_pool2d - 2D spatial pooling variant
- adaptive_max_pool3d - Adaptive pooling with automatic kernel/stride
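To make the windowing semantics concrete, here is a reference sketch of 3D max pooling over a single (depth, height, width) channel using plain nested arrays. This is an illustration of the operation, not the library's implementation; it assumes stride equals kernel size, with no padding or dilation:

```typescript
// Naive single-channel 3D max pooling over nested arrays (illustrative only).
// Each non-overlapping kd x kh x kw window is reduced to its maximum value.
function naiveMaxPool3d(vol: number[][][], k: [number, number, number]): number[][][] {
  const [kd, kh, kw] = k;
  const [D, H, W] = [vol.length, vol[0].length, vol[0][0].length];
  const out: number[][][] = [];
  for (let d = 0; d + kd <= D; d += kd) {
    const plane: number[][] = [];
    for (let h = 0; h + kh <= H; h += kh) {
      const row: number[] = [];
      for (let w = 0; w + kw <= W; w += kw) {
        // Scan the window and keep only the strongest activation.
        let m = -Infinity;
        for (let dd = 0; dd < kd; dd++)
          for (let hh = 0; hh < kh; hh++)
            for (let ww = 0; ww < kw; ww++)
              m = Math.max(m, vol[d + dd][h + hh][w + ww]);
        row.push(m);
      }
      plane.push(row);
    }
    out.push(plane);
  }
  return out;
}

// A 2x2x2 volume pooled with a 2x2x2 window collapses to a single value:
const tiny = [
  [[1, 2], [3, 4]],
  [[5, 6], [7, 8]],
];
naiveMaxPool3d(tiny, [2, 2, 2]); // → [[[8]]]
```

Only the maximum element in each window survives, which is also why only those elements receive gradients during the backward pass.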