torch.nn.MaxPool3d
new MaxPool3d(kernel_size: number | [number, number, number], options?: MaxPool3dOptions)
kernel_size(number | [number, number, number]) - readonly
stride(number | [number, number, number]) - readonly
padding(number | [number, number, number]) - readonly
return_indices(boolean)
3D max pooling: reduces volumetric dimensions by taking maximum over sliding window.
Applies max pooling over 3D spatial data (volumetric): scans a kernel over depth, height, and width, returning the maximum value within each kernel window. Reduces 3D spatial dimensionality while preserving strongest activations. Essential for:
- Video understanding (3D CNNs for action recognition, temporal modeling)
- Medical imaging (CT scans, MRI volumetric analysis, 3D image segmentation)
- Point cloud processing (volumetric representations for 3D understanding)
- 3D object detection and reconstruction (volumetric CNNs)
- Downsampling 3D feature maps efficiently
Similar to MaxPool2d but extended to the 3D spatial domain. Selects the strongest activation in each volumetric window, providing spatial and temporal translation invariance. Pooling operates over the joint 3D window spanning depth, height, and width; each window produces a single maximum.
When to use MaxPool3d:
- 3D CNNs for video/volumetric data (peaks in 3D space matter)
- Medical image analysis with 3D volumes
- When spatial structure needs to be preserved in 3D
- Reducing computational burden of 3D convolutions
- Standard in 3D deep learning architectures (I3D, C3D, etc.)
Trade-offs:
- vs AvgPool3d: MaxPool3d preserves peaks; AvgPool3d smooths
- vs adaptive pooling: MaxPool3d fixed stride/kernel; adaptive auto-adjusts
- Computational cost: More expensive than 2D pooling (3 spatial dimensions)
- Memory intensive: 3D windows consume more memory
- Information loss: Only max per 3D window kept
Pooling mechanics: For a 3D volume [B, C, D, H, W] (batch, channels, depth, height, width):
- For each channel independently:
- Slide a kernel_d × kernel_h × kernel_w window over the spatial dimensions (depth × height × width)
- Step by stride in all 3 directions (default: kernel_size for non-overlapping)
- Keep maximum value in each window
- Output: [B, C, D_out, H_out, W_out] where:
- D_out = floor((D + 2*padding_d - kernel_d) / stride_d) + 1
- H_out = floor((H + 2*padding_h - kernel_h) / stride_h) + 1
- W_out = floor((W + 2*padding_w - kernel_w) / stride_w) + 1
- Default stride: stride=kernel_size gives non-overlapping pooling
- Cubic kernels typical: [2,2,2] or [3,3,3] most common in 3D networks
- Gradient: Only max element per 3D window gets gradient
- Anisotropic kernels: Different D/H/W values useful for video/medical data
- Indices: Useful for unpooling to reconstruct 3D spatial layout
- Deterministic: Given same input, always selects same indices
- Channel independence: Each channel pooled independently
- Memory: 3D pooling uses significantly more memory than 2D; smaller batch sizes or spatial dimensions may be needed
- Padding: Zero-padding added in all 3 dimensions before pooling
- Computational cost: Expensive operation; choose stride values carefully
- Output size: Use the formulas above to predict output dimensions
- Information loss: Discarding non-max values is irreversible
Examples
// Video action recognition with 3D pooling
const pool = new torch.nn.MaxPool3d(2); // kernel=2x2x2, stride=2 (non-overlapping)
const x = torch.randn([8, 128, 16, 56, 56]); // [batch, channels, frames, height, width]
const y = pool.forward(x); // [8, 128, 8, 28, 28] - spatial & temporal dims halved

// Medical imaging with anisotropic pooling (different D/H/W kernels)
const pool = new torch.nn.MaxPool3d([2, 3, 3], { stride: 2 }); // Smaller depth kernel, larger spatial kernels
const volume = torch.randn([4, 64, 32, 64, 64]); // [batch, channels, depth, height, width]
const y = pool.forward(volume); // [4, 64, 16, 31, 31] - depth pooled less aggressively than height/width

// 3D CNN for volumetric data (medical diagnosis)
const conv1 = new torch.nn.Conv3d(1, 32, 3, { padding: 1 });
const pool = new torch.nn.MaxPool3d([2, 2, 2], { stride: 2 }); // Cubic pooling
const x = torch.randn([4, 1, 64, 64, 64]); // 3D CT scan
let y = conv1.forward(x); // [4, 32, 64, 64, 64]
y = pool.forward(y); // [4, 32, 32, 32, 32] - half-resolution feature map

// Spatial-only pooling (preserve temporal resolution in video)
const pool = new torch.nn.MaxPool3d([1, 2, 2], { stride: [1, 2, 2] }); // Only spatial pooling
const x = torch.randn([16, 64, 8, 56, 56]);
const y = pool.forward(x); // [16, 64, 8, 28, 28] - preserves temporal info

// With indices for unpooling (3D reconstruction via MaxUnpool3d)
const pool = new torch.nn.MaxPool3d(2, { stride: 2, padding: 0, return_indices: true });
const x = torch.randn([4, 64, 16, 16, 16]);
const [y, indices] = pool.forward(x) as [torch.Tensor, torch.Tensor];
// indices used for MaxUnpool3d reconstruction