torch.nn.MaxPool3d
new MaxPool3d(kernel_size: number | [number, number, number], options?: MaxPool3dOptions)
kernel_size(number | [number, number, number]) - readonly
stride(number | [number, number, number]) - readonly
padding(number | [number, number, number]) - readonly
return_indices(boolean)
3D max pooling: reduces volumetric dimensions by taking maximum over sliding window.
Applies max pooling over 3D spatial data (volumetric): scans a kernel over depth, height, and width, returning the maximum value within each kernel window. Reduces 3D spatial dimensionality while preserving strongest activations. Essential for:
- Video understanding (3D CNNs for action recognition, temporal modeling)
- Medical imaging (CT scans, MRI volumetric analysis, 3D image segmentation)
- Point cloud processing (volumetric representations for 3D understanding)
- 3D object detection and reconstruction (volumetric CNNs)
- Downsampling 3D feature maps efficiently
Similar to MaxPool2d but extended to the 3D spatial domain. Selects the strongest activation in each volumetric window, providing spatial and temporal translation invariance. Pooling operates over the joint 3D window spanning depth, height, and width; each window produces a single maximum.
When to use MaxPool3d:
- 3D CNNs for video/volumetric data (peaks in 3D space matter)
- Medical image analysis with 3D volumes
- When spatial structure needs to be preserved in 3D
- Reducing computational burden of 3D convolutions
- Standard in 3D deep learning architectures (I3D, C3D, etc.)
Trade-offs:
- vs AvgPool3d: MaxPool3d preserves peaks; AvgPool3d smooths
- vs adaptive pooling: MaxPool3d fixed stride/kernel; adaptive auto-adjusts
- Computational cost: More expensive than 2D pooling (3 spatial dimensions)
- Memory intensive: 3D windows consume more memory
- Information loss: Only max per 3D window kept
Pooling mechanics: For a 3D volume [B, C, D, H, W] (batch, channels, depth, height, width):
- For each channel independently:
- Slide a kernel_d × kernel_h × kernel_w window over the spatial dimensions (depth × height × width)
- Step by stride in all 3 directions (default: kernel_size for non-overlapping)
- Keep maximum value in each window
- Output: [B, C, D_out, H_out, W_out] where:
- D_out = floor((D + 2*padding_d - kernel_d) / stride_d) + 1
- H_out = floor((H + 2*padding_h - kernel_h) / stride_h) + 1
- W_out = floor((W + 2*padding_w - kernel_w) / stride_w) + 1
- Default stride: stride=kernel_size gives non-overlapping pooling
- Cubic kernels typical: [2,2,2] or [3,3,3] most common in 3D networks
- Gradient: Only max element per 3D window gets gradient
- Anisotropic kernels: Different D/H/W values useful for video/medical data
- Indices: Useful for unpooling to reconstruct 3D spatial layout
- Deterministic: Given same input, always selects same indices
- Channel independence: Each channel pooled independently
- Memory: 3D pooling uses significantly more memory than 2D; smaller batch sizes or spatial dimensions may be needed
- Padding: Zero-padding added in all 3 dimensions before pooling
- Computational cost: Expensive operation; choose stride values carefully
- Output size: Use the formulas above to predict output dimensions
- Information loss: Discarding non-max values is irreversible
Examples
// Video action recognition with 3D pooling
const pool = new torch.nn.MaxPool3d(2); // kernel=2x2x2, stride=2 (non-overlapping)
const x = torch.randn([8, 128, 16, 56, 56]); // [batch, channels, frames, height, width]
const y = pool.forward(x); // [8, 128, 8, 28, 28] - spatial & temporal dims halved

// Medical imaging with anisotropic pooling (different D/H/W kernels)
const pool = new torch.nn.MaxPool3d([2, 3, 3], { stride: 2 }); // Smaller depth kernel, larger spatial kernels
const volume = torch.randn([4, 64, 32, 64, 64]); // [batch, channels, depth, height, width]
const y = pool.forward(volume); // [4, 64, 16, 31, 31] - depth pooled less aggressively than height/width

// 3D CNN for volumetric data (medical diagnosis)
const conv1 = new torch.nn.Conv3d(1, 32, 3, { padding: 1 });
const pool = new torch.nn.MaxPool3d([2, 2, 2], { stride: 2 }); // Cubic pooling
const x = torch.randn([4, 1, 64, 64, 64]); // 3D CT scan
let y = conv1.forward(x); // [4, 32, 64, 64, 64]
y = pool.forward(y); // [4, 32, 32, 32, 32] - half-resolution feature map

// Spatial-only pooling (preserve temporal resolution in video)
const pool = new torch.nn.MaxPool3d([1, 2, 2], { stride: [1, 2, 2] }); // Only spatial pooling
const x = torch.randn([16, 64, 8, 56, 56]);
const y = pool.forward(x); // [16, 64, 8, 28, 28] - preserves temporal info

// With indices for unpooling (3D reconstruction via MaxUnpool3d)
const pool = new torch.nn.MaxPool3d(2, { stride: 2, padding: 0, return_indices: true });
const x = torch.randn([4, 64, 16, 16, 16]);
const [y, indices] = pool.forward(x) as [torch.Tensor, torch.Tensor];
// indices used for MaxUnpool3d reconstruction