torch.nn.functional.max_pool3d
function max_pool3d(input: Tensor, kernel_size: number | [number, number, number], options?: MaxPool3dFunctionalOptions): Tensor
function max_pool3d(input: Tensor, kernel_size: number | [number, number, number], stride: number | [number, number, number], padding: number | [number, number, number], dilation: number | [number, number, number], ceil_mode: boolean, options?: MaxPool3dFunctionalOptions): Tensor

3D Max Pooling: downsamples volumetric data by taking maximum values.
Applies max pooling over 3D spatial dimensions (depth, height, width) using sliding windows. Takes the maximum value in each window, useful for:
- Medical imaging: extracting features from CT/MRI scans (3D volumetric data)
- Video processing: downsampling video frames (depth = temporal dimension)
- 3D object recognition: feature extraction from volumetric point clouds
- Dimensionality reduction: reduces spatial resolution while preserving salient features
- 3D CNNs: key component of volumetric convolutional neural networks
- Sparsity enhancement: keeps only most prominent activations in 3D space
Operates on 5D inputs: (batch, channels, depth, height, width). Max pooling preserves the strongest signal in each 3D window, useful for detecting significant patterns.
- 3D locality: Captures patterns in 3D neighborhood (important for volumetric data)
- Computational cost: 3D operations are more expensive than 2D
- Kernel variants: Single kernel applies to all dims, or specify [D, H, W] for asymmetric
- Temporal use: Depth dimension often represents time (in video/sequential 3D data)
- Gradient flow: Only max elements in window receive backward gradients
- Memory intensive: 3D pooling on large volumes can be memory-heavy
- Asymmetric kernels: [1, k, k] for temporal preservation, [k, k, k] for isotropic
- Output shape calculation: Must account for all three spatial dimensions
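Because the output shape must be tracked across all three spatial dimensions, a small helper is handy for sanity-checking layer dimensions. The sketch below is a hypothetical standalone utility (not part of the library) that applies the standard pooling shape formula, with dilation = 1 assumed and an optional ceil_mode:

```typescript
// Hypothetical helper (not a library API): output size of one pooled dimension.
// ceil_mode = true rounds up instead of down, matching the usual pooling semantics.
function poolOutDim(
  size: number,
  kernel: number,
  stride: number,
  pad: number,
  ceilMode = false,
): number {
  const raw = (size + 2 * pad - kernel) / stride + 1;
  return ceilMode ? Math.ceil(raw) : Math.floor(raw);
}

// Apply the formula independently to depth, height, and width.
function maxPool3dShape(
  [d, h, w]: [number, number, number],
  kernel: [number, number, number],
  stride: [number, number, number],
  pad: [number, number, number] = [0, 0, 0],
): [number, number, number] {
  return [
    poolOutDim(d, kernel[0], stride[0], pad[0]),
    poolOutDim(h, kernel[1], stride[1], pad[1]),
    poolOutDim(w, kernel[2], stride[2], pad[2]),
  ];
}

// A 64x128x128 volume with a 2x2x2 window and stride 2 halves every dimension:
maxPool3dShape([64, 128, 128], [2, 2, 2], [2, 2, 2]); // → [32, 64, 64]
```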
Parameters
- input: Tensor - 5D input tensor of shape (batch, channels, depth, height, width)
- kernel_size: number | [number, number, number] - Size of pooling window: single value or [depth, height, width]
- options: MaxPool3dFunctionalOptions (optional)
Returns
Tensor - Tensor with shape (batch, channels, out_depth, out_height, out_width) where (for dilation = 1; ceil_mode replaces floor with ceil):
  out_depth  = floor((depth  + 2*pad_d - kernel_d) / stride_d) + 1
  out_height = floor((height + 2*pad_h - kernel_h) / stride_h) + 1
  out_width  = floor((width  + 2*pad_w - kernel_w) / stride_w) + 1
Examples
// Medical imaging: downsample CT scan
const ct_scan = torch.randn(2, 64, 64, 128, 128); // 2 scans, 64 channels, 64x128x128 volume
const pooled = torch.nn.functional.max_pool3d(ct_scan, 2); // 2x2x2 pooling
// Output: (2, 64, 32, 64, 64) - half resolution in each dimension
// Video feature extraction: temporal + spatial pooling
const video = torch.randn(8, 128, 16, 224, 224); // 8 videos, 128 filters, 16 frames, 224x224 resolution
const downsampled = torch.nn.functional.max_pool3d(video, [2, 2, 2], [2, 2, 2]);
// Reduce frames from 16→8, spatial 224→112. Key for 3D CNNs like C3D
// 3D object recognition: isotropic pooling
const volume = torch.randn(16, 256, 14, 14, 14); // 16 batches, 256 feature maps, 14³ spatial
const reduced = torch.nn.functional.max_pool3d(volume, 2, 2);
// Output: (16, 256, 7, 7, 7) - half resolution
// Asymmetric pooling: different kernels for spatial vs temporal
const frames = torch.randn(4, 512, 8, 7, 7); // 4 clips, 512 features, 8 frames, 7x7 spatial
const features = torch.nn.functional.max_pool3d(frames, [1, 2, 2], [1, 2, 2]);
// No temporal pooling (kernel_d=1), but spatial 2x pooling. Good for temporal sequence models
See Also
- PyTorch torch.nn.functional.max_pool3d
- avg_pool3d - Averaging alternative for smoother reduction
- max_pool2d - 2D spatial pooling variant
- adaptive_max_pool3d - Adaptive pooling with automatic kernel/stride
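To make the windowing semantics concrete, here is a reference sketch of 3D max pooling over a single (depth, height, width) channel using plain nested arrays. This is an illustration of the operation, not the library's implementation; it assumes stride equals kernel size, with no padding or dilation:

```typescript
// Naive single-channel 3D max pooling over nested arrays (illustrative only).
// Each non-overlapping kd x kh x kw window is reduced to its maximum value.
function naiveMaxPool3d(vol: number[][][], k: [number, number, number]): number[][][] {
  const [kd, kh, kw] = k;
  const [D, H, W] = [vol.length, vol[0].length, vol[0][0].length];
  const out: number[][][] = [];
  for (let d = 0; d + kd <= D; d += kd) {
    const plane: number[][] = [];
    for (let h = 0; h + kh <= H; h += kh) {
      const row: number[] = [];
      for (let w = 0; w + kw <= W; w += kw) {
        // Scan the window and keep only the strongest activation.
        let m = -Infinity;
        for (let dd = 0; dd < kd; dd++)
          for (let hh = 0; hh < kh; hh++)
            for (let ww = 0; ww < kw; ww++)
              m = Math.max(m, vol[d + dd][h + hh][w + ww]);
        row.push(m);
      }
      plane.push(row);
    }
    out.push(plane);
  }
  return out;
}

// A 2x2x2 volume pooled with a 2x2x2 window collapses to a single value:
const tiny = [
  [[1, 2], [3, 4]],
  [[5, 6], [7, 8]],
];
naiveMaxPool3d(tiny, [2, 2, 2]); // → [[[8]]]
```

Only the maximum element in each window survives, which is also why only those elements receive gradients during the backward pass.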