torch.nn.functional.adaptive_avg_pool3d
3D Adaptive Average Pooling: averages to fixed volumetric size automatically.
Applies adaptive average pooling over 3D spatial dimensions (depth, height, width) with automatic kernel/stride computation. Useful for:
- Medical imaging classification: standardizing volumetric features
- Video classification: pooling temporal-spatial features to fixed size
- 3D object recognition: reducing 3D feature maps before classification
- Variable-size 3D input handling: same network works with any volume size
- Multi-modal learning: combining 3D features from different sources
- Volumetric feature aggregation: before final classification layers
Unlike regular pooling, adaptive pooling automatically computes the kernel and stride to achieve the target output volumetric size. Averages values in each adaptive window.
- 3D adaptive kernels: Automatically computed for all three spatial dimensions
- Flexible sizing: Can target any output spatial size
- Input invariance: Same output size regardless of input volume dimensions
- Averaging smoothing: Reduces noise while reducing resolution
- Asymmetric support: Can have different target sizes per dimension
- Memory intensive: 3D operations use more memory than 2D
- Computational cost: Adaptive 3D pooling is more expensive than regular pooling
- Non-uniform kernels: May use different kernel sizes at different positions
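The per-position windows follow PyTorch's adaptive-pooling rule: for output index i along a dimension of length inLen pooled to outLen bins, the window spans floor(i·inLen/outLen) to ceil((i+1)·inLen/outLen). A minimal plain-JS sketch of that rule (a hypothetical helper, not part of this library's API) shows why kernel sizes can differ between positions when the input length is not divisible by the output length:

```javascript
// Window for one output position along one dimension, per the
// adaptive-pooling rule: [start, end) over the input indices.
function adaptiveWindow(i, inLen, outLen) {
  const start = Math.floor((i * inLen) / outLen);
  const end = Math.ceil(((i + 1) * inLen) / outLen);
  return [start, end];
}

// Input length 5 pooled to 3 bins: window sizes 2, 3, 2 - non-uniform
const windows = [0, 1, 2].map((i) => adaptiveWindow(i, 5, 3));
// windows → [[0, 2], [1, 4], [3, 5]]
```

When inLen is divisible by outLen (e.g. 64 pooled to 8), every window has the same size and the operation matches a regular average pool with kernel = stride = inLen/outLen.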
Parameters
input: Tensor - 5D input tensor of shape (batch, channels, depth, height, width)
output_size: number | [number, number, number] - Target spatial size: single value for (size, size, size) or [depth, height, width]
Returns
Tensor - Tensor with shape (batch, channels, out_depth, out_height, out_width)
Examples
// Medical imaging: 3D CNN classification
const ct_features = torch.randn(8, 512, 16, 14, 14); // 3D CNN features
const pooled = torch.nn.functional.adaptive_avg_pool3d(ct_features, 1);
// Output: (8, 512, 1, 1, 1) - ready for classification head
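With output_size 1, each channel's result is simply the mean over all depth × height × width positions. A plain-JS sketch of that reduction for one channel (globalAvg is a hypothetical helper for illustration, not a library function):

```javascript
// Global average pooling to size 1 = mean over every spatial element.
// volume: one channel's d*h*w values, flattened to a plain array.
function globalAvg(volume) {
  return volume.reduce((sum, v) => sum + v, 0) / volume.length;
}

const vol = [1, 2, 3, 4, 5, 6, 7, 8]; // a 2x2x2 volume, flattened
const pooled = globalAvg(vol); // → 4.5
```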
// Variable volume sizes: same network for different resolutions
const small_vol = torch.randn(4, 256, 32, 32, 32);
const out1 = torch.nn.functional.adaptive_avg_pool3d(small_vol, 8); // → (4, 256, 8, 8, 8)
const large_vol = torch.randn(4, 256, 64, 64, 64); // 2x larger
const out2 = torch.nn.functional.adaptive_avg_pool3d(large_vol, 8); // → (4, 256, 8, 8, 8)
// Both produce same output size regardless of input volume size
// Video action recognition: temporal-spatial pooling
const video_features = torch.randn(16, 1024, 8, 7, 7); // 8 frames, 7x7 spatial
const standardized = torch.nn.functional.adaptive_avg_pool3d(video_features, [4, 7, 7]);
// Output: (16, 1024, 4, 7, 7) - temporal reduction only
// Asymmetric pooling: different reduction per dimension
const volume = torch.randn(2, 128, 32, 64, 64);
const reduced = torch.nn.functional.adaptive_avg_pool3d(volume, [8, 32, 32]);
// Output: (2, 128, 8, 32, 32) - depth reduced 4x, height/width halved
See Also
- PyTorch torch.nn.functional.adaptive_avg_pool3d
- adaptive_max_pool3d - Max variant for feature saliency
- avg_pool3d - Regular 3D average pooling with explicit kernel/stride
- adaptive_avg_pool2d - 2D spatial variant
- adaptive_avg_pool1d - 1D temporal variant