torch.nn.functional.adaptive_avg_pool3d
3D Adaptive Average Pooling: averages to fixed volumetric size automatically.
Applies adaptive average pooling over 3D spatial dimensions (depth, height, width) with automatic kernel/stride computation. Useful for:
- Medical imaging classification: standardizing volumetric features
- Video classification: pooling temporal-spatial features to fixed size
- 3D object recognition: reducing 3D feature maps before classification
- Variable-size 3D input handling: same network works with any volume size
- Multi-modal learning: combining 3D features from different sources
- Volumetric feature aggregation: before final classification layers
Unlike regular pooling, adaptive pooling automatically computes the kernel and stride to achieve the target output volumetric size. Averages values in each adaptive window.
- 3D adaptive kernels: Automatically computed for all three spatial dimensions
- Flexible sizing: Can target any output spatial size
- Input invariance: Same output size regardless of input volume dimensions
- Averaging smoothing: Reduces noise while reducing resolution
- Asymmetric support: Can have different target sizes per dimension
- Memory intensive: 3D operations use more memory than 2D
- Computational cost: Adaptive 3D pooling is more expensive than regular pooling
- Non-uniform kernels: May use different kernel sizes at different positions
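The per-position windows follow PyTorch's adaptive-pooling rule: for output index i along a dimension of length inLen pooled to outLen bins, the window spans floor(i·inLen/outLen) to ceil((i+1)·inLen/outLen). A minimal plain-JS sketch of that rule (a hypothetical helper, not part of this library's API) shows why kernel sizes can differ between positions when the input length is not divisible by the output length:

```javascript
// Window for one output position along one dimension, per the
// adaptive-pooling rule: [start, end) over the input indices.
function adaptiveWindow(i, inLen, outLen) {
  const start = Math.floor((i * inLen) / outLen);
  const end = Math.ceil(((i + 1) * inLen) / outLen);
  return [start, end];
}

// Input length 5 pooled to 3 bins: window sizes 2, 3, 2 - non-uniform
const windows = [0, 1, 2].map((i) => adaptiveWindow(i, 5, 3));
// windows → [[0, 2], [1, 4], [3, 5]]
```

When inLen is divisible by outLen (e.g. 64 pooled to 8), every window has the same size and the operation matches a regular average pool with kernel = stride = inLen/outLen.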
Parameters
input: Tensor - 5D input tensor of shape (batch, channels, depth, height, width)
output_size: number | [number, number, number] - Target spatial size: single value for (size, size, size) or [depth, height, width]
Returns
Tensor - Tensor with shape (batch, channels, out_depth, out_height, out_width)
Examples
// Medical imaging: 3D CNN classification
const ct_features = torch.randn(8, 512, 16, 14, 14); // 3D CNN features
const pooled = torch.nn.functional.adaptive_avg_pool3d(ct_features, 1);
// Output: (8, 512, 1, 1, 1) - ready for classification head
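With output_size 1, each channel's result is simply the mean over all depth × height × width positions. A plain-JS sketch of that reduction for one channel (globalAvg is a hypothetical helper for illustration, not a library function):

```javascript
// Global average pooling to size 1 = mean over every spatial element.
// volume: one channel's d*h*w values, flattened to a plain array.
function globalAvg(volume) {
  return volume.reduce((sum, v) => sum + v, 0) / volume.length;
}

const vol = [1, 2, 3, 4, 5, 6, 7, 8]; // a 2x2x2 volume, flattened
const pooled = globalAvg(vol); // → 4.5
```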
// Variable volume sizes: same network for different resolutions
const small_vol = torch.randn(4, 256, 32, 32, 32);
const out1 = torch.nn.functional.adaptive_avg_pool3d(small_vol, 8); // → (4, 256, 8, 8, 8)
const large_vol = torch.randn(4, 256, 64, 64, 64); // 2x larger
const out2 = torch.nn.functional.adaptive_avg_pool3d(large_vol, 8); // → (4, 256, 8, 8, 8)
// Both produce same output size regardless of input volume size
// Video action recognition: temporal-spatial pooling
const video_features = torch.randn(16, 1024, 8, 7, 7); // 8 frames, 7x7 spatial
const standardized = torch.nn.functional.adaptive_avg_pool3d(video_features, [4, 7, 7]);
// Output: (16, 1024, 4, 7, 7) - temporal reduction only
// Asymmetric pooling: different reduction per dimension
const volume = torch.randn(2, 128, 32, 64, 64);
const reduced = torch.nn.functional.adaptive_avg_pool3d(volume, [8, 32, 32]);
// Output: (2, 128, 8, 32, 32) - depth reduced 4x, height/width halved
See Also
- PyTorch torch.nn.functional.adaptive_avg_pool3d
- adaptive_max_pool3d - Max variant for feature saliency
- avg_pool3d - Regular 3D average pooling with explicit kernel/stride
- adaptive_avg_pool2d - 2D spatial variant
- adaptive_avg_pool1d - 1D temporal variant