torch.nn.functional.avg_pool3d
function avg_pool3d(input: Tensor, kernel_size: number | [number, number, number], options?: AvgPool3dFunctionalOptions): Tensor
function avg_pool3d(input: Tensor, kernel_size: number | [number, number, number], stride: number | [number, number, number] | null, padding: number | [number, number, number], ceil_mode: boolean, count_include_pad: boolean, divisor_override: number | undefined, options?: AvgPool3dFunctionalOptions): Tensor

3D Average Pooling: downsamples volumetric data by averaging values.
Applies average pooling over 3D spatial dimensions (depth, height, width) using sliding windows. Computes the mean value in each window, useful for:
- Medical imaging: smoothing CT/MRI scans while reducing resolution
- Video processing: temporal-spatial averaging for feature aggregation
- 3D feature aggregation: combining neighboring activations in volumetric networks
- Noise reduction: averaging reduces noise more effectively than max pooling
- Global feature extraction: combining 3D features before classification
- Smoothing volumetric data: low-pass filtering effect in 3D space
Unlike max pooling, which preserves peaks, average pooling smooths volumetric data.
Operates on 5D inputs: (batch, channels, depth, height, width). The count_include_pad
parameter controls whether padding is counted in the averaging.
- Smoothing effect: Average pooling acts like low-pass filtering in 3D
- count_include_pad impact: Affects boundary values near padding
- Gradient distribution: Gradients spread equally to all elements in window
- Noise reduction: Better than max for noise filtering applications
- Signal preservation: Average better preserves overall signal magnitude than max
- Boundary handling: With count_include_pad=true, padded regions reduce averages
- Computational cost: 3D averaging is expensive for large volumes
- Information loss: Averaging may blur fine details in volumetric data
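To make the count_include_pad behavior concrete, here is a minimal pure-TypeScript reference (no torch dependency) for a single-channel volume with a cubic kernel. The function name and array-of-arrays layout are illustrative, not part of the library API:

```typescript
// Reference 3D average pooling on a number[][][] volume (depth, height, width).
// Demonstrates how count_include_pad changes boundary averages with zero padding.
function avgPool3dRef(
  x: number[][][],
  k: number,          // cubic kernel size
  stride: number,
  pad: number,        // symmetric zero padding on every side
  countIncludePad: boolean
): number[][][] {
  const [D, H, W] = [x.length, x[0].length, x[0][0].length];
  const od = Math.floor((D + 2 * pad - k) / stride) + 1;
  const oh = Math.floor((H + 2 * pad - k) / stride) + 1;
  const ow = Math.floor((W + 2 * pad - k) / stride) + 1;
  const out: number[][][] = [];
  for (let zd = 0; zd < od; zd++) {
    const plane: number[][] = [];
    for (let zh = 0; zh < oh; zh++) {
      const row: number[] = [];
      for (let zw = 0; zw < ow; zw++) {
        let sum = 0;
        let count = 0;
        for (let dd = 0; dd < k; dd++)
          for (let hh = 0; hh < k; hh++)
            for (let ww = 0; ww < k; ww++) {
              const d = zd * stride - pad + dd;
              const h = zh * stride - pad + hh;
              const w = zw * stride - pad + ww;
              if (d >= 0 && d < D && h >= 0 && h < H && w >= 0 && w < W) {
                sum += x[d][h][w];
                count++;
              }
            }
        // count_include_pad=true divides by the full kernel volume (k^3),
        // so zero padding pulls boundary averages down; false divides by
        // the number of real (non-padded) elements only.
        row.push(sum / (countIncludePad ? k * k * k : count));
      }
      plane.push(row);
    }
    out.push(plane);
  }
  return out;
}

// A 2x2x2 volume of ones with kernel 2, stride 2, padding 1: each output
// window covers exactly one real voxel and seven padded slots.
const ones: number[][][] = [
  [[1, 1], [1, 1]],
  [[1, 1], [1, 1]],
];
const withPad = avgPool3dRef(ones, 2, 2, 1, true);  // corner value: 1/8 = 0.125
const noPad = avgPool3dRef(ones, 2, 2, 1, false);   // corner value: 1/1 = 1
console.log(withPad[0][0][0], noPad[0][0][0]);
```

The eightfold difference at the corners shows why count_include_pad matters whenever padding is nonzero: boundary statistics can be substantially biased toward zero.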
Parameters
- input: Tensor - 5D input tensor of shape (batch, channels, depth, height, width)
- kernel_size: number | [number, number, number] - Size of pooling window: single value or [depth, height, width]
- options: AvgPool3dFunctionalOptions - optional
Returns
Tensor - Tensor with shape (batch, channels, out_depth, out_height, out_width) where:
  out_depth  = floor((depth  + 2*pad_d - kernel_d) / stride_d) + 1
  out_height = floor((height + 2*pad_h - kernel_h) / stride_h) + 1
  out_width  = floor((width  + 2*pad_w - kernel_w) / stride_w) + 1
Examples
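The return-shape formula can be checked with a small standalone helper (the function name is illustrative, not part of the library API):

```typescript
// Compute avg_pool3d output spatial dimensions from the floor formula.
function avgPool3dOutShape(
  [depth, height, width]: [number, number, number],
  kernel: [number, number, number],
  stride: [number, number, number],
  pad: [number, number, number] = [0, 0, 0]
): [number, number, number] {
  const dim = (size: number, k: number, s: number, p: number) =>
    Math.floor((size + 2 * p - k) / s) + 1;
  return [
    dim(depth, kernel[0], stride[0], pad[0]),
    dim(height, kernel[1], stride[1], pad[1]),
    dim(width, kernel[2], stride[2], pad[2]),
  ];
}

// 128x256x256 volume, kernel 2, stride 2, no padding:
console.log(avgPool3dOutShape([128, 256, 256], [2, 2, 2], [2, 2, 2])); // [64, 128, 128]
```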
// Medical imaging: smooth and downsample MRI volume
const mri = torch.randn(1, 32, 128, 256, 256); // 1 scan, 32 filters, 128x256x256 volume
const smoothed = torch.nn.functional.avg_pool3d(mri, 2); // 2x2x2 averaging
// Output: (1, 32, 64, 128, 128) - reduced noise and resolution
// Video feature aggregation: temporal and spatial averaging
const features = torch.randn(8, 256, 8, 14, 14); // 8 videos, 256 features, 8 frames, 14x14 spatial
const aggregated = torch.nn.functional.avg_pool3d(features, [2, 2, 2], [2, 2, 2]);
// Output: (8, 256, 4, 7, 7) - combined over 2-frame temporal windows
// Global average pooling to output: reduce to single feature vector
const final_features = torch.randn(16, 512, 4, 4, 4); // 16 batches, 512 channels, 4³ spatial
const global_avg = torch.nn.functional.avg_pool3d(final_features, [4, 4, 4]);
// Output: (16, 512, 1, 1, 1) - global average of each feature map
// Asymmetric pooling: preserve temporal info, reduce spatial
const temporal_data = torch.randn(4, 128, 16, 32, 32); // temporal depth=16
const spatial_only = torch.nn.functional.avg_pool3d(temporal_data, [1, 2, 2], [1, 2, 2]);
// Output: (4, 128, 16, 16, 16) - no temporal averaging, 2x spatial reduction
See Also
- PyTorch torch.nn.functional.avg_pool3d
- max_pool3d - Max variant preserving peaks instead of smoothing
- avg_pool2d - 2D spatial average pooling
- adaptive_avg_pool3d - Adaptive averaging with automatic kernel/stride