torch.nn.functional.avg_pool2d
function avg_pool2d(input: Tensor, kernel_size: number | [number, number], options?: AvgPool2dFunctionalOptions): Tensor
function avg_pool2d(input: Tensor, kernel_size: number | [number, number], stride: number | [number, number] | null, padding: number | [number, number], ceil_mode: boolean, count_include_pad: boolean, divisor_override: number | undefined, options?: AvgPool2dFunctionalOptions): Tensor
2D Average Pooling: downsamples feature maps by averaging the values in each window.
Applies average pooling over 2D spatial dimensions (height, width) using sliding windows. Computes the mean value in each window, useful for:
- Smoother downsampling: preserves overall spatial information
- Global context: average pooling captures mean features across regions
- Dense predictions: less aggressive than max pooling
- Some modern architectures: alternative when edge-preservation not critical
- Anti-aliasing: averaging acts as a low-pass filter before downsampling
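The sliding-window mean can be illustrated with a plain TypeScript sketch, independent of this library (single channel, no padding, stride equal to kernel size for simplicity; `avgPool2dNaive` is a hypothetical helper, not part of the API):

```typescript
// Naive single-channel average pooling: slide a k x k window over the
// input and emit the mean of each window.
function avgPool2dNaive(x: number[][], k: number, stride: number): number[][] {
  const outH = Math.floor((x.length - k) / stride) + 1;
  const outW = Math.floor((x[0].length - k) / stride) + 1;
  const out: number[][] = [];
  for (let i = 0; i < outH; i++) {
    const row: number[] = [];
    for (let j = 0; j < outW; j++) {
      let sum = 0;
      for (let di = 0; di < k; di++)
        for (let dj = 0; dj < k; dj++)
          sum += x[i * stride + di][j * stride + dj];
      row.push(sum / (k * k)); // mean over the window
    }
    out.push(row);
  }
  return out;
}

// 2x2 pooling of a 4x4 input halves each spatial dimension.
const x = [
  [1, 2, 3, 4],
  [5, 6, 7, 8],
  [9, 10, 11, 12],
  [13, 14, 15, 16],
];
const y = avgPool2dNaive(x, 2, 2);
// y is [[3.5, 5.5], [11.5, 13.5]]
```

Each output value is the mean of a 2x2 window, e.g. (1 + 2 + 5 + 6) / 4 = 3.5.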
When to use Average Pooling:
- When you need smoother, more global downsampling
- For dense prediction tasks (segmentation, depth estimation)
- Global average pooling for classification (channel-wise averaging)
- When max pooling is too aggressive
- Some modern architectures as alternative to max
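The parameter saving from global average pooling is easy to verify with arithmetic. Assuming a 512-channel 7x7 feature map and a 1000-class linear head (illustrative numbers, not from any specific model):

```typescript
// Classifier-head parameter count (weights only, bias ignored).
const channels = 512, h = 7, w = 7, classes = 1000;

// flatten + linear: every spatial position gets its own weight per class
const flattenParams = channels * h * w * classes; // 25,088,000

// global average pooling + linear: one weight per channel per class
const gapParams = channels * classes; // 512,000
```

Here global average pooling shrinks the head by a factor of h * w = 49.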
Trade-offs vs Max Pooling:
- Global context: average captures the mean statistics of a region; max keeps only the strongest activation
- Feature preservation: averaging retains weaker signals (less selective than max), but can also blur out strong, localized features
- Empirical: max pooling is usually the better default for classification and remains more common in modern CNNs
- Global average pooling: replaces flatten + linear heads and sharply reduces parameter count
- Padding sensitivity: count_include_pad changes how windows at image boundaries are averaged
- Gradients: the mean spreads gradient evenly across the window, while max routes it to a single element
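The output spatial size follows the standard pooling formula, sketched here as a small standalone helper (`poolOutSize` is hypothetical, not part of this API):

```typescript
// Output size per dimension: floor((size + 2*padding - kernel) / stride) + 1,
// using ceil instead of floor when ceil_mode is true.
function poolOutSize(size: number, kernel: number, stride: number,
                     padding: number, ceilMode: boolean): number {
  const v = (size + 2 * padding - kernel) / stride;
  return (ceilMode ? Math.ceil(v) : Math.floor(v)) + 1;
}

poolOutSize(32, 2, 2, 0, false); // 16: a 2x2/stride-2 pool halves 32
poolOutSize(10, 3, 2, 0, false); // 4: floor((10 - 3) / 2) + 1
poolOutSize(10, 3, 2, 0, true);  // 5: ceil_mode keeps the partial last window
```

ceil_mode=true adds a trailing window when the stride does not divide evenly, so the last columns/rows are not dropped.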
Parameters
input: Tensor - 4D input tensor [batch, channels, height, width]
kernel_size: number | [number, number] - Size of the pooling window (scalar or [height, width])
options?: AvgPool2dFunctionalOptions - Optional configuration object
Returns
Tensor - Pooled output tensor [batch, channels, out_height, out_width]
Examples
// Standard average pooling for downsampling
const x = torch.randn([batch_size, 64, 32, 32]);
const pooled = torch.nn.functional.avg_pool2d(x, 2); // 2x2 pooling
// Output: [batch_size, 64, 16, 16] - spatial dims halved, values averaged

// Global average pooling for classification
const features = torch.randn([batch_size, 512, 7, 7]); // After conv layers
const global_avg = torch.nn.functional.avg_pool2d(features, [7, 7]);
// Output: [batch_size, 512, 1, 1] - one average per channel
const flattened = global_avg.reshape([batch_size, 512]);

// Exclude padding from average
const x = torch.randn([1, 3, 10, 10]);
const pooled = torch.nn.functional.avg_pool2d(x, 3, 1, 1, false, false);
// count_include_pad=false (the sixth positional argument, after ceil_mode):
// border windows average only the non-padded elements
See Also
- PyTorch torch.nn.functional.avg_pool2d
- max_pool2d - Max pooling alternative (more selective)
- adaptive_avg_pool2d - Adaptive average pooling to fixed output size
- avg_pool1d - 1D variant for sequences
- avg_pool3d - 3D variant for volumetric data