torch.nn.functional.adaptive_avg_pool2d
2D Adaptive Average Pooling: averages to fixed spatial size automatically.
Applies adaptive average pooling over 2D spatial dimensions (height, width) with automatic kernel/stride computation. Useful for:
- ImageNet classification: standard pooling before final classifier
- Variable-size input handling: same network works with any image size
- Transfer learning: adapting pre-trained models to different resolutions
- Global feature extraction: reduces spatial dimensions before fully connected layers
- Multi-scale architectures: pooling to standardized sizes
- Flexible network design: input-size-independent architectures
Unlike regular pooling, adaptive pooling automatically computes the kernel and stride to achieve the target output spatial size. Averages values in each adaptive window.
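Concretely, for each output index i along a spatial dimension, the window boundaries can be sketched as follows (a pure-arithmetic illustration matching PyTorch's floor/ceil rule; the helper name is ours, not part of this API):

```javascript
// Adaptive pooling window for output index i along one spatial dimension:
// start = floor(i * inSize / outSize), end = ceil((i + 1) * inSize / outSize).
// Each output element averages input[start..end-1]; windows need not be uniform.
function adaptiveWindow(i, inSize, outSize) {
  const start = Math.floor((i * inSize) / outSize);
  const end = Math.ceil(((i + 1) * inSize) / outSize);
  return [start, end];
}

// Example: pooling a length-7 dimension down to 3 outputs.
console.log([0, 1, 2].map((i) => adaptiveWindow(i, 7, 3)));
// → [ [ 0, 3 ], [ 2, 5 ], [ 4, 7 ] ] (the middle window overlaps its neighbors)
```

When outSize divides inSize evenly, this reduces to uniform non-overlapping windows of size inSize / outSize.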
- Automatic kernel computation: Kernel/stride computed automatically from input/output sizes
- Rectangular outputs: Target height and width can be set independently, so the output aspect ratio need not match the input
- Input invariance: Same output size regardless of input spatial dimensions
- Smoothing effect: Averaging reduces noise while reducing resolution
- Window overlap: Windows are disjoint when the output size divides the input evenly; otherwise adjacent windows may overlap
- Non-uniform kernels: When sizes do not divide evenly, window sizes can differ by one element across positions
- Fully differentiable: Averaging is linear, so gradients distribute evenly across every element of each window
- Output size limits: Output sizes larger than the input are permitted but only duplicate values; for downsampling, keep the output size at or below the input size
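Putting these properties together, the averaging semantics can be sketched as a minimal single-channel reference implementation over a plain 2D array (an illustration only; real tensors go through the library call documented here):

```javascript
// Minimal single-channel sketch of adaptive average pooling over a 2D array.
// For each output cell, average the window given by the floor/ceil boundary
// formula. This mirrors the semantics, not the optimized tensor kernel.
function adaptiveAvgPool2dRef(input, outH, outW) {
  const inH = input.length;
  const inW = input[0].length;
  const out = [];
  for (let oy = 0; oy < outH; oy++) {
    const y0 = Math.floor((oy * inH) / outH);
    const y1 = Math.ceil(((oy + 1) * inH) / outH);
    const row = [];
    for (let ox = 0; ox < outW; ox++) {
      const x0 = Math.floor((ox * inW) / outW);
      const x1 = Math.ceil(((ox + 1) * inW) / outW);
      let sum = 0;
      for (let y = y0; y < y1; y++) {
        for (let x = x0; x < x1; x++) sum += input[y][x];
      }
      row.push(sum / ((y1 - y0) * (x1 - x0)));
    }
    out.push(row);
  }
  return out;
}

// Global average pooling (output_size = 1) reduces to the mean of all values.
const grid = [
  [1, 2],
  [3, 4],
];
console.log(adaptiveAvgPool2dRef(grid, 1, 1)); // → [ [ 2.5 ] ]
```

Note that when the output size equals the input size, every window is 1×1 and the input passes through unchanged.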
Parameters
input (Tensor) - 4D input tensor of shape (batch, channels, height, width)
output_size (number | [number, number]) - Target spatial size: a single value n for (n, n), or [height, width]
Returns
Tensor - Tensor of shape (batch, channels, out_height, out_width)
Examples
// ImageNet classification: standard global average pooling
const features = torch.randn(32, 2048, 7, 7); // ResNet50 feature maps
const pooled = torch.nn.functional.adaptive_avg_pool2d(features, 1);
// Output: (32, 2048, 1, 1) - ready for final fully connected layer
// Variable input sizes: same network for different image resolutions
const img_small = torch.randn(4, 512, 56, 56);
const out1 = torch.nn.functional.adaptive_avg_pool2d(img_small, 7); // → (4, 512, 7, 7)
const img_large = torch.randn(4, 512, 112, 112); // 2x larger spatial resolution
const out2 = torch.nn.functional.adaptive_avg_pool2d(img_large, 7); // → (4, 512, 7, 7)
// Both produce (4, 512, 7, 7) regardless of input resolution
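For these two resolutions the target divides the input evenly, so every window has the same shape: 56→7 gives 8×8 windows and 112→7 gives 16×16. A quick arithmetic check using the boundary rule start = floor(i·in/out), end = ceil((i+1)·in/out) (the helper name is illustrative, not part of this API):

```javascript
// For evenly divisible sizes, window extent = inSize / outSize at every index.
function windowExtent(i, inSize, outSize) {
  const start = Math.floor((i * inSize) / outSize);
  const end = Math.ceil(((i + 1) * inSize) / outSize);
  return end - start;
}

console.log([...Array(7).keys()].map((i) => windowExtent(i, 56, 7)));
// → [ 8, 8, 8, 8, 8, 8, 8 ]
console.log([...Array(7).keys()].map((i) => windowExtent(i, 112, 7)));
// → [ 16, 16, 16, 16, 16, 16, 16 ]
```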
// Object detection: multi-scale feature pooling
const p5_features = torch.randn(8, 256, 14, 14); // FPN level P5
const standardized = torch.nn.functional.adaptive_avg_pool2d(p5_features, [7, 7]);
// Output: (8, 256, 7, 7) - standardized feature grid
// Arbitrary target sizes: not just powers of 2
const big_features = torch.randn(16, 256, 100, 100);
const resized = torch.nn.functional.adaptive_avg_pool2d(big_features, [10, 10]);
// Output: (16, 256, 10, 10) - 10x reduction per spatial dimension
See Also
- PyTorch torch.nn.functional.adaptive_avg_pool2d
- adaptive_max_pool2d - Max variant for feature saliency
- avg_pool2d - Regular 2D average pooling with explicit kernel/stride
- adaptive_avg_pool1d - 1D variant for sequences
- adaptive_avg_pool3d - 3D variant for volumetric data