torch.nn.AvgPool1d
new AvgPool1d(kernel_size: number, stride?: number, options?: AvgPool1dOptions)
Properties (readonly):
- kernel_size (number)
- stride (number)
- padding (number)
1D average pooling: reduces sequence length by taking the arithmetic mean over a sliding window.
Applies average pooling over temporal sequences: scans a kernel window along the sequence dimension and returns the arithmetic mean of the values inside each window, giving smooth temporal downsampling. Essential for:
- Audio and signal processing (smooth temporal reduction)
- Time series compression (downsampling while preserving signal shape)
- RNN/transformer input preprocessing (temporal smoothing)
- Feature aggregation from sequential data (averaging important patterns)
- Noise reduction in sequential representations
Average pooling smooths the signal by including all values in the window, unlike max pooling which keeps only peaks. Useful when average behavior is more important than peak detection.
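The window arithmetic can be sketched in a few lines of plain TypeScript (an illustration of the mechanics only, not the torch.nn implementation):

```typescript
// Illustrative sketch of 1D average pooling over a flat array.
// Defaults to non-overlapping windows (stride = kernelSize).
function avgPool1d(xs: number[], kernelSize: number, stride: number = kernelSize): number[] {
  const out: number[] = [];
  // Slide the window until it would run past the end of the sequence.
  for (let start = 0; start + kernelSize <= xs.length; start += stride) {
    const window = xs.slice(start, start + kernelSize);
    out.push(window.reduce((a, b) => a + b, 0) / kernelSize);
  }
  return out;
}

// avgPool1d([1, 2, 3, 4, 5, 6], 2) → [1.5, 3.5, 5.5]
```

With kernel_size 2 and the default stride, adjacent pairs are averaged and the sequence is halved; every input element contributes to the output, unlike max pooling.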
When to use AvgPool1d:
- Time series analysis (shape matters more than peaks)
- Audio/speech processing (smooth temporal representation)
- Signal processing where noise reduction is important
- RNN input preprocessing (temporal smoothing)
- When preserving average activations is important
Trade-offs:
- vs MaxPool1d: AvgPool1d smooths; MaxPool1d preserves peaks
- vs adaptive pooling: AvgPool1d uses a fixed stride/kernel; adaptive pooling chooses them to hit a target output size
- Smoothness: Averaging reduces sharp features (may lose important peaks)
- Information preservation: More data retained per window than MaxPool
- Gradient flow: All elements in window contribute to gradients
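The average-vs-max trade-off is easy to see on a spiky signal. The plain-TypeScript sketch below (an illustration, not the library API) pools the same windows both ways:

```typescript
// A signal with one sharp peak, pooled with kernel 3, stride 3.
const signal = [0, 0, 9, 0, 0, 0];
const kernel = 3;
const avgOut: number[] = [];
const maxOut: number[] = [];
for (let i = 0; i + kernel <= signal.length; i += kernel) {
  const w = signal.slice(i, i + kernel);
  avgOut.push(w.reduce((a, b) => a + b, 0) / kernel); // peak diluted into the mean
  maxOut.push(Math.max(...w));                        // peak kept, other values discarded
}
// avgOut → [3, 0]
// maxOut → [9, 0]
```

The averaged output retains the overall energy of the window but flattens the peak from 9 to 3; max pooling keeps the peak exactly but throws the rest of the window away.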
Pooling mechanics: For a 1D sequence [T, B, C] (time, batch, channels):
- Slide kernel_size window over time dimension
- Step by stride (default: kernel_size for non-overlapping)
- Compute mean value in each window
- Output: [T_out, B, C] where T_out = floor((T + 2*padding - kernel_size) / stride) + 1
- Default stride: stride=kernel_size gives non-overlapping pooling
- Stride < kernel: creates overlapping windows (smoother downsampling)
- Gradient: All window elements get gradient (unlike MaxPool's sparse gradients)
- Smoothing effect: Averaging reduces sharp peaks and noise
- Padding: Zero-padding added before pooling; affects output size calculation
- Information retention: More information retained than MaxPool
- Deterministic: Given same input, always produces same output (no randomness)
- Peak loss: Sharp features smoothed out; max pooling better for peaks
- Boundary effects: zero-padding pulls window averages toward zero at the edges of the sequence
- Output size: use the formula above to predict output dimensions before wiring downstream layers
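The output-size rule translates directly into code. This plain-TypeScript helper (illustrative, not part of the library) predicts T_out from the parameters above:

```typescript
// T_out = floor((T + 2*padding - kernel_size) / stride) + 1
function outputLength(
  T: number,
  kernelSize: number,
  stride: number = kernelSize, // default: non-overlapping pooling
  padding: number = 0
): number {
  return Math.floor((T + 2 * padding - kernelSize) / stride) + 1;
}

// outputLength(100, 3)    → 33  (non-overlapping, kernel 3)
// outputLength(100, 5, 2) → 48  (overlapping, stride 2)
```

Checking the formula before building a model avoids shape mismatches in the layers that consume the pooled output.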
Examples
// Simple temporal averaging
const pool = new torch.nn.AvgPool1d(3); // kernel=3, stride=3 (non-overlapping)
const x = torch.randn([100, 32, 64]); // [time=100, batch=32, channels=64]
const y = pool.forward(x); // [time=33, batch=32, channels=64]

// Overlapping averaging with stride < kernel
const pool = new torch.nn.AvgPool1d(5, 2); // kernel=5, stride=2 (overlapping)
const x = torch.randn([100, 16, 128]);
const y = pool.forward(x); // [time=48, batch=16, channels=128], smooth downsampling with temporal overlap

// Audio signal smoothing
const pool = new torch.nn.AvgPool1d(4, 2); // 4-sample averaging, stride 2
const audio = torch.randn([16000, 1, 1]); // [time=16000, batch=1, channels=1]
const smoothed = pool.forward(audio); // [time=7999, batch=1, channels=1], noise-reduced temporal representation