torch.nn.AvgPool1d
new AvgPool1d(kernel_size: number, stride?: number, options?: AvgPool1dOptions)
Properties (readonly):
- kernel_size (number)
- stride (number)
- padding (number)
1D average pooling: reduces sequence length by taking the arithmetic mean over a sliding window.
Applies average pooling over temporal sequences: scans a kernel window along the sequence dimension and returns the arithmetic mean of the values inside each window, giving smooth temporal downsampling. Essential for:
- Audio and signal processing (smooth temporal reduction)
- Time series compression (downsampling while preserving signal shape)
- RNN/transformer input preprocessing (temporal smoothing)
- Feature aggregation from sequential data (averaging important patterns)
- Noise reduction in sequential representations
Average pooling smooths the signal by including all values in the window, unlike max pooling which keeps only peaks. Useful when average behavior is more important than peak detection.
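The window arithmetic can be sketched in a few lines of plain TypeScript (an illustration of the mechanics only, not the torch.nn implementation):

```typescript
// Illustrative sketch of 1D average pooling over a flat array.
// Defaults to non-overlapping windows (stride = kernelSize).
function avgPool1d(xs: number[], kernelSize: number, stride: number = kernelSize): number[] {
  const out: number[] = [];
  // Slide the window until it would run past the end of the sequence.
  for (let start = 0; start + kernelSize <= xs.length; start += stride) {
    const window = xs.slice(start, start + kernelSize);
    out.push(window.reduce((a, b) => a + b, 0) / kernelSize);
  }
  return out;
}

// avgPool1d([1, 2, 3, 4, 5, 6], 2) → [1.5, 3.5, 5.5]
```

With kernel_size 2 and the default stride, adjacent pairs are averaged and the sequence is halved; every input element contributes to the output, unlike max pooling.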
When to use AvgPool1d:
- Time series analysis (shape matters more than peaks)
- Audio/speech processing (smooth temporal representation)
- Signal processing where noise reduction is important
- RNN input preprocessing (temporal smoothing)
- When preserving average activations is important
Trade-offs:
- vs MaxPool1d: AvgPool1d smooths; MaxPool1d preserves peaks
- vs adaptive pooling: AvgPool1d uses a fixed stride/kernel; adaptive pooling chooses them to hit a target output size
- Smoothness: Averaging reduces sharp features (may lose important peaks)
- Information preservation: More data retained per window than MaxPool
- Gradient flow: All elements in window contribute to gradients
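The average-vs-max trade-off is easy to see on a spiky signal. The plain-TypeScript sketch below (an illustration, not the library API) pools the same windows both ways:

```typescript
// A signal with one sharp peak, pooled with kernel 3, stride 3.
const signal = [0, 0, 9, 0, 0, 0];
const kernel = 3;
const avgOut: number[] = [];
const maxOut: number[] = [];
for (let i = 0; i + kernel <= signal.length; i += kernel) {
  const w = signal.slice(i, i + kernel);
  avgOut.push(w.reduce((a, b) => a + b, 0) / kernel); // peak diluted into the mean
  maxOut.push(Math.max(...w));                        // peak kept, other values discarded
}
// avgOut → [3, 0]
// maxOut → [9, 0]
```

The averaged output retains the overall energy of the window but flattens the peak from 9 to 3; max pooling keeps the peak exactly but throws the rest of the window away.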
Pooling mechanics: For a 1D sequence [T, B, C] (time, batch, channels):
- Slide kernel_size window over time dimension
- Step by stride (default: kernel_size for non-overlapping)
- Compute mean value in each window
- Output: [T_out, B, C] where T_out = floor((T + 2*padding - kernel_size) / stride) + 1
- Default stride: stride=kernel_size gives non-overlapping pooling
- Stride < kernel: creates overlapping windows (smoother downsampling)
- Gradient: All window elements get gradient (unlike MaxPool's sparse gradients)
- Smoothing effect: Averaging reduces sharp peaks and noise
- Padding: Zero-padding added before pooling; affects output size calculation
- Information retention: More information retained than MaxPool
- Deterministic: Given same input, always produces same output (no randomness)
- Peak loss: Sharp features smoothed out; max pooling better for peaks
- Boundary effects: zero-padding pulls window averages toward zero at the edges of the sequence
- Output size: use the formula above to predict output dimensions before wiring downstream layers
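The output-size rule translates directly into code. This plain-TypeScript helper (illustrative, not part of the library) predicts T_out from the parameters above:

```typescript
// T_out = floor((T + 2*padding - kernel_size) / stride) + 1
function outputLength(
  T: number,
  kernelSize: number,
  stride: number = kernelSize, // default: non-overlapping pooling
  padding: number = 0
): number {
  return Math.floor((T + 2 * padding - kernelSize) / stride) + 1;
}

// outputLength(100, 3)    → 33  (non-overlapping, kernel 3)
// outputLength(100, 5, 2) → 48  (overlapping, stride 2)
```

Checking the formula before building a model avoids shape mismatches in the layers that consume the pooled output.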
Examples
// Simple temporal averaging
const pool = new torch.nn.AvgPool1d(3); // kernel=3, stride=3 (non-overlapping)
const x = torch.randn([100, 32, 64]); // [time=100, batch=32, channels=64]
const y = pool.forward(x); // [time=33, batch=32, channels=64]

// Overlapping averaging with stride < kernel
const pool = new torch.nn.AvgPool1d(5, 2); // kernel=5, stride=2 (overlapping)
const x = torch.randn([100, 16, 128]);
const y = pool.forward(x); // [time=48, batch=16, channels=128], smooth downsampling with temporal overlap

// Audio signal smoothing
const pool = new torch.nn.AvgPool1d(4, 2); // 4-sample averaging, stride 2
const audio = torch.randn([16000, 1, 1]); // [time=16000, batch=1, channels=1]
const smoothed = pool.forward(audio); // [time=7999, batch=1, channels=1], noise-reduced temporal representation