torch.nn.MaxPool1d
new MaxPool1d(kernel_size: number, options?: MaxPool1dOptions)
kernel_size (number) - readonly
stride (number) - readonly
padding (number) - readonly
return_indices (boolean) - readonly
1D max pooling: reduces sequence length by taking maximum over sliding window.
Applies max pooling over temporal sequences: scans a kernel over the sequence dimension, returning the maximum value within each kernel window. Reduces temporal dimensionality while preserving strongest signal. Essential for:
- Audio and signal processing (reducing temporal resolution)
- Time series compression (downsampling while preserving peaks)
- RNN/transformer input preprocessing (temporal pooling)
- Extracting dominant features from sequential data
Max pooling selects the strongest activations in each window, providing translation invariance and shift robustness. Unlike average pooling which smooths, max pooling keeps peaks sharp.
When to use MaxPool1d:
- Time series analysis (peaks matter more than average)
- Audio/speech processing (reduce sequence length)
- Signal processing with clear peaks
- RNN input preprocessing (temporal compression)
- When preserving maximum activations is important
Trade-offs:
- vs AvgPool: MaxPool preserves peaks; AvgPool smooths (averages)
- vs adaptive pooling: MaxPool fixed stride/kernel; adaptive auto-adjusts output size
- Output size: Controlled by kernel_size and stride explicitly
- Information loss: Some sequence information lost (only max per window kept)
- Gradient flow: Gradient only flows to max element in each window
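The max-vs-average trade-off above can be seen on a tiny signal. This is a plain TypeScript sketch, not the library API: it pools a sequence containing one sharp peak with kernel=4, stride=4 (non-overlapping).

```typescript
// Plain TypeScript sketch (not the torch.nn API): compare max vs average
// pooling on a signal with a sharp peak, kernel=4, stride=4.
const signal = [0, 0, 9, 0, 1, 1, 1, 1];

function pool(xs: number[], k: number, reduce: (w: number[]) => number): number[] {
  const out: number[] = [];
  for (let i = 0; i + k <= xs.length; i += k) {
    out.push(reduce(xs.slice(i, i + k)));
  }
  return out;
}

const maxPooled = pool(signal, 4, (w) => Math.max(...w)); // [9, 1] — peak kept
const avgPooled = pool(signal, 4, (w) => w.reduce((a, b) => a + b, 0) / w.length); // [2.25, 1] — peak smoothed
```

Max pooling returns the peak (9) intact; average pooling dilutes it into its window (2.25).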
Pooling mechanics: For a 1D sequence [T, B, C] (time, batch, channels):
- Slide kernel_size window over time dimension
- Step by stride (default: kernel_size for non-overlapping)
- Keep maximum value in each window
- Output: [T_out, B, C] where T_out = floor((T + 2*padding - kernel_size) / stride + 1)
- Default stride: stride=kernel_size gives non-overlapping pooling
- Stride < kernel_size: Creates overlapping windows (smoother downsampling)
- Gradient: Only max element per window gets gradient; others get zero
- Indices: Useful for unpooling to reconstruct approximate original signal
- Deterministic: Given same input, always selects same indices (no randomness)
- Padding: Zero-padding added before pooling; affects output size calculation
- Gradient sparsity: Many neurons have zero gradient (not max in window)
- Information loss: Non-max values discarded; signal compression irreversible
- Output size: Calculate using formula to predict output dimensions
- Peak preservation: Can amplify noise if noise is peak in window
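The mechanics above can be sketched in plain TypeScript (not the library API) for a single channel: slide the window, apply zero padding as described above, keep the maximum, and record the index it came from. The output length follows the formula floor((T + 2*padding - kernel_size) / stride + 1).

```typescript
// Plain TypeScript sketch of 1D max pooling over one channel (not the
// torch.nn API): returns pooled values plus source indices relative to
// the unpadded input (-1 when the max came from a padded position).
function maxPool1d(
  xs: number[],
  kernelSize: number,
  stride: number = kernelSize, // default stride = kernel_size (non-overlapping)
  padding: number = 0,
): { values: number[]; indices: number[] } {
  const T = xs.length;
  // Output length: floor((T + 2*padding - kernelSize) / stride + 1)
  const tOut = Math.floor((T + 2 * padding - kernelSize) / stride + 1);
  const values: number[] = [];
  const indices: number[] = [];
  for (let o = 0; o < tOut; o++) {
    let best = -Infinity;
    let bestIdx = -1;
    for (let k = 0; k < kernelSize; k++) {
      const t = o * stride - padding + k; // position in the unpadded input
      const v = t >= 0 && t < T ? xs[t] : 0; // zero padding outside the input
      if (v > best) {
        best = v;
        bestIdx = t >= 0 && t < T ? t : -1;
      }
    }
    values.push(best);
    indices.push(bestIdx);
  }
  return { values, indices };
}

// kernel=3, default stride=3 over 7 elements: floor((7 - 3) / 3 + 1) = 2 windows
const { values, indices } = maxPool1d([1, 5, 2, 8, 3, 3, 9], 3);
// values = [5, 8], indices = [1, 3]; the trailing 9 is dropped (no full window)
```

Note how the incomplete final window is discarded — this is the "information loss" point above, and why the 100-step example below shrinks to 33 rather than 34.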
Examples
// Simple temporal pooling
const pool = new torch.nn.MaxPool1d(3); // kernel=3, stride=3 (non-overlapping)
const x = torch.randn([100, 32, 64]); // [time=100, batch=32, channels=64]
const y = pool.forward(x); // [time=33, batch=32, channels=64]

// Overlapping pooling with stride < kernel
const pool = new torch.nn.MaxPool1d(5, { stride: 2 }); // kernel=5, stride=2 (overlapping)
const x = torch.randn([100, 16, 128]);
const y = pool.forward(x); // [48, 16, 128] — smooth downsampling with overlap

// With indices for unpooling (e.g., max unpooling in a decoder)
const pool = new torch.nn.MaxPool1d(2, { stride: 2, return_indices: true }); // option names mirror the properties above
const x = torch.randn([100, 32, 64]);
const [y, indices] = pool.forward(x) as [torch.Tensor, torch.Tensor];
// indices tells where max came from, used for max_unpool
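What the indices buy you can be shown with a plain TypeScript sketch (not the library API, which may expose its own unpooling op): max-unpooling scatters each pooled value back to the position it came from and zero-fills everything else.

```typescript
// Plain TypeScript sketch of max-unpooling (not the torch.nn API):
// place each pooled value at its recorded source index, zeros elsewhere.
function maxUnpool1d(values: number[], indices: number[], outputLength: number): number[] {
  const out = new Array<number>(outputLength).fill(0);
  for (let i = 0; i < values.length; i++) {
    if (indices[i] >= 0) out[indices[i]] = values[i];
  }
  return out;
}

// Pooling [1, 7, 2, 4] with kernel=2 keeps [7, 4] at indices [1, 3];
// unpooling reconstructs [0, 7, 0, 4] — peaks restored, everything else lost.
const reconstructed = maxUnpool1d([7, 4], [1, 3], 4);
```

This makes concrete why the reconstruction is only approximate: the non-max values were discarded during pooling and cannot be recovered.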