torch.nn.MaxPool1d
new MaxPool1d(kernel_size: number, options?: MaxPool1dOptions)
kernel_size (number) - readonly
stride (number) - readonly
padding (number) - readonly
return_indices (boolean) - readonly
1D max pooling: reduces sequence length by taking maximum over sliding window.
Applies max pooling over temporal sequences: scans a kernel over the sequence dimension, returning the maximum value within each kernel window. Reduces temporal dimensionality while preserving strongest signal. Essential for:
- Audio and signal processing (reducing temporal resolution)
- Time series compression (downsampling while preserving peaks)
- RNN/transformer input preprocessing (temporal pooling)
- Extracting dominant features from sequential data
Max pooling selects the strongest activations in each window, providing translation invariance and shift robustness. Unlike average pooling which smooths, max pooling keeps peaks sharp.
When to use MaxPool1d:
- Time series analysis (peaks matter more than average)
- Audio/speech processing (reduce sequence length)
- Signal processing with clear peaks
- RNN input preprocessing (temporal compression)
- When preserving maximum activations is important
Trade-offs:
- vs AvgPool: MaxPool preserves peaks; AvgPool smooths (averages)
- vs adaptive pooling: MaxPool fixed stride/kernel; adaptive auto-adjusts output size
- Output size: Controlled by kernel_size and stride explicitly
- Information loss: Some sequence information lost (only max per window kept)
- Gradient flow: Gradient only flows to max element in each window
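The max-vs-average trade-off above can be seen on a tiny signal. This is a plain TypeScript sketch, not the library API: it pools a sequence containing one sharp peak with kernel=4, stride=4 (non-overlapping).

```typescript
// Plain TypeScript sketch (not the torch.nn API): compare max vs average
// pooling on a signal with a sharp peak, kernel=4, stride=4.
const signal = [0, 0, 9, 0, 1, 1, 1, 1];

function pool(xs: number[], k: number, reduce: (w: number[]) => number): number[] {
  const out: number[] = [];
  for (let i = 0; i + k <= xs.length; i += k) {
    out.push(reduce(xs.slice(i, i + k)));
  }
  return out;
}

const maxPooled = pool(signal, 4, (w) => Math.max(...w)); // [9, 1] — peak kept
const avgPooled = pool(signal, 4, (w) => w.reduce((a, b) => a + b, 0) / w.length); // [2.25, 1] — peak smoothed
```

Max pooling returns the peak (9) intact; average pooling dilutes it into its window (2.25).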
Pooling mechanics: For a 1D sequence [T, B, C] (time, batch, channels):
- Slide kernel_size window over time dimension
- Step by stride (default: kernel_size for non-overlapping)
- Keep maximum value in each window
- Output: [T_out, B, C] where T_out = floor((T + 2*padding - kernel_size) / stride + 1)
- Default stride: stride=kernel_size gives non-overlapping pooling
- Stride < kernel_size: Creates overlapping windows (smoother downsampling)
- Gradient: Only max element per window gets gradient; others get zero
- Indices: Useful for unpooling to reconstruct approximate original signal
- Deterministic: Given same input, always selects same indices (no randomness)
- Padding: Zero-padding added before pooling; affects output size calculation
- Gradient sparsity: Many neurons have zero gradient (not max in window)
- Information loss: Non-max values discarded; signal compression irreversible
- Output size: Calculate using formula to predict output dimensions
- Peak preservation: Can amplify noise if noise is peak in window
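The mechanics above can be sketched in plain TypeScript (not the library API) for a single channel: slide the window, apply zero padding as described above, keep the maximum, and record the index it came from. The output length follows the formula floor((T + 2*padding - kernel_size) / stride + 1).

```typescript
// Plain TypeScript sketch of 1D max pooling over one channel (not the
// torch.nn API): returns pooled values plus source indices relative to
// the unpadded input (-1 when the max came from a padded position).
function maxPool1d(
  xs: number[],
  kernelSize: number,
  stride: number = kernelSize, // default stride = kernel_size (non-overlapping)
  padding: number = 0,
): { values: number[]; indices: number[] } {
  const T = xs.length;
  // Output length: floor((T + 2*padding - kernelSize) / stride + 1)
  const tOut = Math.floor((T + 2 * padding - kernelSize) / stride + 1);
  const values: number[] = [];
  const indices: number[] = [];
  for (let o = 0; o < tOut; o++) {
    let best = -Infinity;
    let bestIdx = -1;
    for (let k = 0; k < kernelSize; k++) {
      const t = o * stride - padding + k; // position in the unpadded input
      const v = t >= 0 && t < T ? xs[t] : 0; // zero padding outside the input
      if (v > best) {
        best = v;
        bestIdx = t >= 0 && t < T ? t : -1;
      }
    }
    values.push(best);
    indices.push(bestIdx);
  }
  return { values, indices };
}

// kernel=3, default stride=3 over 7 elements: floor((7 - 3) / 3 + 1) = 2 windows
const { values, indices } = maxPool1d([1, 5, 2, 8, 3, 3, 9], 3);
// values = [5, 8], indices = [1, 3]; the trailing 9 is dropped (no full window)
```

Note how the incomplete final window is discarded — this is the "information loss" point above, and why the 100-step example below shrinks to 33 rather than 34.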
Examples
// Simple temporal pooling
const pool = new torch.nn.MaxPool1d(3); // kernel=3, stride=3 (non-overlapping)
const x = torch.randn([100, 32, 64]); // [time=100, batch=32, channels=64]
const y = pool.forward(x); // [time=33, batch=32, channels=64]

// Overlapping pooling with stride < kernel
const pool = new torch.nn.MaxPool1d(5, { stride: 2 }); // kernel=5, stride=2 (overlapping)
const x = torch.randn([100, 16, 128]);
const y = pool.forward(x); // [48, 16, 128] — smooth downsampling with overlap

// With indices for unpooling (e.g., max unpooling in a decoder)
const pool = new torch.nn.MaxPool1d(2, { stride: 2, return_indices: true }); // option names mirror the properties above
const x = torch.randn([100, 32, 64]);
const [y, indices] = pool.forward(x) as [torch.Tensor, torch.Tensor];
// indices tells where max came from, used for max_unpool
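What the indices buy you can be shown with a plain TypeScript sketch (not the library API, which may expose its own unpooling op): max-unpooling scatters each pooled value back to the position it came from and zero-fills everything else.

```typescript
// Plain TypeScript sketch of max-unpooling (not the torch.nn API):
// place each pooled value at its recorded source index, zeros elsewhere.
function maxUnpool1d(values: number[], indices: number[], outputLength: number): number[] {
  const out = new Array<number>(outputLength).fill(0);
  for (let i = 0; i < values.length; i++) {
    if (indices[i] >= 0) out[indices[i]] = values[i];
  }
  return out;
}

// Pooling [1, 7, 2, 4] with kernel=2 keeps [7, 4] at indices [1, 3];
// unpooling reconstructs [0, 7, 0, 4] — peaks restored, everything else lost.
const reconstructed = maxUnpool1d([7, 4], [1, 3], 4);
```

This makes concrete why the reconstruction is only approximate: the non-max values were discarded during pooling and cannot be recovered.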