torch.nn.functional.interpolate
function interpolate<S extends Shape, D extends DType = DType, Dev extends DeviceType = DeviceType>(input: Tensor<S, D, Dev>, options?: InterpolateFunctionalOptions): Tensor<S, D, Dev>

function interpolate<S extends Shape, D extends DType = DType, Dev extends DeviceType = DeviceType>(input: Tensor<S, D, Dev>, size: number | number[] | undefined, scale_factor: number | number[] | undefined, mode: 'nearest' | 'linear' | 'bilinear' | 'trilinear' | 'bicubic' | 'area' | 'nearest-exact' | undefined, align_corners: boolean | undefined, recompute_scale_factor: boolean | undefined, antialias: boolean, options?: InterpolateFunctionalOptions): Tensor<S, D, Dev>

Resample (upsample or downsample) spatial dimensions of a tensor to new sizes using interpolation.
Changes spatial resolution by computing new values at interpolated positions. Essential for:
- Upsampling in generative models (GANs, diffusion, VAEs), often as an alternative to transposed-convolution ("deconvolution") layers
- Downsampling for reduced memory/computation (feature pyramids, image pyramids)
- Resizing variable-input images to fixed model dimensions (preprocessing)
- Multi-scale architectures (FPN, U-Net) where different layers process different resolutions
- Arbitrary-size output (e.g., resize to specific width x height)
- Video processing (resizing frames while preserving temporal dimension)
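Since only dimensions from the 3rd onward are resampled, the core shape transformation can be illustrated library-agnostically. A NumPy sketch of a 2x nearest-neighbor upsample (an illustration of the idea, not this library's implementation):

```python
import numpy as np

# A [batch, channels, H, W] tensor; interpolate only touches the spatial dims.
x = np.arange(4, dtype=np.float32).reshape(1, 1, 2, 2)

# 2x nearest-neighbor upsampling: each pixel is repeated along H and W.
up = x.repeat(2, axis=2).repeat(2, axis=3)

print(up.shape)  # (1, 1, 4, 4) - batch and channel dims unchanged
```

Each input pixel becomes a 2x2 block in the output, which is exactly the "blocky" look described for 'nearest' mode below.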
Interpolation Modes: Different methods for computing values at new positions:
- 'nearest': Repeat nearest value (fast, blocky artifacts, good for discrete data)
- 'linear': Linear interpolation along one dimension (smooth, for 1D sequences)
- 'bilinear': 2D linear interpolation (standard for images, smooth gradients)
- 'bicubic': 2D cubic interpolation (higher-order, smoother than bilinear, slower)
- 'trilinear': 3D linear interpolation (for volumetric/video data)
- 'area': Averages the contributing input region, equivalent to adaptive average pooling (mainly for downsampling)
- 'nearest-exact': Nearest neighbor with corrected index rounding, matching common image libraries (PIL, scikit-image)
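To make the mode differences concrete, a small NumPy sketch comparing nearest and linear resampling of a 1D signal (using a simplified corner-aligned coordinate mapping for readability, not necessarily this library's exact indexing):

```python
import numpy as np

x = np.array([0.0, 10.0], dtype=np.float32)  # two input samples
out_len = 4

# Output sample positions expressed in input coordinates (0 .. len-1).
pos = np.linspace(0, len(x) - 1, out_len)

linear = np.interp(pos, np.arange(len(x)), x)  # smooth ramp between samples
nearest = x[np.round(pos).astype(int)]         # snaps to the closest sample

print(linear)   # smooth: roughly 0, 3.33, 6.67, 10
print(nearest)  # stepped: 0, 0, 10, 10
```

Linear modes produce smooth transitions (and smooth gradients); nearest produces piecewise-constant steps, which is why it is preferred for discrete data such as label maps.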
Two Ways to Specify Output Size:
- Absolute size: size=[H, W] - resize to exact dimensions
- Scale factor: scale_factor=2 - multiply all spatial dims by 2 (2x upsample or 0.5x downsample)
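How the two options resolve to an output size can be sketched with a hypothetical helper (`output_size` is not part of this library; the flooring of scaled sizes follows PyTorch's convention, which this library is assumed to match):

```python
import math

def output_size(in_size, size=None, scale_factor=None):
    """Resolve one spatial output dimension; exactly one of
    size / scale_factor may be given (hypothetical helper)."""
    if (size is None) == (scale_factor is None):
        raise ValueError("specify exactly one of size or scale_factor")
    if size is not None:
        return size
    # PyTorch floors the scaled size, so 0.5x of an odd dim rounds down.
    return math.floor(in_size * scale_factor)

print(output_size(32, scale_factor=2))    # 64
print(output_size(57, scale_factor=0.5))  # 28, not 28.5
print(output_size(480, size=256))         # 256
```

Note the flooring: downsampling an odd-sized dimension by 0.5x does not round to the nearest integer.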
- Bilinear is standard for images: Most common choice for image interpolation. Smooth gradients, good perceptual quality, reasonable speed.
- Nearest for feature maps: Often used for upsampling feature maps (not final output). Simpler, faster, works well in learned feature space.
- Mode matters for gradients: Different modes have different gradient properties. Bilinear/bicubic smooth gradients; nearest has piecewise-constant gradients.
- align_corners=false is standard: The modern PyTorch default; treats pixels as areas and aligns their centers. align_corners=true is the older behavior that maps corner pixels onto each other (usually worse; only applies to the linear/bilinear/bicubic/trilinear modes).
- Upsampling needs post-processing: After upsampling with nearest/bilinear alone, blocky artifacts common. Usually follow with learned convolution (U-Net, GANs) for refinement.
- Downsampling loses information: Information is permanently lost; can't recover original. Consider whether lower resolution is acceptable for task.
- Non-integer scales: Can use fractional scale_factor (e.g., 1.5x) or non-square sizes. Useful for aspect ratio changes or arbitrary resolutions.
- Batch dimension ignored: First two dimensions (batch, channels) never resampled, only spatial dimensions (from 3rd onward) are affected.
- Specify either size OR scale_factor, not both: Providing both is ambiguous and raises an error. Supply exactly one and pass undefined for the other.
- Downsampling with nearest can alias: May drop or shift high-frequency details. For quality image downsampling, use bilinear or bicubic with antialias: true.
- Large upsampling is wasteful: 8x or more upsampling wastes computation. Usually better to gradually upsample (4x → 2x) or use learned operations.
- CPU very slow: CPU interpolation much slower than GPU. For large-scale inference, GPU is critical.
- Bicubic only 2D: Bicubic mode doesn't work with 3D volumes (use trilinear instead). Mode and input dimensionality must match.
- Floating-point interpolation: Output is always floating-point, even if input was integer.
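The align_corners note above corresponds to two different source-coordinate mappings. A NumPy sketch of both, with formulas as in PyTorch (assumed to match this library):

```python
import numpy as np

in_size, out_size = 4, 8
j = np.arange(out_size, dtype=np.float64)  # output pixel indices

# align_corners=true: the corner pixels of input and output coincide.
src_true = j * (in_size - 1) / (out_size - 1)

# align_corners=false (default): pixel *centers* are aligned instead.
src_false = (j + 0.5) * in_size / out_size - 0.5

print(src_true[0], src_true[-1])    # 0.0 3.0  - corners map exactly
print(src_false[0], src_false[-1])  # -0.25 3.25 - samples reach past the edges
```

With align_corners=false the edge samples fall slightly outside the input grid and are clamped, which is what gives the center-aligned behavior its better scale consistency.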
Parameters
input: Tensor<S, D, Dev> - Tensor to resample. Shape [batch, channels, ...spatial_dims] where spatial_dims are typically [H, W] (2D) or [D, H, W] (3D).
options: InterpolateFunctionalOptions (optional)
Returns
Tensor<S, D, Dev> - Resampled tensor with new spatial dimensions. Batch and channel dimensions unchanged.
Examples
// 2x upsampling: common in super-resolution or generative models
const low_res_img = torch.randn(1, 3, 32, 32); // [batch=1, channels=3, height=32, width=32]
const upsampled = torch.nn.functional.interpolate(low_res_img, undefined, 2, 'bilinear');
// Output: [1, 3, 64, 64] - each spatial dim multiplied by 2
// Resize to exact dimensions: standardize input sizes for model
const variable_size_img = torch.randn(1, 3, 480, 640); // Random size image
const resized = torch.nn.functional.interpolate(variable_size_img, [256, 256], undefined, 'bilinear');
// Output: [1, 3, 256, 256] - always produces 256x256 output regardless of input size
// A whole batch resized together to one standard size
const batch_imgs = torch.randn(32, 3, 640, 480); // [batch=32, channels=3, height=640, width=480]
const standardized = torch.nn.functional.interpolate(batch_imgs, [224, 224], undefined, 'bilinear', false);
// Output: [32, 3, 224, 224] - all images resized to ImageNet standard size
// 0.5x downsampling: reduce memory/computation for efficiency
const feature_map = torch.randn(8, 256, 56, 56); // Feature pyramid level
const downsampled = torch.nn.functional.interpolate(feature_map, undefined, 0.5, 'nearest');
// Output: [8, 256, 28, 28] - spatial dimensions halved (1/4 of pixels)
// Asymmetric upsampling: different scales per dimension
const x = torch.randn(1, 64, 16, 16); // [batch=1, channels=64, height=16, width=16]
const scaled = torch.nn.functional.interpolate(x, undefined, [2, 4], 'bilinear');
// Output: [1, 64, 32, 64] - 2x in height, 4x in width (non-square aspect ratio change)
// Video upsampling: preserve temporal dimension, upsample spatial
// With 'trilinear', size must cover all three spatial dims [frames, height, width]
const video = torch.randn(1, 3, 30, 480, 640); // [batch, channels, frames, height, width]
const upsampled_video = torch.nn.functional.interpolate(video, [30, 960, 1280], undefined, 'trilinear');
// Output: [1, 3, 30, 960, 1280] - frame count kept at 30, spatial dims doubled (2x resolution)
// GAN generator: progressively upsample from low-res feature map
const z_feature = torch.randn(1, 512, 4, 4); // Latent feature [4x4]
const up1 = torch.nn.functional.interpolate(z_feature, [8, 8], undefined, 'nearest'); // [8x8]
const up2 = torch.nn.functional.interpolate(up1, [16, 16], undefined, 'nearest'); // [16x16]
const up3 = torch.nn.functional.interpolate(up2, [32, 32], undefined, 'nearest'); // [32x32]
// Progressive upsampling: 4→8→16→32 with convolutions at each scale
See Also
- PyTorch torch.nn.functional.interpolate
- pad - Padding (different from interpolation/resizing)
- conv_transpose2d - Learned upsampling with learnable kernels (often better than interpolate)
- max_pool2d - Downsampling with max pooling (keeps the strongest activations; no learned parameters)
- avg_pool2d - Downsampling with average pooling
- adaptive_avg_pool2d - Automatic sizing to target output dimensions