torch.nn.functional.interpolate
function interpolate<S extends Shape, D extends DType = DType, Dev extends DeviceType = DeviceType>(input: Tensor<S, D, Dev>, options?: InterpolateFunctionalOptions): Tensor<S, D, Dev>

function interpolate<S extends Shape, D extends DType = DType, Dev extends DeviceType = DeviceType>(input: Tensor<S, D, Dev>, size: number | number[] | undefined, scale_factor: number | number[] | undefined, mode: 'nearest' | 'linear' | 'bilinear' | 'trilinear' | 'bicubic' | 'area' | 'nearest-exact' | undefined, align_corners: boolean | undefined, recompute_scale_factor: boolean | undefined, antialias: boolean, options?: InterpolateFunctionalOptions): Tensor<S, D, Dev>

Resample (upsample or downsample) spatial dimensions of a tensor to new sizes using interpolation.
Changes spatial resolution by computing new values at interpolated positions. Essential for:
- Upsampling in generative models (GANs, diffusion, VAEs), often as an alternative to transposed-convolution ("deconvolution") layers
- Downsampling for reduced memory/computation (feature pyramids, image pyramids)
- Resizing variable-input images to fixed model dimensions (preprocessing)
- Multi-scale architectures (FPN, U-Net) where different layers process different resolutions
- Arbitrary-size output (e.g., resize to specific width x height)
- Video processing (resizing frames while preserving temporal dimension)
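Since only dimensions from the 3rd onward are resampled, the core shape transformation can be illustrated library-agnostically. A NumPy sketch of a 2x nearest-neighbor upsample (an illustration of the idea, not this library's implementation):

```python
import numpy as np

# A [batch, channels, H, W] tensor; interpolate only touches the spatial dims.
x = np.arange(4, dtype=np.float32).reshape(1, 1, 2, 2)

# 2x nearest-neighbor upsampling: each pixel is repeated along H and W.
up = x.repeat(2, axis=2).repeat(2, axis=3)

print(up.shape)  # (1, 1, 4, 4) - batch and channel dims unchanged
```

Each input pixel becomes a 2x2 block in the output, which is exactly the "blocky" look described for 'nearest' mode below.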
Interpolation Modes: Different methods for computing values at new positions:
- 'nearest': Repeat nearest value (fast, blocky artifacts, good for discrete data)
- 'linear': Linear interpolation along one dimension (smooth, for 1D sequences)
- 'bilinear': 2D linear interpolation (standard for images, smooth gradients)
- 'bicubic': 2D cubic interpolation (higher-order, smoother than bilinear, slower)
- 'trilinear': 3D linear interpolation (for volumetric/video data)
- 'area': Averages the contributing input region, equivalent to adaptive average pooling (mainly for downsampling)
- 'nearest-exact': Nearest neighbor with corrected index rounding, matching common image libraries (PIL, scikit-image)
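To make the mode differences concrete, a small NumPy sketch comparing nearest and linear resampling of a 1D signal (using a simplified corner-aligned coordinate mapping for readability, not necessarily this library's exact indexing):

```python
import numpy as np

x = np.array([0.0, 10.0], dtype=np.float32)  # two input samples
out_len = 4

# Output sample positions expressed in input coordinates (0 .. len-1).
pos = np.linspace(0, len(x) - 1, out_len)

linear = np.interp(pos, np.arange(len(x)), x)  # smooth ramp between samples
nearest = x[np.round(pos).astype(int)]         # snaps to the closest sample

print(linear)   # smooth: roughly 0, 3.33, 6.67, 10
print(nearest)  # stepped: 0, 0, 10, 10
```

Linear modes produce smooth transitions (and smooth gradients); nearest produces piecewise-constant steps, which is why it is preferred for discrete data such as label maps.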
Two Ways to Specify Output Size:
- Absolute size: size=[H, W] - resize to exact dimensions
- Scale factor: scale_factor=2 - multiply all spatial dims by 2 (2x upsample or 0.5x downsample)
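How the two options resolve to an output size can be sketched with a hypothetical helper (`output_size` is not part of this library; the flooring of scaled sizes follows PyTorch's convention, which this library is assumed to match):

```python
import math

def output_size(in_size, size=None, scale_factor=None):
    """Resolve one spatial output dimension; exactly one of
    size / scale_factor may be given (hypothetical helper)."""
    if (size is None) == (scale_factor is None):
        raise ValueError("specify exactly one of size or scale_factor")
    if size is not None:
        return size
    # PyTorch floors the scaled size, so 0.5x of an odd dim rounds down.
    return math.floor(in_size * scale_factor)

print(output_size(32, scale_factor=2))    # 64
print(output_size(57, scale_factor=0.5))  # 28, not 28.5
print(output_size(480, size=256))         # 256
```

Note the flooring: downsampling an odd-sized dimension by 0.5x does not round to the nearest integer.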
- Bilinear is standard for images: Most common choice for image interpolation. Smooth gradients, good perceptual quality, reasonable speed.
- Nearest for feature maps: Often used for upsampling feature maps (not final output). Simpler, faster, works well in learned feature space.
- Mode matters for gradients: Different modes have different gradient properties. Bilinear/bicubic smooth gradients; nearest has piecewise-constant gradients.
- align_corners=false is standard: The modern PyTorch default; treats pixels as areas and aligns their centers. align_corners=true is the older behavior that maps corner pixels onto each other (usually worse; only applies to the linear/bilinear/bicubic/trilinear modes).
- Upsampling needs post-processing: After upsampling with nearest/bilinear alone, blocky artifacts common. Usually follow with learned convolution (U-Net, GANs) for refinement.
- Downsampling loses information: Information is permanently lost; can't recover original. Consider whether lower resolution is acceptable for task.
- Non-integer scales: Can use fractional scale_factor (e.g., 1.5x) or non-square sizes. Useful for aspect ratio changes or arbitrary resolutions.
- Batch dimension ignored: First two dimensions (batch, channels) never resampled, only spatial dimensions (from 3rd onward) are affected.
- Specify either size OR scale_factor, not both: Providing both is ambiguous and raises an error. Supply exactly one and pass undefined for the other.
- Downsampling with nearest can alias: May drop or shift high-frequency details. For quality image downsampling, use bilinear or bicubic with antialias: true.
- Large upsampling is wasteful: 8x or more upsampling wastes computation. Usually better to gradually upsample (4x → 2x) or use learned operations.
- CPU very slow: CPU interpolation much slower than GPU. For large-scale inference, GPU is critical.
- Bicubic only 2D: Bicubic mode doesn't work with 3D volumes (use trilinear instead). Mode and input dimensionality must match.
- Floating-point interpolation: Output is always floating-point, even if input was integer.
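The align_corners note above corresponds to two different source-coordinate mappings. A NumPy sketch of both, with formulas as in PyTorch (assumed to match this library):

```python
import numpy as np

in_size, out_size = 4, 8
j = np.arange(out_size, dtype=np.float64)  # output pixel indices

# align_corners=true: the corner pixels of input and output coincide.
src_true = j * (in_size - 1) / (out_size - 1)

# align_corners=false (default): pixel *centers* are aligned instead.
src_false = (j + 0.5) * in_size / out_size - 0.5

print(src_true[0], src_true[-1])    # 0.0 3.0  - corners map exactly
print(src_false[0], src_false[-1])  # -0.25 3.25 - samples reach past the edges
```

With align_corners=false the edge samples fall slightly outside the input grid and are clamped, which is what gives the center-aligned behavior its better scale consistency.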
Parameters
input: Tensor<S, D, Dev> - Tensor to resample. Shape [batch, channels, ...spatial_dims] where spatial_dims are typically [H, W] (2D) or [D, H, W] (3D).
options: InterpolateFunctionalOptions (optional)
Returns
Tensor<S, D, Dev> - Resampled tensor with new spatial dimensions. Batch and channel dimensions unchanged.
Examples
// 2x upsampling: common in super-resolution or generative models
const low_res_img = torch.randn(1, 3, 32, 32); // [batch=1, channels=3, height=32, width=32]
const upsampled = torch.nn.functional.interpolate(low_res_img, undefined, 2, 'bilinear');
// Output: [1, 3, 64, 64] - each spatial dim multiplied by 2
// Resize to exact dimensions: standardize input sizes for model
const variable_size_img = torch.randn(1, 3, 480, 640); // Random size image
const resized = torch.nn.functional.interpolate(variable_size_img, [256, 256], undefined, 'bilinear');
// Output: [1, 3, 256, 256] - always produces 256x256 output regardless of input size
// A whole batch resized together to one standard size
const batch_imgs = torch.randn(32, 3, 640, 480); // [batch=32, channels=3, height=640, width=480]
const standardized = torch.nn.functional.interpolate(batch_imgs, [224, 224], undefined, 'bilinear', false);
// Output: [32, 3, 224, 224] - all images resized to ImageNet standard size
// 0.5x downsampling: reduce memory/computation for efficiency
const feature_map = torch.randn(8, 256, 56, 56); // Feature pyramid level
const downsampled = torch.nn.functional.interpolate(feature_map, undefined, 0.5, 'nearest');
// Output: [8, 256, 28, 28] - spatial dimensions halved (1/4 of pixels)
// Asymmetric upsampling: different scales per dimension
const x = torch.randn(1, 64, 16, 16); // [batch=1, channels=64, height=16, width=16]
const scaled = torch.nn.functional.interpolate(x, undefined, [2, 4], 'bilinear');
// Output: [1, 64, 32, 64] - 2x in height, 4x in width (non-square aspect ratio change)
// Video upsampling: preserve temporal dimension, upsample spatial
// With 'trilinear', size must cover all three spatial dims [frames, height, width]
const video = torch.randn(1, 3, 30, 480, 640); // [batch, channels, frames, height, width]
const upsampled_video = torch.nn.functional.interpolate(video, [30, 960, 1280], undefined, 'trilinear');
// Output: [1, 3, 30, 960, 1280] - frame count kept at 30, spatial dims doubled (2x resolution)
// GAN generator: progressively upsample from low-res feature map
const z_feature = torch.randn(1, 512, 4, 4); // Latent feature [4x4]
const up1 = torch.nn.functional.interpolate(z_feature, [8, 8], undefined, 'nearest'); // [8x8]
const up2 = torch.nn.functional.interpolate(up1, [16, 16], undefined, 'nearest'); // [16x16]
const up3 = torch.nn.functional.interpolate(up2, [32, 32], undefined, 'nearest'); // [32x32]
// Progressive upsampling: 4→8→16→32 with convolutions at each scale
See Also
- PyTorch torch.nn.functional.interpolate
- pad - Padding (different from interpolation/resizing)
- conv_transpose2d - Learned upsampling with learnable kernels (often better than interpolate)
- max_pool2d - Downsampling with max pooling (keeps the strongest activations; no learned parameters)
- avg_pool2d - Downsampling with average pooling
- adaptive_avg_pool2d - Automatic sizing to target output dimensions