torch.nn.functional.conv_transpose2d
function conv_transpose2d<S extends Shape, D extends DType = DType, Dev extends DeviceType = DeviceType>(input: Tensor<S, D, Dev>, weight: Tensor): Tensor<Shape, D, Dev>

function conv_transpose2d<S extends Shape, D extends DType = DType, Dev extends DeviceType = DeviceType>(input: Tensor<S, D, Dev>, weight: Tensor, bias: Tensor | null, stride: number | [number, number], padding: number | [number, number], output_padding: number | [number, number], groups: number, dilation: number | [number, number], options: ConvTranspose2dFunctionalOptions): Tensor<Shape, D, Dev>

2D transposed convolution ("deconvolution"): upsamples spatial dimensions with learned parameters.
Applies a transposed 2D convolution to upsample spatial dimensions. This is NOT true deconvolution (which would mathematically invert a convolution), but learnable upsampling with a transposed kernel. It is the inverse of standard convolution in terms of spatial shape: stride > 1 increases spatial size instead of decreasing it. Essential for:
- Generative models (GANs, VAEs) - learnable upsampling for image generation
- Semantic segmentation (FCN, U-Net, DeepLab) - restoring spatial resolution
- Super-resolution and image enhancement (SRGAN, upsampling for perceptual losses)
- Video frame generation and temporal upsampling
- Dense prediction tasks requiring high-resolution outputs
- Replacing bilinear/nearest-neighbor upsampling with learned parameters
How transposed convolution works: the opposite of standard convolution, in that stride > 1 enlarges spatial dimensions instead of shrinking them. Think of it as spreading each input value over a larger grid, then applying a convolution. Mathematically it is equivalent to zero-interleaving and padding the input, then applying a standard convolution with stride=1.
Key insight: Output size increases with stride (stride=2 roughly quadruples spatial area). The kernel parameters let the layer learn fine-grained detail during upsampling, which is better than fixed upsampling.
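The zero-interleaving view can be made concrete with a tiny 1D sketch (plain arrays, no tensor library; 1D for brevity, since the 2D case factorizes the same way per axis): a transposed convolution with stride s equals inserting s-1 zeros between input samples, padding by kernel_size-1, then running an ordinary stride-1 correlation with the flipped kernel.

```typescript
// Direct 1D transposed convolution (cross-correlation convention),
// stride s, no padding: output length = (n - 1) * s + k.
function convTranspose1d(x: number[], w: number[], s: number): number[] {
  const out = new Array<number>((x.length - 1) * s + w.length).fill(0);
  for (let j = 0; j < x.length; j++) {
    for (let t = 0; t < w.length; t++) {
      out[j * s + t] += x[j] * w[t]; // each input value "stamps" a scaled kernel
    }
  }
  return out;
}

// Same result via zero-insertion: dilate the input with s-1 zeros between
// samples, pad by k-1 on both ends, then run a stride-1 correlation with
// the FLIPPED kernel.
function viaZeroInsertion(x: number[], w: number[], s: number): number[] {
  const k = w.length;
  const dilated = new Array<number>((x.length - 1) * s + 1).fill(0);
  x.forEach((v, j) => { dilated[j * s] = v; });
  const pad = new Array<number>(k - 1).fill(0);
  const padded = [...pad, ...dilated, ...pad];
  const flipped = [...w].reverse();
  const out: number[] = [];
  for (let i = 0; i + k <= padded.length; i++) {
    out.push(flipped.reduce((acc, wv, t) => acc + wv * padded[i + t], 0));
  }
  return out;
}

console.log(convTranspose1d([1, 2, 3], [1, 1], 2)); // [1, 1, 2, 2, 3, 3]
```

Note how each input value is copied (scaled by the kernel) into a stride-spaced window of the output, which is exactly why stride > 1 enlarges rather than shrinks.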
Common architectures:
- GAN generators: Stack transposed convolutions for progressive upsampling
- U-Net decoder: Transpose conv or concatenation + conv for skip connections
- FCN decoder: Transposed convolutions to restore input resolution
- Progressive GAN: Increasingly larger transposed conv layers (ProGAN)
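The repeated 2x doublings in these decoder stacks can be sanity-checked with shape arithmetic alone. A small sketch (hypothetical `outSize` helper, no tensors involved) shows that kernel=4, stride=2, padding=1, output_padding=0 doubles each spatial dimension exactly:

```typescript
// Spatial output size of a transposed convolution along one axis.
function outSize(n: number, k: number, s: number, p: number, op = 0, d = 1): number {
  return (n - 1) * s - 2 * p + d * (k - 1) + op + 1;
}

// Hypothetical generator stack: six layers, each with kernel=4, stride=2,
// padding=1, output_padding=0 -- every layer exactly doubles H and W.
const progression: number[] = [1];
for (let layer = 0; layer < 6; layer++) {
  progression.push(outSize(progression[progression.length - 1], 4, 2, 1));
}
console.log(progression); // [1, 2, 4, 8, 16, 32, 64]
```

Running the same check on your own layer configs before training catches off-by-one spatial mismatches early.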
Output size formula: output_size = (input_size - 1) * stride - 2 * padding + dilation * (kernel_size - 1) + output_padding + 1
- Learnable upsampling: Better than fixed bilinear/nearest; learns detail patterns
- Stride-determined expansion: stride=2 means ~4x spatial area, stride=3 means ~9x, etc.
- Checkerboard artifacts: Can create patterns when kernel_size is not divisible by stride; mitigated by kernel_size ≥ stride (ideally a multiple of stride)
- Output padding: Disambiguates the output size when stride > 1; a common exact-2x configuration is stride=2, kernel=4, padding=1, output_padding=0
- GAN standard: Transposed convolution is the de facto standard for GAN generators (rather than bilinear upsampling)
- Gradient flow: Fully differentiable; gradients flow efficiently through upsampling
- Checkerboard artifacts: Naive transposed conv (kernel_size not divisible by stride) creates visual artifacts
- Kernel-stride relationship: Use kernel_size ≥ stride to avoid artifacts (e.g., kernel=4 for stride=2)
- Output size ambiguity: Multiple (padding, output_padding) pairs give same output size
- Memory cost: Upsampling layers often memory-intensive; watch batch size
- NOT true deconvolution: Doesn't invert convolution exactly; learnable but imperfect inverse
- Dilation support limited: Some implementations don't support dilation > 1 on transposed convolution
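The output-size ambiguity noted above is easy to see by solving the output formula for (padding, output_padding) at a fixed target size. A small search helper (a sketch, assuming PyTorch's constraint that output_padding must be smaller than stride or dilation):

```typescript
// Spatial output size of a transposed convolution along one axis.
function outSize(n: number, k: number, s: number, p: number, op = 0, d = 1): number {
  return (n - 1) * s - 2 * p + d * (k - 1) + op + 1;
}

// All (padding, output_padding) pairs hitting `target` for fixed n, k, s.
// Assumes the PyTorch-style rule: output_padding < max(stride, dilation).
function configsForTarget(
  n: number, k: number, s: number, target: number, d = 1
): Array<[number, number]> {
  const pairs: Array<[number, number]> = [];
  for (let p = 0; p <= k; p++) {
    const op = target - ((n - 1) * s - 2 * p + d * (k - 1) + 1);
    if (op >= 0 && op < Math.max(s, d)) pairs.push([p, op]);
  }
  return pairs;
}

// With a large stride, several pairs yield the same output size:
console.log(configsForTarget(4, 4, 4, 15)); // [[1, 1], [2, 3]]
```

Because multiple configurations are size-equivalent but not value-equivalent (they crop different border regions), pick one convention and use it consistently across a model.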
Parameters
input: Tensor<S, D, Dev> - Input tensor of shape [N, C_in, H, W] (typically small spatial dims in a decoder)
- N: batch size
- C_in: number of input channels
- H, W: input spatial height and width

weight: Tensor - Learnable filter tensor of shape [C_in, C_out/groups, kH, kW]
- C_in: input channels (matches input.shape[1])
- C_out/groups: output channels per group (total C_out = C_out_per_group * groups)
- kH, kW: kernel height and width
Returns
Tensor<Shape, D, Dev> - Output tensor of shape [N, C_out, H_out, W_out]
- H_out = (H_in - 1) * strideH - 2 * padH + dilH * (kH - 1) + output_padH + 1
- W_out = (W_in - 1) * strideW - 2 * padW + dilW * (kW - 1) + output_padW + 1

Examples
// GAN generator: learnable upsampling from latent code
const latent = torch.randn([32, 128, 4, 4]); // [batch=32, channels=128, H=4, W=4]
const kernel = torch.randn([128, 256, 5, 5]); // [in=128, out=256, kH=5, kW=5]
const upsampled = torch.nn.functional.conv_transpose2d(
latent, kernel, undefined, 2, 2, 1
);
// stride=2: roughly 4x spatial expansion -> [32, 256, 8, 8]
// GAN stacks multiple layers: 4→8→16→32→64 for full resolution

// U-Net decoder: restore spatial resolution with learned features
const encoded = torch.randn([8, 512, 8, 8]); // Encoder output
const weight = torch.randn([512, 256, 4, 4]);
const decoded = torch.nn.functional.conv_transpose2d(
encoded, weight, undefined, 2, 1, 0
);
// Spatial dims: 8×8 → 16×16 with learned upsampling
// Skip connections concatenate high-res features from encoder

// Semantic segmentation: pixel-wise prediction upsampling
const features = torch.randn([4, 512, 16, 16]); // Feature maps from encoder
// Multiple transpose conv layers restore resolution
let x = features;
for (let i = 0; i < 4; i++) {
const kernel = torch.randn([x.shape[1], 256, 4, 4]); // Adaptive kernel
x = torch.nn.functional.conv_transpose2d(x, kernel, undefined, 2, 1, 0);
}
// 16×16 → 32×32 → 64×64 → 128×128 → 256×256
// Final shape: [4, 256, 256, 256] for full-resolution segmentation

// Super-resolution: upscale low-res image to high-res
const lowres = torch.randn([1, 3, 32, 32]); // Low-res input
const kernel2x = torch.randn([3, 32, 3, 3]);
const medres = torch.nn.functional.conv_transpose2d(lowres, kernel2x, undefined, 2, 1, 1);
// medres: [1, 32, 64, 64]
const kernel_final = torch.randn([32, 3, 3, 3]);
const highres = torch.nn.functional.conv_transpose2d(medres, kernel_final, undefined, 1, 1, 0);
// highres: [1, 3, 64, 64] - 2x super-resolved RGB image

// Progressive GAN: learnable hierarchical upsampling
const z = torch.randn([8, 512, 1, 1]); // Noise vector (1x1 spatial)
let x = z;
// Progressive upsampling: 1 → 2 → 4 → 8 → 16 → 32 → 64 (2x per layer)
const sizes = [2, 4, 8, 16, 32, 64];
for (const size of sizes) {
// kernel=4, stride=2, padding=1, output_padding=0 doubles spatial dims to `size`
const kernel = torch.randn([x.shape[1], 256, 4, 4]);
x = torch.nn.functional.conv_transpose2d(x, kernel, undefined, 2, 1, 0);
}
// Final: [8, 256, 64, 64] high-resolution generated image

// Fractional upsampling: non-power-of-2 scaling via output_padding
const input = torch.randn([1, 64, 16, 16]);
const kernel = torch.randn([64, 128, 3, 3]);
// output_padding helps with stride choices for exact target sizes
const output = torch.nn.functional.conv_transpose2d(
input, kernel, undefined, 2, 1, [1, 0]
);
// output: [1, 128, 32, 31] - per-axis output_padding for exact target sizes
// Careful size management for architectures requiring specific dimensions

See Also
- PyTorch torch.nn.functional.conv_transpose2d
- conv2d - Regular convolution (downsampling spatial dims with stride > 1)
- upsample - Non-learnable upsampling (bilinear, nearest-neighbor)
- interpolate - Non-learnable spatial resampling
- conv_transpose1d - 1D variant for sequences
- conv_transpose3d - 3D variant for volumetric data