
torch.nn.functional.conv3d

function conv3d<S extends Shape, D extends DType = DType, Dev extends DeviceType = DeviceType>(input: Tensor<S, D, Dev>, weight: Tensor): Tensor<Shape, D, Dev>

function conv3d<S extends Shape, D extends DType = DType, Dev extends DeviceType = DeviceType>(input: Tensor<S, D, Dev>, weight: Tensor, bias: Tensor | null, stride: number | [number, number, number], padding: number | [number, number, number], dilation: number | [number, number, number], groups: number, options: Conv3dFunctionalOptions): Tensor<Shape, D, Dev>

Applies a 3D convolution over an input signal composed of several input planes.

3D convolution extends the standard 2D convolution to volumetric data, enabling spatial-temporal feature learning. Common applications include:

  • Video analysis (frames as depth dimension)
  • Medical imaging (CT scans, MRI volumes)
  • 3D point cloud processing
  • Spatiotemporal action recognition
The output depth is given by:

D_out = ⌊(D_in + 2 × padding − dilation × (kernel_size − 1) − 1) / stride + 1⌋

(the same formula applies to the H and W dimensions)
  • For 4D input (C_in, D, H, W), a batch dimension is added and removed automatically.
  • Memory usage: 3D convolutions are memory-intensive. Consider using smaller batch sizes or spatial dimensions for large models.
  • Groups: Setting groups=C_in with C_out=C_in gives depthwise convolution.
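The output-size formula above is easy to apply per dimension. As a sketch, here is a small hypothetical helper (`convOutSize` is not part of torch.js) that evaluates the formula for a single spatial dimension:

```typescript
// Hypothetical helper: computes one output extent of a convolution
// using D_out = floor((D_in + 2*padding - dilation*(kernel - 1) - 1) / stride + 1).
function convOutSize(
  inSize: number,
  kernel: number,
  stride: number = 1,
  padding: number = 0,
  dilation: number = 1,
): number {
  return Math.floor(
    (inSize + 2 * padding - dilation * (kernel - 1) - 1) / stride + 1,
  );
}

// Video example from the Examples section: depth 16 with kernel 3,
// stride 1, padding 1; height 112 with kernel 7, stride 2, padding 3.
const dOut = convOutSize(16, 3, 1, 1);   // 16
const hOut = convOutSize(112, 7, 2, 3);  // 56
```

Applying it to each of (D, H, W) with the corresponding kernel, stride, padding, and dilation entries reproduces the output shapes quoted in the examples below.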

Parameters

input: Tensor<S, D, Dev>
Input tensor of shape (N, C_in, D, H, W) or (C_in, D, H, W)
weight: Tensor
Convolution filters of shape (C_out, C_in/groups, kD, kH, kW)
bias: Tensor | null
Optional bias of shape (C_out), added to each output channel
stride: number | [number, number, number]
Stride of the convolution along (D, H, W); a single number applies to all three dimensions
padding: number | [number, number, number]
Zero-padding added to both sides of each spatial dimension
dilation: number | [number, number, number]
Spacing between kernel elements
groups: number
Number of blocked connections from input channels to output channels; both C_in and C_out must be divisible by groups

Returns

Tensor<Shape, D, Dev> – Output tensor of shape (N, C_out, D_out, H_out, W_out)

Examples

// Basic 3D convolution on video data
const video = torch.randn(1, 3, 16, 112, 112);  // (batch, channels, frames, H, W)
const weight = torch.randn(64, 3, 3, 7, 7);     // 64 filters, 3x7x7 kernel
const bias = torch.zeros(64);
const features = torch.nn.functional.conv3d(video, weight, bias, {
  stride: [1, 2, 2],
  padding: [1, 3, 3]
});
// Output: [1, 64, 16, 56, 56]

// Without bias
const out = torch.nn.functional.conv3d(video, weight, { stride: 2, padding: 1 });

// Grouped convolution for efficiency
// (C_in and C_out must both be divisible by groups; weight has C_in/groups input channels)
const x = torch.randn(1, 16, 8, 32, 32);      // C_in = 16
const gWeight = torch.randn(32, 4, 3, 3, 3);  // C_out = 32, C_in/groups = 16/4 = 4
const grouped = torch.nn.functional.conv3d(x, gWeight, { groups: 4 });
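To make the arithmetic conv3d performs concrete, the following is an illustrative single-channel reference implementation on plain nested arrays. It is not the torch.js implementation, just a minimal sketch of the semantics (zero padding, no dilation or groups, one input and one output channel):

```typescript
type Vol = number[][][]; // indexed [d][h][w]

// Naive direct 3D convolution (cross-correlation, as in conv3d) of a single
// volume with a single kernel, with symmetric zero padding.
function naiveConv3dSingle(input: Vol, kernel: Vol, stride = 1, padding = 0): Vol {
  const [D, H, W] = [input.length, input[0].length, input[0][0].length];
  const [kD, kH, kW] = [kernel.length, kernel[0].length, kernel[0][0].length];
  // Output extent per dimension: floor((n + 2*padding - k) / stride) + 1
  const outSize = (n: number, k: number) =>
    Math.floor((n + 2 * padding - k) / stride) + 1;
  // Reads outside the input are zero (zero padding).
  const at = (d: number, h: number, w: number) =>
    d >= 0 && d < D && h >= 0 && h < H && w >= 0 && w < W ? input[d][h][w] : 0;

  const result: Vol = [];
  for (let od = 0; od < outSize(D, kD); od++) {
    const plane: number[][] = [];
    for (let oh = 0; oh < outSize(H, kH); oh++) {
      const row: number[] = [];
      for (let ow = 0; ow < outSize(W, kW); ow++) {
        let acc = 0;
        for (let kd = 0; kd < kD; kd++)
          for (let kh = 0; kh < kH; kh++)
            for (let kw = 0; kw < kW; kw++)
              acc +=
                at(od * stride + kd - padding,
                   oh * stride + kh - padding,
                   ow * stride + kw - padding) * kernel[kd][kh][kw];
        row.push(acc);
      }
      plane.push(row);
    }
    result.push(plane);
  }
  return result;
}
```

The real op repeats this inner sum over C_in/groups input channels per filter and stacks one such volume per output filter, then adds the bias.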

See Also

  • PyTorch torch.nn.functional.conv3d
  • conv2d - 2D convolution for images
  • conv1d - 1D convolution for sequences
  • conv_transpose3d - Transposed 3D convolution