torch.nn.functional.dropout2d
function dropout2d<S extends Shape, D extends DType = DType, Dev extends DeviceType = DeviceType>(input: Tensor<S, D, Dev>, options?: DropoutFunctionalOptions): Tensor<S, D, Dev>
function dropout2d<S extends Shape, D extends DType = DType, Dev extends DeviceType = DeviceType>(input: Tensor<S, D, Dev>, p: number, training: boolean, inplace: boolean, options?: DropoutFunctionalOptions): Tensor<S, D, Dev>
Randomly zeros out entire channels in 2D input (spatial feature maps).
Applies channel-wise dropout for 2D spatial data (images). Instead of dropping individual pixels, dropout2d drops entire feature maps consistently across all spatial positions. This preserves spatial coherence and is the standard regularization for 2D CNNs. Each channel (feature map) is either completely dropped or completely kept, with the same mask applied to all height and width positions. This is more effective than element-wise dropout for images where local spatial structure matters.
Common use cases:
- 2D CNNs for images: Drop entire filters consistently across spatial dimensions
- Convolutional layer regularization: Standard dropout for image classification networks
- Feature map adaptation prevention: Prevents co-adaptation of spatially-local features
- Image segmentation: Regularize intermediate feature maps preserving spatial relationships
- Object detection: Feature dropout in detection backbones maintaining localization
- Medical imaging: Regularize 2D slices in medical image analysis
- Style transfer networks: Dropout on content and style feature maps
The key advantage over element-wise dropout:
- Spatial coherence: Neighboring pixels are treated consistently
- Stronger regularization: Entire learned features dropped, not just individual values
- Biologically plausible: Mirrors how neurons in the cortex are organized into columns
- Better for images: Exploits 2D structure of image data
Dimensionality handling:
- 1D input (L,): Element-wise dropout applied
- 2D input (H, W): Channel dropout treating first dim as channels
- 3D input (C, H, W): Channel dropout per channel
- 4D input (N, C, H, W): Batch-wise channel dropout (different mask per batch)
- Higher dims: Falls back to element-wise dropout
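The rank-dependent behavior above determines how many independent Bernoulli draws the mask uses. A minimal plain-TypeScript sketch (illustrative only, not the library's implementation; `maskDraws` is a hypothetical helper name) encoding the cases listed:

```typescript
// Number of independent Bernoulli draws in the dropout mask, per input rank.
// Shapes are given as number[] (e.g. [N, C, H, W]); illustrative only.
function maskDraws(shape: number[]): number {
  switch (shape.length) {
    case 1: return shape[0];                 // (L,): element-wise, one draw per element
    case 2: return shape[0];                 // (H, W): first dim treated as channels
    case 3: return shape[0];                 // (C, H, W): one draw per channel
    case 4: return shape[0] * shape[1];      // (N, C, H, W): independent mask per sample
    default: return shape.reduce((a, b) => a * b, 1); // higher dims: element-wise fallback
  }
}
```

Note that a 4D input gets N times as many draws as an unbatched 3D input, which is why per-batch masking improves regularization.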
Key behaviors:
- Spatial consistency: The same channel is either dropped or kept across all H×W spatial positions; all pixels of a feature map share the same fate
- Channel as feature: Each channel represents a learned feature/filter. Dropping entire channels prevents specific features from being used, forcing robust learning
- Scaling factor: Kept channels are automatically scaled by 1/(1-p) to maintain expected value. This is inverted dropout - no rescaling needed at inference
- Independent per batch: With 4D input (N, C, H, W), each image gets an independent random channel mask, improving regularization effectiveness
- Standard for CNNs: dropout2d is the de facto standard for image CNNs. Use this over element-wise dropout for 2D spatial data
Practical notes:
- Deterministic eval: Always set training=false during inference for reproducible results
- Color channel handling: For RGB images, channels 0-2 are treated like any others. Typically dropout is applied after convolution, not directly to RGB
- Don't mix with batch_norm carelessly: Using both together can cause training instability. Modern practice often uses either batch_norm or dropout, not both heavily
- Scaling is automatic: The 1/(1-p) scaling is applied automatically during training. Don't manually rescale - that would cause incorrect statistics
- High dropout rates: Rates above 0.5 are uncommon and usually indicate an over-parameterized model. Consider reducing model size or using L2 regularization instead
- Channel dimension position: This function assumes channels are at dimension 1. If your data has channels elsewhere, use permute/transpose first
- 3D CNNs need dropout3d: This is for 2D spatial data (images). Use dropout3d for volumetric data (3D medical imaging, video with depth)
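The inverted-dropout scaling described above can be sketched in plain TypeScript. This is illustrative only, not the library's implementation; `dropout2dSketch` is a hypothetical name, and the optional `rand` parameter exists only to make the sketch testable:

```typescript
// Channel-wise inverted dropout on a (C, H, W) array: one Bernoulli draw
// per channel, kept channels scaled by 1/(1-p), eval is an identity pass.
function dropout2dSketch(
  input: number[][][], // [C][H][W]
  p: number,
  training: boolean,
  rand: () => number = Math.random
): number[][][] {
  if (!training || p === 0) return input; // inference: no dropout, no rescaling
  const scale = 1 / (1 - p); // inverted dropout keeps the expected value unchanged
  return input.map(channel => {
    const keep = rand() >= p; // one draw decides the whole channel
    // Same fate for every spatial position in this channel:
    return channel.map(row => row.map(v => (keep ? v * scale : 0)));
  });
}
```

Because the scaling happens during training, no rescaling is needed at inference, which is why training=false simply returns the input unchanged.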
Parameters
input: Tensor<S, D, Dev> - Input tensor. Typical shapes:
- (N, C, H, W): Batch of images [batch, channels, height, width]
- (C, H, W): Unbatched image [channels, height, width]
- (H, W): Single 2D image [height, width]
Where N=batch size, C=channels (RGB=3), H=height, W=width
options: DropoutFunctionalOptions (optional)
Returns
Tensor<S, D, Dev> - Tensor with the same shape as input, with channels dropped and scaled
Examples
// Basic 2D channel dropout for images
const images = torch.randn(32, 3, 224, 224); // ImageNet batch
const output = torch.nn.functional.dropout2d(images, 0.2, true);
// ~20% of the channels (3 here, more in deeper layers) are dropped entirely

// 2D CNN for image classification
const input = torch.randn(16, 64, 32, 32); // 16 images, 64 feature channels, 32×32
const conv_out = new torch.nn.Conv2d(64, 128, 3, 1, 1).forward(input);
const regularized = torch.nn.functional.dropout2d(conv_out, 0.3, model.training);
const pooled = torch.nn.functional.max_pool2d(regularized, [2, 2]);
// Channel dropout applied consistently across all spatial positions

// Image segmentation network
const image = torch.randn(8, 3, 512, 512); // 8 images, RGB, 512×512
const encoder_out = torch.randn(8, 256, 64, 64); // Encoder feature maps
const dropped = torch.nn.functional.dropout2d(encoder_out, 0.2, true);
const decoder_out = new torch.nn.ConvTranspose2d(256, 128, 4, 2, 1).forward(dropped);
// Spatial structure preserved in segmentation task

// Unbatched image (3D input)
const image = torch.randn(64, 224, 224); // 64 channels, 224×224
const output = torch.nn.functional.dropout2d(image, 0.1, true);
// output shape: [64, 224, 224] with ~10% of channels masked

// Training vs evaluation behavior
const batch = torch.randn(8, 32, 28, 28);
const train_output = torch.nn.functional.dropout2d(batch, 0.2, true);
// During training: 20% channels dropped, all spatial positions affected
const eval_output = torch.nn.functional.dropout2d(batch, 0.2, false);
// During evaluation: No dropout, returns input unchanged

// Different dropout rates for different depths
const shallow = torch.randn(4, 64, 56, 56);
const deep = torch.randn(4, 512, 7, 7);
const shallow_drop = torch.nn.functional.dropout2d(shallow, 0.1, true); // 10%
const deep_drop = torch.nn.functional.dropout2d(deep, 0.5, true); // 50%
// Deeper layers often use higher dropout rates

// Object detection with feature pyramid
const features = [
torch.randn(8, 256, 56, 56), // P2
torch.randn(8, 256, 28, 28), // P3
torch.randn(8, 256, 14, 14), // P4
torch.randn(8, 256, 7, 7), // P5
];
const regularized = features.map(f =>
torch.nn.functional.dropout2d(f, 0.2, true)
);
// Consistent regularization across feature pyramid levels

// ResNet residual block with dropout
const input = torch.randn(32, 128, 32, 32);
const conv1_out = new torch.nn.Conv2d(128, 128, 3, 1, 1).forward(input);
const relu1 = torch.nn.functional.relu(conv1_out);
const drop1 = torch.nn.functional.dropout2d(relu1, 0.1, true);
const conv2_out = new torch.nn.Conv2d(128, 128, 3, 1, 1).forward(drop1);
const output = conv2_out.add(input); // Residual connection
const final = torch.nn.functional.relu(output);
// Dropout in residual blocks for regularization
See Also
- PyTorch torch.nn.functional.dropout2d
- dropout1d - Channel dropout for 1D sequences
- dropout3d - Channel dropout for 3D volumetric data
- dropout - Element-wise dropout (not channel-wise)
- feature_alpha_dropout - SELU-compatible channel dropout
- batch_norm - Alternative regularization using normalization
- layer_norm - Per-channel normalization
- max_pool2d - Spatial pooling after convolution
- Conv2d - 2D convolution layer