torch.nn.functional.dropout2d
function dropout2d<S extends Shape, D extends DType = DType, Dev extends DeviceType = DeviceType>(input: Tensor<S, D, Dev>, options?: DropoutFunctionalOptions): Tensor<S, D, Dev>
function dropout2d<S extends Shape, D extends DType = DType, Dev extends DeviceType = DeviceType>(input: Tensor<S, D, Dev>, p: number, training: boolean, inplace: boolean, options?: DropoutFunctionalOptions): Tensor<S, D, Dev>
Randomly zeros out entire channels in 2D input (spatial feature maps).
Applies channel-wise dropout for 2D spatial data (images). Instead of dropping individual pixels, dropout2d drops entire feature maps consistently across all spatial positions. This preserves spatial coherence and is the standard regularization for 2D CNNs. Each channel (feature map) is either completely dropped or completely kept, with the same mask applied to all height and width positions. This is more effective than element-wise dropout for images where local spatial structure matters.
Common use cases:
- 2D CNNs for images: Drop entire filters consistently across spatial dimensions
- Convolutional layer regularization: Standard dropout for image classification networks
- Feature map adaptation prevention: Prevents co-adaptation of spatially-local features
- Image segmentation: Regularize intermediate feature maps preserving spatial relationships
- Object detection: Feature dropout in detection backbones maintaining localization
- Medical imaging: Regularize 2D slices in medical image analysis
- Style transfer networks: Dropout on content and style feature maps
The key advantage over element-wise dropout:
- Spatial coherence: Neighboring pixels are treated consistently
- Stronger regularization: Entire learned features dropped, not just individual values
- Biologically plausible: Mirrors how neurons in the cortex are organized into columns
- Better for images: Exploits 2D structure of image data
Dimensionality handling:
- 1D input (L,): Element-wise dropout applied
- 2D input (H, W): Channel dropout treating first dim as channels
- 3D input (C, H, W): Channel dropout per channel
- 4D input (N, C, H, W): Batch-wise channel dropout (different mask per batch)
- Higher dims: Falls back to element-wise dropout
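The rank-dependent behavior above determines how many independent Bernoulli draws the mask uses. A minimal plain-TypeScript sketch (illustrative only, not the library's implementation; `maskDraws` is a hypothetical helper name) encoding the cases listed:

```typescript
// Number of independent Bernoulli draws in the dropout mask, per input rank.
// Shapes are given as number[] (e.g. [N, C, H, W]); illustrative only.
function maskDraws(shape: number[]): number {
  switch (shape.length) {
    case 1: return shape[0];                 // (L,): element-wise, one draw per element
    case 2: return shape[0];                 // (H, W): first dim treated as channels
    case 3: return shape[0];                 // (C, H, W): one draw per channel
    case 4: return shape[0] * shape[1];      // (N, C, H, W): independent mask per sample
    default: return shape.reduce((a, b) => a * b, 1); // higher dims: element-wise fallback
  }
}
```

Note that a 4D input gets N times as many draws as an unbatched 3D input, which is why per-batch masking improves regularization.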
Key behaviors:
- Spatial consistency: The same channel is either dropped or kept across all H×W spatial positions; all pixels of a feature map share the same fate
- Channel as feature: Each channel represents a learned feature/filter. Dropping entire channels prevents specific features from being used, forcing robust learning
- Scaling factor: Kept channels are automatically scaled by 1/(1-p) to maintain expected value. This is inverted dropout - no rescaling needed at inference
- Independent per batch: With 4D input (N, C, H, W), each image gets an independent random channel mask, improving regularization effectiveness
- Standard for CNNs: dropout2d is the de facto standard for image CNNs. Use this over element-wise dropout for 2D spatial data
Practical notes:
- Deterministic eval: Always set training=false during inference for reproducible results
- Color channel handling: For RGB images, channels 0-2 are treated like any others. Typically dropout is applied after convolution, not directly to RGB
- Don't mix with batch_norm carelessly: Using both together can cause training instability. Modern practice often uses either batch_norm or dropout, not both heavily
- Scaling is automatic: The 1/(1-p) scaling is applied automatically during training. Don't manually rescale - that would cause incorrect statistics
- High dropout rates: Rates above 0.5 are uncommon and usually indicate an over-parameterized model. Consider reducing model size or using L2 regularization instead
- Channel dimension position: This function assumes channels are at dimension 1. If your data has channels elsewhere, use permute/transpose first
- 3D CNNs need dropout3d: This is for 2D spatial data (images). Use dropout3d for volumetric data (3D medical imaging, video with depth)
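The inverted-dropout scaling described above can be sketched in plain TypeScript. This is illustrative only, not the library's implementation; `dropout2dSketch` is a hypothetical name, and the optional `rand` parameter exists only to make the sketch testable:

```typescript
// Channel-wise inverted dropout on a (C, H, W) array: one Bernoulli draw
// per channel, kept channels scaled by 1/(1-p), eval is an identity pass.
function dropout2dSketch(
  input: number[][][], // [C][H][W]
  p: number,
  training: boolean,
  rand: () => number = Math.random
): number[][][] {
  if (!training || p === 0) return input; // inference: no dropout, no rescaling
  const scale = 1 / (1 - p); // inverted dropout keeps the expected value unchanged
  return input.map(channel => {
    const keep = rand() >= p; // one draw decides the whole channel
    // Same fate for every spatial position in this channel:
    return channel.map(row => row.map(v => (keep ? v * scale : 0)));
  });
}
```

Because the scaling happens during training, no rescaling is needed at inference, which is why training=false simply returns the input unchanged.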
Parameters
input: Tensor<S, D, Dev> - Input tensor. Typical shapes:
- (N, C, H, W): Batch of images [batch, channels, height, width]
- (C, H, W): Unbatched image [channels, height, width]
- (H, W): Single 2D image [height, width]
Where N=batch size, C=channels (RGB=3), H=height, W=width
options: DropoutFunctionalOptions (optional)
Returns
Tensor<S, D, Dev> - Tensor with the same shape as input, with channels dropped and scaled
Examples
// Basic 2D channel dropout for images
const images = torch.randn(32, 3, 224, 224); // ImageNet batch
const output = torch.nn.functional.dropout2d(images, 0.2, true);
// ~20% of the channels (3 here, more in deeper layers) are dropped entirely

// 2D CNN for image classification
const input = torch.randn(16, 64, 32, 32); // 16 images, 64 feature channels, 32×32
const conv_out = new torch.nn.Conv2d(64, 128, 3, 1, 1).forward(input);
const regularized = torch.nn.functional.dropout2d(conv_out, 0.3, model.training);
const pooled = torch.nn.functional.max_pool2d(regularized, [2, 2]);
// Channel dropout applied consistently across all spatial positions

// Image segmentation network
const image = torch.randn(8, 3, 512, 512); // 8 images, RGB, 512×512
const encoder_out = torch.randn(8, 256, 64, 64); // Encoder feature maps
const dropped = torch.nn.functional.dropout2d(encoder_out, 0.2, true);
const decoder_out = new torch.nn.ConvTranspose2d(256, 128, 4, 2, 1).forward(dropped);
// Spatial structure preserved in segmentation task

// Unbatched image (3D input)
const image = torch.randn(64, 224, 224); // 64 channels, 224×224
const output = torch.nn.functional.dropout2d(image, 0.1, true);
// output shape: [64, 224, 224] with ~10% of channels masked

// Training vs evaluation behavior
const batch = torch.randn(8, 32, 28, 28);
const train_output = torch.nn.functional.dropout2d(batch, 0.2, true);
// During training: 20% channels dropped, all spatial positions affected
const eval_output = torch.nn.functional.dropout2d(batch, 0.2, false);
// During evaluation: No dropout, returns input unchanged

// Different dropout rates for different depths
const shallow = torch.randn(4, 64, 56, 56);
const deep = torch.randn(4, 512, 7, 7);
const shallow_drop = torch.nn.functional.dropout2d(shallow, 0.1, true); // 10%
const deep_drop = torch.nn.functional.dropout2d(deep, 0.5, true); // 50%
// Deeper layers often use higher dropout rates

// Object detection with feature pyramid
const features = [
torch.randn(8, 256, 56, 56), // P2
torch.randn(8, 256, 28, 28), // P3
torch.randn(8, 256, 14, 14), // P4
torch.randn(8, 256, 7, 7), // P5
];
const regularized = features.map(f =>
torch.nn.functional.dropout2d(f, 0.2, true)
);
// Consistent regularization across feature pyramid levels

// ResNet residual block with dropout
const input = torch.randn(32, 128, 32, 32);
const conv1_out = new torch.nn.Conv2d(128, 128, 3, 1, 1).forward(input);
const relu1 = torch.nn.functional.relu(conv1_out);
const drop1 = torch.nn.functional.dropout2d(relu1, 0.1, true);
const conv2_out = new torch.nn.Conv2d(128, 128, 3, 1, 1).forward(drop1);
const output = conv2_out.add(input); // Residual connection
const final = torch.nn.functional.relu(output);
// Dropout in residual blocks for regularization
See Also
- PyTorch torch.nn.functional.dropout2d
- dropout1d - Channel dropout for 1D sequences
- dropout3d - Channel dropout for 3D volumetric data
- dropout - Element-wise dropout (not channel-wise)
- feature_alpha_dropout - SELU-compatible channel dropout
- batch_norm - Alternative regularization using normalization
- layer_norm - Per-channel normalization
- max_pool2d - Spatial pooling after convolution
- Conv2d - 2D convolution layer