torch.nn.functional.feature_alpha_dropout
function feature_alpha_dropout<S extends Shape, D extends DType = DType, Dev extends DeviceType = DeviceType>(input: Tensor<S, D, Dev>, options?: AlphaDropoutFunctionalOptions): Tensor<S, D, Dev>

function feature_alpha_dropout<S extends Shape, D extends DType = DType, Dev extends DeviceType = DeviceType>(input: Tensor<S, D, Dev>, p: number, training: boolean, inplace: boolean, options?: AlphaDropoutFunctionalOptions): Tensor<S, D, Dev>

Randomly masks out entire channels with SELU-compatible dropout.
Applies channel-wise dropout while maintaining the self-normalizing property of SELU (Scaled Exponential Linear Units). Unlike standard dropout, which zeros dropped values, feature_alpha_dropout replaces dropped channels with the SELU negative saturation value α′ and then applies an affine correction so that the distribution's mean and variance are preserved, which is critical for Self-Normalizing Neural Networks (SNNs). This function is designed specifically for SELU and should only be used in networks with SELU or SELU-like activations.
Common use cases:
- Self-Normalizing Neural Networks (SNNs): Regularization in networks built with SELU activations
- Deep learning with SELU: When using SELU for self-normalization benefits
- Maintaining distribution properties: Preserving mean/variance through the network
- Hyperparameter stability: SNNs reduce need for batch normalization
- Channel regularization: Preventing co-adaptation of learned features across channels
- Robust feature learning: In networks requiring self-normalization guarantees
The mathematical properties:
- Drops entire channels (all spatial/temporal positions) with probability p
- Replaces dropped channels with α′ (the fixed SELU negative saturation value, −λα ≈ −1.7581) instead of zeros
- Applies affine transformation to maintain SELU's self-normalizing properties
- Expected value and variance are preserved across the dropout operation
- Critical for guaranteeing self-normalization in deep SELU networks
- SELU-specific: Only use with SELU activations. With other activations (ReLU, etc.), use standard dropout instead. The mathematical guarantees only hold for SELU networks
- Channel-wise dropout: Entire channels are dropped together, not individual elements. All spatial/temporal positions of a channel share the same mask
- Self-normalization preservation: The affine transform (a, b parameters) ensures mean and variance are preserved, maintaining the self-normalizing property of SELU networks
- Training flag critical: Always set training=true during training and training=false during inference. Leaving it on during inference increases prediction variance
- Deep networks benefit most: SNNs with 8+ layers show greatest advantage from self-normalization. Shallow networks don't need feature_alpha_dropout's properties
- Deterministic during eval: Set training=false or use model.eval() to disable dropout for reproducible inference
- Probability constraint: p must be in [0, 1]. Values outside this range raise errors
- Incompatible with non-SELU activations: Using with ReLU, Tanh, etc. breaks the self-normalization property. These activations should use standard dropout instead. Misuse voids the benefits of self-normalization
- Network architecture dependency: Self-normalization only works with specific weight initialization (lecun_normal). Using other initializations may break guarantees
- No rescaling of kept channels: Unlike standard (inverted) dropout, which scales surviving values by 1/(1−p), feature_alpha_dropout applies an affine transformation. The correction is automatic, not manual
- Deep networks only: Shallow networks (1-2 layers) don't benefit from self-normalization and may see performance degradation. Use standard dropout for shallow networks instead
- Batch dimension assumptions: Assumes dimension 0 is batch (or unbatched if 2D). Channel dimension is always dimension 1. Other spatial dimensions are arbitrary
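The affine correction described in the notes above can be sketched numerically. The following is a minimal plain-TypeScript illustration of the math, not the library implementation; featureAlphaDropoutSketch and its optional mask parameter are hypothetical names introduced here. The coefficients follow the SNN formulation: with q = 1 − p, a = (q + α′²pq)^(−1/2) and b = −a·p·α′, which restores zero mean and unit variance for standard-normal inputs.

```typescript
// SELU constants (fixed, not learned).
const LAMBDA = 1.0507009873554805;
const ALPHA = 1.6732632423543772;
const ALPHA_PRIME = -LAMBDA * ALPHA; // negative saturation value, ~ -1.7581

// Hypothetical sketch of channel-wise alpha dropout on a [channels][positions]
// array. An optional per-channel keep mask makes runs deterministic.
function featureAlphaDropoutSketch(
  x: number[][],
  p: number,
  mask?: boolean[],
): number[][] {
  if (p < 0 || p > 1) throw new Error("p must be in [0, 1]");
  const q = 1 - p;
  // Affine coefficients that restore zero mean / unit variance after masking.
  const a = Math.pow(q + ALPHA_PRIME * ALPHA_PRIME * p * q, -0.5);
  const b = -a * p * ALPHA_PRIME;
  return x.map((channel, c) => {
    // One mask value per channel: every position is kept or dropped together.
    const keep = mask ? mask[c] : Math.random() >= p;
    return channel.map((v) => a * (keep ? v : ALPHA_PRIME) + b);
  });
}
```

With mask [true, false], the first channel is kept (and affinely shifted) while every position of the second channel collapses to the same value a·α′ + b, illustrating that the mask is shared channel-wide rather than element-wise.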
Parameters
input Tensor<S, D, Dev> - Input tensor of shape (N, C, ...) where:
- N: batch size (optional; the tensor may be unbatched)
- C: number of channels (dimension 1)
- ...: spatial or temporal dimensions (H, W for images; D, H, W for volumes; L for sequences)
A minimum 2D tensor is required (C and at least one other dimension).
options AlphaDropoutFunctionalOptions - optional
Returns
Tensor<S, D, Dev> – Tensor with same shape as input, with channels dropped and transformed
Examples
// Basic feature_alpha_dropout in SELU network
const input = torch.randn(32, 64, 28, 28); // Batch, channels, height, width
const output = torch.nn.functional.feature_alpha_dropout(input, 0.1, true);
// output shape: [32, 64, 28, 28] with ~10% of channels dropped

// Self-Normalizing Neural Network layer
const x = torch.randn(16, 128, 10); // Batch, features, sequence_length
const selu_out = torch.nn.functional.selu(linear_layer(x));
const regularized = torch.nn.functional.feature_alpha_dropout(selu_out, 0.05, model.training);
// Maintains self-normalization property through the network

// Training vs inference behavior
const features = torch.randn(8, 256, 32, 32);
const is_training = true;
const output_train = torch.nn.functional.feature_alpha_dropout(features, 0.1, is_training);
// During training: ~10% channels dropped, affine transformation applied
const is_training_inf = false;
const output_inference = torch.nn.functional.feature_alpha_dropout(features, 0.1, is_training_inf);
// During inference: No dropout, returns input unchanged

// Compare with standard dropout (not SELU-compatible)
const inputs = torch.randn(16, 64, 14, 14);
const after_selu = torch.nn.functional.selu(inputs);
// Standard dropout (loses self-normalization)
const standard_dropout = torch.nn.functional.dropout(after_selu, 0.1, true);
// SELU-compatible dropout (preserves self-normalization)
const selu_dropout = torch.nn.functional.feature_alpha_dropout(after_selu, 0.1, true);

// Different dropout rates for different regularization strengths
const intermediate = torch.randn(32, 512);
const light_reg = torch.nn.functional.feature_alpha_dropout(intermediate, 0.05, true);
const moderate_reg = torch.nn.functional.feature_alpha_dropout(intermediate, 0.1, true);
const heavy_reg = torch.nn.functional.feature_alpha_dropout(intermediate, 0.2, true);
// Higher p values provide stronger regularization

// Typical SNN architecture pattern
class SNNBlock extends torch.nn.Module {
  linear: torch.nn.Linear;
  dropout_p: number;

  constructor(in_features: number, out_features: number, p_drop: number = 0.1) {
    super();
    this.linear = new torch.nn.Linear(in_features, out_features);
    this.dropout_p = p_drop;
  }

  forward(x: Tensor): Tensor {
    let y = this.linear(x);
    y = torch.nn.functional.selu(y);
    y = torch.nn.functional.feature_alpha_dropout(y, this.dropout_p, this.training);
    return y;
  }
}
// This pattern preserves self-normalization through the network

See Also
- PyTorch torch.nn.functional.feature_alpha_dropout
- selu - Activation function for self-normalizing networks
- dropout - Standard element-wise dropout (not SELU-compatible)
- dropout1d - Channel dropout for 1D sequences
- dropout2d - Channel dropout for 2D feature maps
- dropout3d - Channel dropout for 3D volumes
- batch_norm - Alternative normalization technique
- layer_norm - Per-feature normalization