torch.nn.functional.feature_alpha_dropout
function feature_alpha_dropout<S extends Shape, D extends DType = DType, Dev extends DeviceType = DeviceType>(input: Tensor<S, D, Dev>, options?: AlphaDropoutFunctionalOptions): Tensor<S, D, Dev>

function feature_alpha_dropout<S extends Shape, D extends DType = DType, Dev extends DeviceType = DeviceType>(input: Tensor<S, D, Dev>, p: number, training: boolean, inplace: boolean, options?: AlphaDropoutFunctionalOptions): Tensor<S, D, Dev>

Randomly masks out entire channels with SELU-compatible dropout.
Applies channel-wise dropout while maintaining the self-normalizing property of SELU (Scaled Exponential Linear Units). Unlike standard dropout, which zeros dropped values, feature_alpha_dropout replaces dropped channels with the SELU negative saturation value α′ and then applies an affine correction so that the distribution's mean and variance are preserved, which is critical for Self-Normalizing Neural Networks (SNNs). This function is designed specifically for SELU and should only be used in networks with SELU or SELU-like activations.
Common use cases:
- Self-Normalizing Neural Networks (SNNs): Regularization in networks built with SELU activations
- Deep learning with SELU: When using SELU for self-normalization benefits
- Maintaining distribution properties: Preserving mean/variance through the network
- Hyperparameter stability: SNNs reduce need for batch normalization
- Channel regularization: Preventing co-adaptation of learned features across channels
- Robust feature learning: In networks requiring self-normalization guarantees
The mathematical properties:
- Drops entire channels (all spatial/temporal positions) with probability p
- Replaces dropped channels with α′ (the fixed SELU negative saturation value, −λα ≈ −1.7581) instead of zeros
- Applies affine transformation to maintain SELU's self-normalizing properties
- Expected value and variance are preserved across the dropout operation
- Critical for guaranteeing self-normalization in deep SELU networks
- SELU-specific: Only use with SELU activations. With other activations (ReLU, etc.), use standard dropout instead. The mathematical guarantees only hold for SELU networks
- Channel-wise dropout: Entire channels are dropped together, not individual elements. All spatial/temporal positions of a channel share the same mask
- Self-normalization preservation: The affine transform (a, b parameters) ensures mean and variance are preserved, maintaining the self-normalizing property of SELU networks
- Training flag critical: Always set training=true during training and training=false during inference. Leaving it on during inference increases prediction variance
- Deep networks benefit most: SNNs with 8+ layers show greatest advantage from self-normalization. Shallow networks don't need feature_alpha_dropout's properties
- Deterministic during eval: Set training=false or use model.eval() to disable dropout for reproducible inference
- Probability constraint: p must be in [0, 1]. Values outside this range raise errors
- Incompatible with non-SELU activations: Using with ReLU, Tanh, etc. breaks the self-normalization property. These activations should use standard dropout instead. Misuse voids the benefits of self-normalization
- Network architecture dependency: Self-normalization only works with specific weight initialization (lecun_normal). Using other initializations may break guarantees
- No rescaling of kept channels: Unlike standard (inverted) dropout, which scales surviving values by 1/(1−p), feature_alpha_dropout applies an affine transformation. The correction is automatic, not manual
- Deep networks only: Shallow networks (1-2 layers) don't benefit from self-normalization and may see performance degradation. Use standard dropout for shallow networks instead
- Batch dimension assumptions: Assumes dimension 0 is batch (or unbatched if 2D). Channel dimension is always dimension 1. Other spatial dimensions are arbitrary
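The affine correction described in the notes above can be sketched numerically. The following is a minimal plain-TypeScript illustration of the math, not the library implementation; featureAlphaDropoutSketch and its optional mask parameter are hypothetical names introduced here. The coefficients follow the SNN formulation: with q = 1 − p, a = (q + α′²pq)^(−1/2) and b = −a·p·α′, which restores zero mean and unit variance for standard-normal inputs.

```typescript
// SELU constants (fixed, not learned).
const LAMBDA = 1.0507009873554805;
const ALPHA = 1.6732632423543772;
const ALPHA_PRIME = -LAMBDA * ALPHA; // negative saturation value, ~ -1.7581

// Hypothetical sketch of channel-wise alpha dropout on a [channels][positions]
// array. An optional per-channel keep mask makes runs deterministic.
function featureAlphaDropoutSketch(
  x: number[][],
  p: number,
  mask?: boolean[],
): number[][] {
  if (p < 0 || p > 1) throw new Error("p must be in [0, 1]");
  const q = 1 - p;
  // Affine coefficients that restore zero mean / unit variance after masking.
  const a = Math.pow(q + ALPHA_PRIME * ALPHA_PRIME * p * q, -0.5);
  const b = -a * p * ALPHA_PRIME;
  return x.map((channel, c) => {
    // One mask value per channel: every position is kept or dropped together.
    const keep = mask ? mask[c] : Math.random() >= p;
    return channel.map((v) => a * (keep ? v : ALPHA_PRIME) + b);
  });
}
```

With mask [true, false], the first channel is kept (and affinely shifted) while every position of the second channel collapses to the same value a·α′ + b, illustrating that the mask is shared channel-wide rather than element-wise.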
Parameters
input Tensor<S, D, Dev> - Input tensor of shape (N, C, ...) where:
- N: batch size (optional; the tensor may be unbatched)
- C: number of channels (dimension 1)
- ...: spatial or temporal dimensions (H, W for images; D, H, W for volumes; L for sequences)
A minimum 2D tensor is required (C and at least one other dimension).
options AlphaDropoutFunctionalOptions - optional
Returns
Tensor<S, D, Dev> – Tensor with same shape as input, with channels dropped and transformed
Examples
// Basic feature_alpha_dropout in SELU network
const input = torch.randn(32, 64, 28, 28); // Batch, channels, height, width
const output = torch.nn.functional.feature_alpha_dropout(input, 0.1, true);
// output shape: [32, 64, 28, 28] with ~10% of channels dropped

// Self-Normalizing Neural Network layer
const x = torch.randn(16, 128, 10); // Batch, features, sequence_length
const selu_out = torch.nn.functional.selu(linear_layer(x));
const regularized = torch.nn.functional.feature_alpha_dropout(selu_out, 0.05, model.training);
// Maintains self-normalization property through the network

// Training vs inference behavior
const features = torch.randn(8, 256, 32, 32);
const is_training = true;
const output_train = torch.nn.functional.feature_alpha_dropout(features, 0.1, is_training);
// During training: ~10% channels dropped, affine transformation applied
const is_training_inf = false;
const output_inference = torch.nn.functional.feature_alpha_dropout(features, 0.1, is_training_inf);
// During inference: No dropout, returns input unchanged

// Compare with standard dropout (not SELU-compatible)
const inputs = torch.randn(16, 64, 14, 14);
const after_selu = torch.nn.functional.selu(inputs);
// Standard dropout (loses self-normalization)
const standard_dropout = torch.nn.functional.dropout(after_selu, 0.1, true);
// SELU-compatible dropout (preserves self-normalization)
const selu_dropout = torch.nn.functional.feature_alpha_dropout(after_selu, 0.1, true);

// Different dropout rates for different regularization strengths
const intermediate = torch.randn(32, 512);
const light_reg = torch.nn.functional.feature_alpha_dropout(intermediate, 0.05, true);
const moderate_reg = torch.nn.functional.feature_alpha_dropout(intermediate, 0.1, true);
const heavy_reg = torch.nn.functional.feature_alpha_dropout(intermediate, 0.2, true);
// Higher p values provide stronger regularization

// Typical SNN architecture pattern
class SNNBlock extends torch.nn.Module {
  linear: torch.nn.Linear;
  dropout_p: number;

  constructor(in_features: number, out_features: number, p_drop: number = 0.1) {
    super();
    this.linear = new torch.nn.Linear(in_features, out_features);
    this.dropout_p = p_drop;
  }

  forward(x: Tensor): Tensor {
    let y = this.linear(x);
    y = torch.nn.functional.selu(y);
    y = torch.nn.functional.feature_alpha_dropout(y, this.dropout_p, this.training);
    return y;
  }
}
// This pattern preserves self-normalization through the network

See Also
- PyTorch torch.nn.functional.feature_alpha_dropout
- selu - Activation function for self-normalizing networks
- dropout - Standard element-wise dropout (not SELU-compatible)
- dropout1d - Channel dropout for 1D sequences
- dropout2d - Channel dropout for 2D feature maps
- dropout3d - Channel dropout for 3D volumes
- batch_norm - Alternative normalization technique
- layer_norm - Per-feature normalization