torch.nn.functional.alpha_dropout
function alpha_dropout<S extends Shape, D extends DType = DType, Dev extends DeviceType = DeviceType>(input: Tensor<S, D, Dev>, options?: AlphaDropoutFunctionalOptions): Tensor<S, D, Dev>
function alpha_dropout<S extends Shape, D extends DType = DType, Dev extends DeviceType = DeviceType>(input: Tensor<S, D, Dev>, p: number, training: boolean, inplace: boolean, options?: AlphaDropoutFunctionalOptions): Tensor<S, D, Dev>
Alpha Dropout: dropout for self-normalizing neural networks (SNNs) with SELU.
Applies dropout while preserving the self-normalizing properties of SELU-activated networks. Unlike standard dropout, which breaks self-normalization, alpha dropout maintains the mean and variance structure crucial for SELU networks. During training, it randomly masks activations with probability p, setting them to SELU's negative saturation value rather than zero, and then applies a fixed affine transformation so the output keeps zero mean and unit variance. Essential for:
- Self-Normalizing Neural Networks (SNNs) with SELU activations
- Deep networks without batch normalization (SNNs provide normalization)
- Regularization that preserves SELU's self-normalization properties
- Training very deep networks (SELU + alpha dropout = stable training)
- Networks requiring strict variance/mean preservation during regularization
- Alternative to batch norm + standard dropout (more principled approach)
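A minimal usage sketch of the two call forms follows; the field names inside AlphaDropoutFunctionalOptions are assumed here to mirror the positional parameters, since this page does not document them:
const h = torch.nn.functional.selu(linear(x)); // linear and x as in the examples below
// Positional overload: p, training, inplace
const y1 = torch.nn.functional.alpha_dropout(h, 0.1, true, false);
// Options overload; { p, training, inplace } fields are an assumption, not documented here
const y2 = torch.nn.functional.alpha_dropout(h, { p: 0.1, training: true, inplace: false });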
Self-Normalization Property: SELU networks automatically keep activations normalized (mean ≈ 0, std ≈ 1) from layer to layer. Standard dropout breaks this by zeroing random values, which shifts both statistics. Alpha dropout preserves the property by setting dropped activations to SELU's negative saturation value (−λα ≈ −1.7581) and then applying a fixed affine transformation that restores the original mean and variance.
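As an illustration of the math only (not this library's implementation), the per-element computation can be sketched over a plain array; the constants are the standard SELU parameters from Klambauer et al. (2017):
// Reference math for alpha dropout on a plain number[]; illustrative only.
const LAMBDA = 1.0507009873554805; // SELU scale λ
const ALPHA = 1.6732632423543772; // SELU α
const ALPHA_PRIME = -LAMBDA * ALPHA; // SELU's negative saturation value, ≈ -1.7581

function alphaDropoutReference(x: number[], p: number): number[] {
  const q = 1 - p; // keep probability
  // Affine coefficients chosen so mean-0 / variance-1 inputs keep those statistics.
  const a = 1 / Math.sqrt(q + ALPHA_PRIME * ALPHA_PRIME * q * p);
  const b = -a * ALPHA_PRIME * p;
  return x.map((v) => {
    const kept = Math.random() < q;
    // Dropped units become ALPHA_PRIME (not 0); the affine step restores mean and variance.
    return a * (kept ? v : ALPHA_PRIME) + b;
  });
}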
When to use Alpha Dropout:
- SELU networks (not applicable to ReLU/other activations)
- Deep networks without batch normalization
- When self-normalization is critical for training stability
- Principled regularization for SNNs (better than standard dropout for SELU)
- Data with limited samples (regularization without batch stats)
Comparison with alternatives:
- Standard dropout: Breaks SELU's self-normalization; not suitable for SNNs
- Batch Norm: Explicit normalization; alpha dropout is implicit via scaling
- Layer Norm: Explicitly normalizes each sample; alpha dropout adds no normalization of its own and instead preserves the normalization SELU already provides
- No regularization: SNNs need no explicit normalization layers, but they still need regularization to avoid overfitting
Key properties:
- Self-normalization preserved: Maintains mean ≈ 0, std ≈ 1 throughout the network
- Statistical compensation: Dropped activations are set to SELU's negative saturation value, and a fixed affine transform restores the original mean and variance
- Deep network friendly: Enables very deep networks without batch normalization
- Training mode dependent: Dropout is only applied when training=true
- Principled approach: Mathematically preserves network statistics better than standard dropout does
- No learnable parameters: Pure stochastic regularization, no trained weights
Caveats:
- SELU requirement: Only suitable for SELU-activated networks; breaks with ReLU, sigmoid, or tanh
- Not for other activations: Use standard dropout with non-SELU networks
- Must set the training flag: Behavior differs between training=true and training=false
- Input assumption: Assumes the input has mean ≈ 0 and std ≈ 1, e.g. the output of a SELU layer (see the standardization sketch after this list)
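Because of the input assumption, raw inputs are usually standardized to mean 0 and standard deviation 1 before the first SELU layer. A minimal plain-TypeScript sketch (illustrative only, not part of this library's API):
function standardize(xs: number[]): number[] {
  const mean = xs.reduce((acc, v) => acc + v, 0) / xs.length;
  const variance = xs.reduce((acc, v) => acc + (v - mean) ** 2, 0) / xs.length;
  const std = Math.sqrt(variance) || 1; // guard against zero variance
  return xs.map((v) => (v - mean) / std);
}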
Parameters
input: Tensor<S, D, Dev> - Input tensor of any shape
p: number - Probability of an element being dropped (positional overload)
training: boolean - When true, apply dropout; when false, return the input unchanged (positional overload)
inplace: boolean - If true, perform the operation in place (positional overload)
options: AlphaDropoutFunctionalOptions, optional - Options for the operation. See AlphaDropoutFunctionalOptions.
Returns
Tensor<S, D, Dev> – Tensor with alpha dropout applied element-wise
Examples
// Self-Normalizing Neural Network with Alpha Dropout
const x = torch.randn(batch_size, in_features);
const h1 = torch.nn.functional.selu(linear1(x)); // Self-normalized activation
const h1_dropped = torch.nn.functional.alpha_dropout(h1, 0.1, true); // Keep properties
const h2 = torch.nn.functional.selu(linear2(h1_dropped)); // Still self-normalized
const output = linear3(h2);
// Alpha dropout maintains SELU's self-normalization through network
// Deep SNN: 10 layers with alpha dropout for regularization
let x = torch.randn(batch, input_dim);
for (let i = 0; i < 10; i++) {
x = torch.nn.functional.selu(layers[i](x)); // SELU ensures normalization
x = torch.nn.functional.alpha_dropout(x, 0.1, training); // Preserve properties
}
const output = final_layer(x);
// Deep network stays stable without batch norm due to SELU + alpha dropout
// Comparison: alpha dropout vs standard dropout
const selu_hidden = torch.nn.functional.selu(linear(input)); // Self-normalized
const alpha_dropped = torch.nn.functional.alpha_dropout(selu_hidden, 0.1, true);
const std_dropped = torch.nn.functional.dropout(selu_hidden, 0.1, true);
// alpha_dropped preserves mean/variance; std_dropped breaks self-normalization
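// Sanity check (illustrative; assumes Tensor exposes mean()/std() reductions as in PyTorch)
console.log(alpha_dropped.mean(), alpha_dropped.std()); // stays near 0 / 1
console.log(std_dropped.mean(), std_dropped.std()); // variance inflates to ~1/(1-p); self-normalization lost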
// Regularization strength control
const weak_reg = torch.nn.functional.alpha_dropout(h, 0.05, training); // 5% dropout
const strong_reg = torch.nn.functional.alpha_dropout(h, 0.3, training); // 30% dropout
// Higher p = stronger regularization (but higher information loss)
// Inference: no dropout applied
model.eval(); // Set to inference mode
const test_x = test_input;
const test_h = torch.nn.functional.selu(test_linear(test_x));
const test_out = torch.nn.functional.alpha_dropout(test_h, 0.1, false); // No dropout
// test_out == test_h (no modification during inference)
See Also
- PyTorch torch.nn.functional.alpha_dropout
- torch.nn.functional.selu - Self-normalizing activation that alpha dropout is designed to accompany
- torch.nn.functional.dropout - Standard dropout (breaks SELU's self-normalization)
- torch.nn.Dropout - Module wrapper for standard dropout
- torch.nn.AlphaDropout - Module wrapper for alpha dropout