torch.nn.AlphaDropout
class AlphaDropout extends Module
new AlphaDropout(options?: DropoutOptions)
- readonly p (number): the probability of an element being dropped
AlphaDropout: specialized dropout for self-normalizing neural networks (SNNs).
A variant of dropout designed specifically for use with SELU (Scaled Exponential Linear Unit) activation functions. While standard dropout breaks the self-normalizing property of SNNs, AlphaDropout maintains the mean and variance of activations, preserving the self-normalizing behavior. Essential for:
- Self-normalizing neural networks (SNNs) with SELU
- Networks using SELU activation where batch norm is not desired
- Maintaining zero mean and unit variance through the network
- Deep SNNs (very deep networks without explicit normalization)
- Improved convergence in self-normalizing architectures
AlphaDropout differs from standard dropout by not just zeroing values but replacing them with a value drawn from the tail of the activation distribution. This maintains the exact mean and variance needed for self-normalization. For networks using SELU, this is more effective than standard dropout combined with batch normalization.
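The claim that standard dropout breaks self-normalization can be checked numerically. Below is a minimal, self-contained sketch (plain TypeScript, no torch dependency): inverted dropout on zero-mean, unit-variance data keeps the mean at 0 but inflates the variance to 1/(1 - p), which is exactly what AlphaDropout is designed to avoid.

```typescript
// Sketch (not library code): inverted dropout y = x * mask / (1 - p)
// preserves E[y] = 0 but changes Var[y] to 1 / (1 - p) for unit-variance x.

// Standard normal samples via the Box-Muller transform.
function randn(n: number): number[] {
  const out: number[] = [];
  while (out.length < n) {
    const u = Math.random() || 1e-12;
    const v = Math.random();
    const r = Math.sqrt(-2 * Math.log(u));
    out.push(r * Math.cos(2 * Math.PI * v), r * Math.sin(2 * Math.PI * v));
  }
  return out.slice(0, n);
}

function invertedDropout(x: number[], p: number): number[] {
  // Zero each element with probability p, rescale survivors by 1 / (1 - p).
  return x.map((v) => (Math.random() < p ? 0 : v / (1 - p)));
}

const mean = (a: number[]) => a.reduce((s, v) => s + v, 0) / a.length;
const variance = (a: number[]) => {
  const m = mean(a);
  return a.reduce((s, v) => s + (v - m) ** 2, 0) / a.length;
};

const x = randn(200_000);           // E ≈ 0, Var ≈ 1
const y = invertedDropout(x, 0.5);  // p = 0.5
console.log(mean(y).toFixed(2));    // still ≈ 0
console.log(variance(y).toFixed(2)); // ≈ 1 / (1 - 0.5) = 2, not 1
```

With p = 0.5 the variance roughly doubles, so the next layer no longer sees the unit-variance input that SELU's self-normalization assumes.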
When to use AlphaDropout:
- Networks using SELU activation (SNNs)
- Deep fully connected networks without batch norm
- When self-normalization is crucial
- Replacing batch norm in some architectures
- Very deep networks (20+ layers) where normalization is critical
SNN vs standard networks:
- Standard nets: Use Dropout + BatchNorm (or other normalization)
- SNNs: Use AlphaDropout (no batch norm needed due to self-normalization)
- Self-normalizing: SELU activation + proper initialization maintains mean/variance
- AlphaDropout: Preserves self-normalizing properties during regularization
Trade-offs:
- vs Dropout: AlphaDropout preserves mean/variance, while Dropout changes them; the implementation is more complex and specific to SELU networks
- vs Dropout + BatchNorm: AlphaDropout is simpler, with no batch statistics needed
- Computational cost: slightly higher than standard dropout
- Applicability: only truly beneficial with SELU (less effective with ReLU)
AlphaDropout mechanics: for input x with SELU activation statistics (zero mean, unit variance):
- Standard dropout: apply a Bernoulli mask, scale the remaining values
- AlphaDropout: set dropped elements to α' = -λα ≈ -1.7581 (the negative saturation value of SELU), then apply an affine correction
- Result: maintains E[y] = 0 and Var[y] = 1 after dropout
The replacement value and affine correction are chosen to satisfy:
- Mean preservation: output mean equals input mean (E[y] = E[x] = 0)
- Variance preservation: output variance equals input variance (Var[y] = Var[x] = 1)
Notes:
- SELU only: designed for SELU activation; much less effective with ReLU or other activations
- Initialization critical: SNNs require LeCun normal weight initialization (see torch.nn.SELU)
- Self-normalizing: critical for the self-normalizing property of SNNs
- No batch norm: can replace batch normalization in SELU networks
- Training/inference: must call .train()/.eval() to control behavior
- Not standard dropout: different mechanics than the Dropout class
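The mean/variance preservation above can be made concrete. This is a minimal sketch of the AlphaDropout mechanics following the SELU paper (not the library's internal implementation): dropped units are set to α' = -λα, and an affine correction a·m + b restores zero mean and unit variance.

```typescript
// Sketch of AlphaDropout mechanics (assumed from the SELU paper, not
// the library's internals). Dropped units become alpha' = -lambda * alpha,
// then a * m + b restores E = 0, Var = 1 for unit-normal input.

const LAMBDA = 1.0507009873554805;   // SELU scale
const ALPHA = 1.6732632423543772;    // SELU alpha
const ALPHA_PRIME = -LAMBDA * ALPHA; // ≈ -1.7581

function alphaDropout(x: number[], p: number): number[] {
  const q = 1 - p; // keep probability
  // For zero-mean, unit-variance x, the masked signal m has
  // E[m] = p * alpha' and Var[m] = q + alpha'^2 * p * q, so:
  const a = 1 / Math.sqrt(q + ALPHA_PRIME ** 2 * p * q);
  const b = -a * p * ALPHA_PRIME;
  return x.map((v) => a * (Math.random() < q ? v : ALPHA_PRIME) + b);
}

// Standard normal samples via the Box-Muller transform.
function randn(n: number): number[] {
  const out: number[] = [];
  while (out.length < n) {
    const u = Math.random() || 1e-12;
    const v = Math.random();
    const r = Math.sqrt(-2 * Math.log(u));
    out.push(r * Math.cos(2 * Math.PI * v), r * Math.sin(2 * Math.PI * v));
  }
  return out.slice(0, n);
}

const mean = (a: number[]) => a.reduce((s, v) => s + v, 0) / a.length;
const variance = (a: number[]) => {
  const m = mean(a);
  return a.reduce((s, v) => s + (v - m) ** 2, 0) / a.length;
};

const x = randn(200_000);       // E ≈ 0, Var ≈ 1
const y = alphaDropout(x, 0.1);
console.log(mean(y).toFixed(2), variance(y).toFixed(2)); // ≈ 0 and ≈ 1
```

Unlike inverted dropout, both statistics survive, which is what keeps a stack of SELU layers self-normalizing.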
Examples
// Self-normalizing network with AlphaDropout
class SNNWithAlphaDropout extends torch.nn.Module {
  fc1: torch.nn.Linear;
  alpha_dropout1: torch.nn.AlphaDropout;
  fc2: torch.nn.Linear;
  alpha_dropout2: torch.nn.AlphaDropout;
  fc3: torch.nn.Linear;

  constructor() {
    super();
    this.fc1 = new torch.nn.Linear(784, 256);
    this.alpha_dropout1 = new torch.nn.AlphaDropout(0.1);
    this.fc2 = new torch.nn.Linear(256, 128);
    this.alpha_dropout2 = new torch.nn.AlphaDropout(0.1);
    this.fc3 = new torch.nn.Linear(128, 10);
  }

  forward(x: torch.Tensor): torch.Tensor {
    // SELU activation + AlphaDropout = self-normalizing
    x = torch.selu(this.fc1.forward(x));
    x = this.alpha_dropout1.forward(x);
    x = torch.selu(this.fc2.forward(x));
    x = this.alpha_dropout2.forward(x);
    return this.fc3.forward(x);
  }
}

// AlphaDropout preserves network normalization
const dropout = new torch.nn.AlphaDropout(0.1);
const x = torch.randn([32, 512]); // Assume mean≈0, variance≈1 from previous SELU
dropout.train();
const out = dropout.forward(x);
// Output still has E ≈ 0, Var ≈ 1 despite dropout

// Comparison: standard dropout breaks self-normalization
const snn_model = new SNNWithAlphaDropout(); // Maintains self-normalization
// A counterpart built with torch.nn.Dropout in place of AlphaDropout
// (e.g. a hypothetical SNNWithRegularDropout) would break self-normalization.
// SNNs with AlphaDropout converge faster and more stably
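The LeCun normal initialization required by SNNs (see the notes above) can be sketched as follows. The helper below is hypothetical, not part of the documented torch.nn API: it draws weights from N(0, 1/fan_in), which together with SELU keeps activations near zero mean and unit variance across layers.

```typescript
// Hypothetical sketch of LeCun normal initialization for an SNN layer
// (not a documented torch.nn API): weights ~ N(0, 1 / fan_in).
function lecunNormal(fanIn: number, fanOut: number): number[][] {
  const std = Math.sqrt(1 / fanIn);
  const w: number[][] = [];
  for (let i = 0; i < fanOut; i++) {
    const row: number[] = [];
    for (let j = 0; j < fanIn; j++) {
      // Box-Muller sample scaled to the target standard deviation.
      const u = Math.random() || 1e-12;
      const v = Math.random();
      row.push(std * Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v));
    }
    w.push(row);
  }
  return w;
}

const w = lecunNormal(784, 256);
const flat = w.flat();
const m = flat.reduce((s, v) => s + v, 0) / flat.length;
const sd = Math.sqrt(flat.reduce((s, v) => s + (v - m) ** 2, 0) / flat.length);
console.log(sd.toFixed(4)); // close to 1 / sqrt(784) ≈ 0.0357
```

With this initialization, a linear layer fed unit-variance input produces pre-activations with roughly unit variance, which is the precondition AlphaDropout and SELU rely on.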