torch.nn.AlphaDropout
class AlphaDropout extends Module
new AlphaDropout(options?: DropoutOptions)
- readonly p (number): the probability of an element being dropped
AlphaDropout: specialized dropout for self-normalizing neural networks (SNNs).
A variant of dropout designed specifically for use with SELU (Scaled Exponential Linear Unit) activation functions. While standard dropout breaks the self-normalizing property of SNNs, AlphaDropout maintains the mean and variance of activations, preserving the self-normalizing behavior. Essential for:
- Self-normalizing neural networks (SNNs) with SELU
- Networks using SELU activation where batch norm is not desired
- Maintaining zero mean and unit variance through the network
- Deep SNNs (very deep networks without explicit normalization)
- Improved convergence in self-normalizing architectures
AlphaDropout differs from standard dropout by not just zeroing values but replacing them with a value drawn from the tail of the activation distribution. This maintains the exact mean and variance needed for self-normalization. For networks using SELU, this is more effective than standard dropout combined with batch normalization.
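The claim that standard dropout breaks self-normalization can be checked numerically. Below is a minimal, self-contained sketch (plain TypeScript, no torch dependency): inverted dropout on zero-mean, unit-variance data keeps the mean at 0 but inflates the variance to 1/(1 - p), which is exactly what AlphaDropout is designed to avoid.

```typescript
// Sketch (not library code): inverted dropout y = x * mask / (1 - p)
// preserves E[y] = 0 but changes Var[y] to 1 / (1 - p) for unit-variance x.

// Standard normal samples via the Box-Muller transform.
function randn(n: number): number[] {
  const out: number[] = [];
  while (out.length < n) {
    const u = Math.random() || 1e-12;
    const v = Math.random();
    const r = Math.sqrt(-2 * Math.log(u));
    out.push(r * Math.cos(2 * Math.PI * v), r * Math.sin(2 * Math.PI * v));
  }
  return out.slice(0, n);
}

function invertedDropout(x: number[], p: number): number[] {
  // Zero each element with probability p, rescale survivors by 1 / (1 - p).
  return x.map((v) => (Math.random() < p ? 0 : v / (1 - p)));
}

const mean = (a: number[]) => a.reduce((s, v) => s + v, 0) / a.length;
const variance = (a: number[]) => {
  const m = mean(a);
  return a.reduce((s, v) => s + (v - m) ** 2, 0) / a.length;
};

const x = randn(200_000);           // E ≈ 0, Var ≈ 1
const y = invertedDropout(x, 0.5);  // p = 0.5
console.log(mean(y).toFixed(2));    // still ≈ 0
console.log(variance(y).toFixed(2)); // ≈ 1 / (1 - 0.5) = 2, not 1
```

With p = 0.5 the variance roughly doubles, so the next layer no longer sees the unit-variance input that SELU's self-normalization assumes.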
When to use AlphaDropout:
- Networks using SELU activation (SNNs)
- Deep fully connected networks without batch norm
- When self-normalization is crucial
- Replacing batch norm in some architectures
- Very deep networks (20+ layers) where normalization is critical
SNN vs standard networks:
- Standard nets: Use Dropout + BatchNorm (or other normalization)
- SNNs: Use AlphaDropout (no batch norm needed due to self-normalization)
- Self-normalizing: SELU activation + proper initialization maintains mean/variance
- AlphaDropout: Preserves self-normalizing properties during regularization
Trade-offs:
- vs Dropout: AlphaDropout preserves mean/variance, while Dropout changes them; the implementation is more complex and specific to SELU networks
- vs Dropout + BatchNorm: AlphaDropout is simpler, with no batch statistics needed
- Computational cost: slightly higher than standard dropout
- Applicability: only truly beneficial with SELU (less effective with ReLU)
AlphaDropout mechanics: for input x with SELU activation statistics (zero mean, unit variance):
- Standard dropout: apply a Bernoulli mask, scale the remaining values
- AlphaDropout: set dropped elements to α' = -λα ≈ -1.7581 (the negative saturation value of SELU), then apply an affine correction
- Result: maintains E[y] = 0 and Var[y] = 1 after dropout
The replacement value and affine correction are chosen to satisfy:
- Mean preservation: output mean equals input mean (E[y] = E[x] = 0)
- Variance preservation: output variance equals input variance (Var[y] = Var[x] = 1)
Notes:
- SELU only: designed for SELU activation; much less effective with ReLU or other activations
- Initialization critical: SNNs require LeCun normal weight initialization (see torch.nn.SELU)
- Self-normalizing: critical for the self-normalizing property of SNNs
- No batch norm: can replace batch normalization in SELU networks
- Training/inference: must call .train()/.eval() to control behavior
- Not standard dropout: different mechanics than the Dropout class
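The mean/variance preservation above can be made concrete. This is a minimal sketch of the AlphaDropout mechanics following the SELU paper (not the library's internal implementation): dropped units are set to α' = -λα, and an affine correction a·m + b restores zero mean and unit variance.

```typescript
// Sketch of AlphaDropout mechanics (assumed from the SELU paper, not
// the library's internals). Dropped units become alpha' = -lambda * alpha,
// then a * m + b restores E = 0, Var = 1 for unit-normal input.

const LAMBDA = 1.0507009873554805;   // SELU scale
const ALPHA = 1.6732632423543772;    // SELU alpha
const ALPHA_PRIME = -LAMBDA * ALPHA; // ≈ -1.7581

function alphaDropout(x: number[], p: number): number[] {
  const q = 1 - p; // keep probability
  // For zero-mean, unit-variance x, the masked signal m has
  // E[m] = p * alpha' and Var[m] = q + alpha'^2 * p * q, so:
  const a = 1 / Math.sqrt(q + ALPHA_PRIME ** 2 * p * q);
  const b = -a * p * ALPHA_PRIME;
  return x.map((v) => a * (Math.random() < q ? v : ALPHA_PRIME) + b);
}

// Standard normal samples via the Box-Muller transform.
function randn(n: number): number[] {
  const out: number[] = [];
  while (out.length < n) {
    const u = Math.random() || 1e-12;
    const v = Math.random();
    const r = Math.sqrt(-2 * Math.log(u));
    out.push(r * Math.cos(2 * Math.PI * v), r * Math.sin(2 * Math.PI * v));
  }
  return out.slice(0, n);
}

const mean = (a: number[]) => a.reduce((s, v) => s + v, 0) / a.length;
const variance = (a: number[]) => {
  const m = mean(a);
  return a.reduce((s, v) => s + (v - m) ** 2, 0) / a.length;
};

const x = randn(200_000);       // E ≈ 0, Var ≈ 1
const y = alphaDropout(x, 0.1);
console.log(mean(y).toFixed(2), variance(y).toFixed(2)); // ≈ 0 and ≈ 1
```

Unlike inverted dropout, both statistics survive, which is what keeps a stack of SELU layers self-normalizing.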
Examples
// Self-normalizing network with AlphaDropout
class SNNWithAlphaDropout extends torch.nn.Module {
  fc1: torch.nn.Linear;
  alpha_dropout1: torch.nn.AlphaDropout;
  fc2: torch.nn.Linear;
  alpha_dropout2: torch.nn.AlphaDropout;
  fc3: torch.nn.Linear;

  constructor() {
    super();
    this.fc1 = new torch.nn.Linear(784, 256);
    this.alpha_dropout1 = new torch.nn.AlphaDropout(0.1);
    this.fc2 = new torch.nn.Linear(256, 128);
    this.alpha_dropout2 = new torch.nn.AlphaDropout(0.1);
    this.fc3 = new torch.nn.Linear(128, 10);
  }

  forward(x: torch.Tensor): torch.Tensor {
    // SELU activation + AlphaDropout = self-normalizing
    x = torch.selu(this.fc1.forward(x));
    x = this.alpha_dropout1.forward(x);
    x = torch.selu(this.fc2.forward(x));
    x = this.alpha_dropout2.forward(x);
    return this.fc3.forward(x);
  }
}

// AlphaDropout preserves network normalization
const dropout = new torch.nn.AlphaDropout(0.1);
const x = torch.randn([32, 512]); // Assume mean≈0, variance≈1 from previous SELU
dropout.train();
const out = dropout.forward(x);
// Output still has E ≈ 0, Var ≈ 1 despite dropout

// Comparison: standard dropout breaks self-normalization
const snn_model = new SNNWithAlphaDropout(); // Maintains self-normalization
// A counterpart built with torch.nn.Dropout in place of AlphaDropout
// (e.g. a hypothetical SNNWithRegularDropout) would break self-normalization.
// SNNs with AlphaDropout converge faster and more stably
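The LeCun normal initialization required by SNNs (see the notes above) can be sketched as follows. The helper below is hypothetical, not part of the documented torch.nn API: it draws weights from N(0, 1/fan_in), which together with SELU keeps activations near zero mean and unit variance across layers.

```typescript
// Hypothetical sketch of LeCun normal initialization for an SNN layer
// (not a documented torch.nn API): weights ~ N(0, 1 / fan_in).
function lecunNormal(fanIn: number, fanOut: number): number[][] {
  const std = Math.sqrt(1 / fanIn);
  const w: number[][] = [];
  for (let i = 0; i < fanOut; i++) {
    const row: number[] = [];
    for (let j = 0; j < fanIn; j++) {
      // Box-Muller sample scaled to the target standard deviation.
      const u = Math.random() || 1e-12;
      const v = Math.random();
      row.push(std * Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v));
    }
    w.push(row);
  }
  return w;
}

const w = lecunNormal(784, 256);
const flat = w.flat();
const m = flat.reduce((s, v) => s + v, 0) / flat.length;
const sd = Math.sqrt(flat.reduce((s, v) => s + (v - m) ** 2, 0) / flat.length);
console.log(sd.toFixed(4)); // close to 1 / sqrt(784) ≈ 0.0357
```

With this initialization, a linear layer fed unit-variance input produces pre-activations with roughly unit variance, which is the precondition AlphaDropout and SELU rely on.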