torch.nn.functional.dropout
function dropout<S extends Shape, D extends DType = DType, Dev extends DeviceType = DeviceType>(input: Tensor<S, D, Dev>, options?: DropoutFunctionalOptions): Tensor<S, D, Dev>
function dropout<S extends Shape, D extends DType = DType, Dev extends DeviceType = DeviceType>(input: Tensor<S, D, Dev>, p: number, training: boolean, inplace: boolean, options?: DropoutFunctionalOptions): Tensor<S, D, Dev>
Dropout regularization: randomly zeroes elements during training to prevent overfitting.
Randomly sets elements to zero with probability p and scales remaining elements by 1/(1-p). This simple but effective regularization technique is essential for:
- Preventing overfitting in deep networks (reduces co-adaptation of neurons)
- Ensemble-like effect: different networks sampled at each training step
- Feature adaptation: neurons learn robust features when neighbors unavailable
- Reducing model complexity (implicit ensemble of sub-networks)
- Standard regularization technique across modern neural networks
When to use Dropout:
- Training deep networks (standard practice: include in most layers)
- When you observe overfitting (training loss much lower than validation loss)
- To reduce model complexity without explicit regularization
- Must be disabled at test time (use model.eval() or training=false)
- Typical probabilities: p=0.5 (hidden), p=0.1-0.3 (input)
How it works:
- Training: Randomly zero each element with probability p, then scale the survivors by 1/(1-p) to maintain the expected value (see the sketch after this list)
- Inference: Keep all elements unchanged (no dropout applied)
- Effect: Each forward pass trains a different random sub-network
- Ensemble: Approximates training an ensemble of many sub-networks
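The training-time rule is easy to see framework-free. The sketch below is illustration only: invertedDropout is a hypothetical helper written in plain TypeScript, not part of this library, but it applies the same zero-or-rescale rule the library applies element-wise.
// Minimal framework-free sketch of inverted dropout (illustration only;
// invertedDropout is a hypothetical helper, not part of this library)
function invertedDropout(values: number[], p: number, training: boolean): number[] {
  if (!training || p === 0) return values.slice(); // inference: identity
  return values.map((v) => (Math.random() < p ? 0 : v / (1 - p))); // zero or rescale
}

invertedDropout([1, 2, 3, 4], 0.5, true);  // e.g. [2, 0, 6, 0] - survivors doubled
invertedDropout([1, 2, 3, 4], 0.5, false); // [1, 2, 3, 4] - unchanged at inference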
Trade-offs vs other regularization:
- vs L1/L2 weight decay: Dropout acts on activations rather than weights and regularizes more aggressively
- vs Batch Norm: Batch norm also regularizes, but via normalization statistics; dropout works through a different mechanism (random masking)
- vs Early Stopping: Dropout lets you train longer before overfitting sets in; early stopping halts training instead
- Complementary: Works well with batch norm and weight decay
Practical notes:
- Must disable in eval: Forgetting to set training=false (or call model.eval()) at test time is a common bug; results differ significantly if dropout is not toggled correctly
- Expected value: Scaling by 1/(1-p) maintains the expected value of each element, which is essential for train/test consistency (see the worked check after this list)
- No overhead at test: Negligible cost when training=false (just an identity operation)
- Hyperparameter: p needs tuning; typical values are 0.5 for hidden layers and 0.1-0.3 for inputs, and a poorly chosen value can hurt performance
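The expected-value claim can be checked directly: each element is kept with probability 1-p and rescaled by 1/(1-p), so E[output] = (1-p) * x/(1-p) + p * 0 = x. A standalone snippet (plain TypeScript, not library code) confirming this empirically:
// Average many inverted-dropout samples of a single value x;
// the empirical mean should approach x
const x = 10, p = 0.5, trials = 100000;
let sum = 0;
for (let i = 0; i < trials; i++) {
  sum += Math.random() < p ? 0 : x / (1 - p); // one dropout sample of x
}
console.log(sum / trials); // ≈ 10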
Parameters
input: Tensor<S, D, Dev> - Input tensor of any shape
p: number - Probability of zeroing each element (second overload)
training: boolean - Apply dropout when true; act as identity when false (second overload)
inplace: boolean - If true, perform the operation in place (second overload)
options: DropoutFunctionalOptions - optional
Returns
Tensor<S, D, Dev> - Tensor with dropout applied during training, same shape as input
Examples
// Typical neural network with dropout
class DropoutNet extends torch.nn.Module {
  private fc1: torch.nn.Linear;
  private fc2: torch.nn.Linear;
  private fc3: torch.nn.Linear;

  constructor() {
    super();
    // Layer sizes are illustrative; Linear is assumed to take (inFeatures, outFeatures)
    this.fc1 = new torch.nn.Linear(784, 256);
    this.fc2 = new torch.nn.Linear(256, 128);
    this.fc3 = new torch.nn.Linear(128, 10);
  }

  forward(x: torch.Tensor): torch.Tensor {
    // Hidden layers with 0.5 dropout (inplace=false)
    x = torch.nn.functional.relu(this.fc1.forward(x));
    x = torch.nn.functional.dropout(x, 0.5, this.training, false);
    x = torch.nn.functional.relu(this.fc2.forward(x));
    x = torch.nn.functional.dropout(x, 0.5, this.training, false);
    // Output layer (no dropout)
    return this.fc3.forward(x);
  }
}
const model = new DropoutNet();
// Training
model.train(); // Sets training=true
const output_train = model.forward(x); // Dropout active
// Inference
model.eval(); // Sets training=false
const output_test = model.forward(x); // No dropout

// Custom dropout in training loop
for (let epoch = 0; epoch < epochs; epoch++) {
  // Training: apply dropout with training=true
  model.train();
  let x = hidden_layer_output;
  x = torch.nn.functional.dropout(x, 0.5, true, false);
  x = next_layer.forward(x);
  // Validation: with training=false, dropout is the identity
  model.eval();
  x = hidden_layer_output;
  x = torch.nn.functional.dropout(x, 0.5, false, false);
  x = next_layer.forward(x);
}

// Different probabilities for input vs hidden
class DropoutVariant extends torch.nn.Module {
  private fc1: torch.nn.Linear;
  private fc2: torch.nn.Linear;

  constructor() {
    super();
    // Layer sizes are illustrative
    this.fc1 = new torch.nn.Linear(784, 256);
    this.fc2 = new torch.nn.Linear(256, 10);
  }

  forward(x: torch.Tensor): torch.Tensor {
    // Input dropout: milder (0.2)
    x = torch.nn.functional.dropout(x, 0.2, this.training, false);
    x = this.fc1.forward(x);
    x = torch.nn.functional.relu(x);
    // Hidden dropout: stronger (0.5)
    x = torch.nn.functional.dropout(x, 0.5, this.training, false);
    x = this.fc2.forward(x);
    return x;
  }
}
See Also
- PyTorch torch.nn.functional.dropout
- alpha_dropout - Variant for SELU activations (maintains self-normalizing property)
- batch_norm - Complementary regularization via normalization
- weight_decay - L2 regularization on weights