torch.optim.lr_scheduler.OneCycleLR
class OneCycleLR
new OneCycleLR(optimizer: Optimizer, options: {
/** Maximum learning rate (or array for each param group) */
max_lr: number | number[];
/** Total number of training steps */
total_steps?: number;
/** Total number of epochs (alternative to total_steps) */
epochs?: number;
/** Number of steps per epoch (required if using epochs) */
steps_per_epoch?: number;
/** Percentage of cycle spent increasing LR (default: 0.3) */
pct_start?: number;
/** Anneal strategy: 'cos' or 'linear' (default: 'cos') */
anneal_strategy?: AnnealStrategy;
/** Whether to cycle momentum inversely to LR (default: true) */
cycle_momentum?: boolean;
/** Initial learning rate = max_lr/div_factor (default: 25) */
div_factor?: number;
/** Min LR = initial_lr/final_div_factor (default: 1e4) */
final_div_factor?: number;
/** Run 3-phase schedule (default: false) */
three_phase?: boolean;
/** The index of last epoch (default: -1) */
last_epoch?: number;
/** Whether to print a message for each update (default: false) */
verbose?: boolean;
})
Constructor Parameters
optimizer (Optimizer) – Wrapped optimizer
options (object) – Scheduler options (fields documented in the signature above)
Properties
optimizer (Optimizer) – The optimizer being scheduled
max_lrs (number[]) – Maximum learning rates
total_steps (number) – Total number of training steps
pct_start (number) – Percentage of cycle spent increasing LR
anneal_strategy (AnnealStrategy) – Anneal strategy
div_factor (number) – Determines initial LR as max_lr/div_factor
final_div_factor (number) – Determines minimum LR as initial_lr/final_div_factor
three_phase (boolean) – Run 3-phase schedule if true
last_epoch (number) – Last epoch
OneCycleLR scheduler: 1cycle learning rate policy for superconvergence.
OneCycleLR implements the "1cycle" learning rate policy (Smith & Topin, 2019). Instead of monotonically decaying the learning rate, it cycles the lr from a low value up to a maximum, then back down to a very low value. This single cycle over the full training duration often produces better accuracy and faster convergence than traditional schedules.
Key insight:
- Start with low lr, ramp up to max_lr (first ~30% of training)
- Then ramp down to very low lr (remaining ~70% of training)
- Forcing the optimizer through different lr regimes helps it escape poor local minima
- Often achieves better final accuracy with shorter training time
When to use OneCycleLR:
- When you know total training steps/epochs in advance
- Want faster convergence with potential accuracy gains
- Fixed training duration (like fastai implementations)
- Research or competitive scenarios (Kaggle, etc)
- Models that benefit from "aggressive" learning rate schedules
Trade-offs:
- Requires knowing total_steps in advance (like CosineAnnealingLR)
- More complex than simple fixed-schedule approaches
- Momentum scheduling may affect reproducibility
- Requires tuning of max_lr, pct_start, anneal_strategy
- Step-based (per-batch) not epoch-based, different semantics
Algorithm: Two-phase learning rate cycling:
- Ascent phase (first pct_start): Linear or cosine increase from base_lr to max_lr
- Descent phase (remaining): Linear or cosine decrease from max_lr to min_lr
Momentum also cycles (inverse of lr) for better optimization dynamics.
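The two phases above can be sketched as a pure function. This is a minimal sketch, not the library's actual implementation; `oneCycleLr` and its helpers are hypothetical names, with parameters mirroring the options documented above:

```typescript
// Cosine interpolation from `start` to `end` as pct goes 0 -> 1
function annealCos(start: number, end: number, pct: number): number {
  return end + ((start - end) / 2) * (1 + Math.cos(Math.PI * pct));
}

// Linear interpolation from `start` to `end` as pct goes 0 -> 1
function annealLinear(start: number, end: number, pct: number): number {
  return start + pct * (end - start);
}

// Learning rate at a given (0-based) step of the two-phase 1cycle schedule.
function oneCycleLr(
  step: number,
  totalSteps: number,
  maxLr: number,
  pctStart = 0.3,
  annealStrategy: 'cos' | 'linear' = 'cos',
  divFactor = 25,
  finalDivFactor = 1e4,
): number {
  const initialLr = maxLr / divFactor;       // lr at step 0
  const minLr = initialLr / finalDivFactor;  // lr at the final step
  const anneal = annealStrategy === 'cos' ? annealCos : annealLinear;
  const upSteps = pctStart * totalSteps - 1; // last step of the ascent phase
  if (step <= upSteps) {
    // Ascent phase: initial_lr -> max_lr
    return anneal(initialLr, maxLr, step / upSteps);
  }
  // Descent phase: max_lr -> min_lr
  return anneal(maxLr, minLr, (step - upSteps) / (totalSteps - 1 - upSteps));
}
```

With the defaults, step 0 yields max_lr/div_factor, the step at pct_start × total_steps yields max_lr, and the final step yields initial_lr/final_div_factor.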
- Step-based: Calls step() per batch, not per epoch (unlike CosineAnnealingLR).
- Total steps critical: Must specify exact total_steps (epochs × batches_per_epoch).
- Faster convergence: Often achieves better results with shorter training time.
- Empirically strong: Proposed in "Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates" (Smith & Topin, 2019).
- Momentum cycling: Default cycles momentum (inverse of lr) for better dynamics.
- pct_start controls shape: Higher pct_start = longer ascent phase, shorter descent phase.
- Anneal strategy: 'cos' usually slightly better than 'linear', but difference small.
- div_factor controls start: initial_lr = max_lr / div_factor, adjust for warmup speed.
- final_div_factor controls end: min_lr = initial_lr / final_div_factor (default 1e4).
- Not for early stopping: Designed for fixed-duration training (can't easily early stop).
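As a quick numeric sanity check of the notes above (assuming the documented defaults and max_lr = 0.1):

```typescript
// Derived quantities for max_lr = 0.1 with the documented defaults.
const maxLr = 0.1;
const divFactor = 25;        // default
const finalDivFactor = 1e4;  // default

const initialLr = maxLr / divFactor;       // 0.004: lr at step 0
const minLr = initialLr / finalDivFactor;  // 4e-7: lr at the final step

// total_steps when specified via epochs + steps_per_epoch instead:
const epochs = 10;
const stepsPerEpoch = 100;
const totalSteps = epochs * stepsPerEpoch; // 1000
```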
Examples
// Basic OneCycleLR for 1000 steps, max lr 0.1
const scheduler = new torch.optim.OneCycleLR(optimizer, {
  max_lr: 0.1,
  total_steps: 1000 // e.g., 10 epochs × 100 batches
});
for (const batch of dataloader) {
  // train step
  scheduler.step();
}

// Cosine annealing with custom pct_start
const scheduler = new torch.optim.OneCycleLR(optimizer, {
  max_lr: 0.1,
  total_steps: 1000,
  pct_start: 0.4, // Spend 40% ascending, 60% descending
  anneal_strategy: 'cos' // Cosine annealing
});

// Linear annealing with longer ascent phase
const scheduler = new torch.optim.OneCycleLR(optimizer, {
  max_lr: 0.1,
  total_steps: 5000,
  pct_start: 0.5, // Spend half the time ascending
  anneal_strategy: 'linear', // Linear annealing
  div_factor: 10 // Start at 0.01 (0.1/10)
});

// Without momentum cycling (just lr cycling)
const scheduler = new torch.optim.OneCycleLR(optimizer, {
  max_lr: 0.1,
  total_steps: 1000,
  cycle_momentum: false // Only cycle lr, not momentum
});