torch.optim.lr_scheduler.OneCycleLR
class OneCycleLR
new OneCycleLR(optimizer: Optimizer, options: {
/** Maximum learning rate (or array for each param group) */
max_lr: number | number[];
/** Total number of training steps */
total_steps?: number;
/** Total number of epochs (alternative to total_steps) */
epochs?: number;
/** Number of steps per epoch (required if using epochs) */
steps_per_epoch?: number;
/** Percentage of cycle spent increasing LR (default: 0.3) */
pct_start?: number;
/** Anneal strategy: 'cos' or 'linear' (default: 'cos') */
anneal_strategy?: AnnealStrategy;
/** Whether to cycle momentum inversely to LR (default: true) */
cycle_momentum?: boolean;
/** Initial learning rate = max_lr/div_factor (default: 25) */
div_factor?: number;
/** Min LR = initial_lr/final_div_factor (default: 1e4) */
final_div_factor?: number;
/** Run 3-phase schedule (default: false) */
three_phase?: boolean;
/** The index of last epoch (default: -1) */
last_epoch?: number;
/** Whether to print a message for each update (default: false) */
verbose?: boolean;
})
Constructor Parameters
optimizer (Optimizer) – Wrapped optimizer
options (object) – Scheduler options (fields documented in the signature above)
Properties
optimizer (Optimizer) – The optimizer being scheduled
max_lrs (number[]) – Maximum learning rates
total_steps (number) – Total number of training steps
pct_start (number) – Percentage of cycle spent increasing LR
anneal_strategy (AnnealStrategy) – Anneal strategy
div_factor (number) – Determines initial LR as max_lr/div_factor
final_div_factor (number) – Determines minimum LR as initial_lr/final_div_factor
three_phase (boolean) – Run 3-phase schedule if true
last_epoch (number) – Last epoch
OneCycleLR scheduler: 1cycle learning rate policy for superconvergence.
OneCycleLR implements the "1cycle" learning rate policy (Smith & Topin, 2019). Instead of monotonically decaying the learning rate, it cycles the lr from a low value up to a maximum, then back down to a very low value. This single cycle over the full training duration often produces better accuracy and faster convergence than traditional schedules.
Key insight:
- Start with low lr, ramp up to max_lr (first ~30% of training)
- Then ramp down to very low lr (remaining ~70% of training)
- Forcing the optimizer through different lr regimes helps it escape poor local minima
- Often achieves better final accuracy with shorter training time
When to use OneCycleLR:
- When you know total training steps/epochs in advance
- Want faster convergence with potential accuracy gains
- Fixed training duration (like fastai implementations)
- Research or competitive scenarios (Kaggle, etc)
- Models that benefit from "aggressive" learning rate schedules
Trade-offs:
- Requires knowing total_steps in advance (like CosineAnnealingLR)
- More complex than simple fixed-schedule approaches
- Momentum scheduling may affect reproducibility
- Requires tuning of max_lr, pct_start, anneal_strategy
- Step-based (per-batch) not epoch-based, different semantics
Algorithm: Two-phase learning rate cycling:
- Ascent phase (first pct_start): Linear or cosine increase from base_lr to max_lr
- Descent phase (remaining): Linear or cosine decrease from max_lr to min_lr
Momentum also cycles (inverse of lr) for better optimization dynamics.
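The two phases above can be sketched as a pure function. This is a minimal sketch, not the library's actual implementation; `oneCycleLr` and its helpers are hypothetical names, with parameters mirroring the options documented above:

```typescript
// Cosine interpolation from `start` to `end` as pct goes 0 -> 1
function annealCos(start: number, end: number, pct: number): number {
  return end + ((start - end) / 2) * (1 + Math.cos(Math.PI * pct));
}

// Linear interpolation from `start` to `end` as pct goes 0 -> 1
function annealLinear(start: number, end: number, pct: number): number {
  return start + pct * (end - start);
}

// Learning rate at a given (0-based) step of the two-phase 1cycle schedule.
function oneCycleLr(
  step: number,
  totalSteps: number,
  maxLr: number,
  pctStart = 0.3,
  annealStrategy: 'cos' | 'linear' = 'cos',
  divFactor = 25,
  finalDivFactor = 1e4,
): number {
  const initialLr = maxLr / divFactor;       // lr at step 0
  const minLr = initialLr / finalDivFactor;  // lr at the final step
  const anneal = annealStrategy === 'cos' ? annealCos : annealLinear;
  const upSteps = pctStart * totalSteps - 1; // last step of the ascent phase
  if (step <= upSteps) {
    // Ascent phase: initial_lr -> max_lr
    return anneal(initialLr, maxLr, step / upSteps);
  }
  // Descent phase: max_lr -> min_lr
  return anneal(maxLr, minLr, (step - upSteps) / (totalSteps - 1 - upSteps));
}
```

With the defaults, step 0 yields max_lr/div_factor, the step at pct_start × total_steps yields max_lr, and the final step yields initial_lr/final_div_factor.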
- Step-based: Calls step() per batch, not per epoch (unlike CosineAnnealingLR).
- Total steps critical: Must specify exact total_steps (epochs × batches_per_epoch).
- Faster convergence: Often achieves better results with shorter training time.
- Empirically strong: Proposed in "Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates" (Smith & Topin, 2019).
- Momentum cycling: Default cycles momentum (inverse of lr) for better dynamics.
- pct_start controls shape: Higher pct_start = longer ascent phase, shorter descent phase.
- Anneal strategy: 'cos' usually slightly better than 'linear', but difference small.
- div_factor controls start: initial_lr = max_lr / div_factor, adjust for warmup speed.
- final_div_factor controls end: min_lr = initial_lr / final_div_factor (default 1e4).
- Not for early stopping: Designed for fixed-duration training (can't easily early stop).
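As a quick numeric sanity check of the notes above (assuming the documented defaults and max_lr = 0.1):

```typescript
// Derived quantities for max_lr = 0.1 with the documented defaults.
const maxLr = 0.1;
const divFactor = 25;        // default
const finalDivFactor = 1e4;  // default

const initialLr = maxLr / divFactor;       // 0.004: lr at step 0
const minLr = initialLr / finalDivFactor;  // 4e-7: lr at the final step

// total_steps when specified via epochs + steps_per_epoch instead:
const epochs = 10;
const stepsPerEpoch = 100;
const totalSteps = epochs * stepsPerEpoch; // 1000
```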
Examples
// Basic OneCycleLR for 1000 steps, max lr 0.1
const scheduler = new torch.optim.OneCycleLR(optimizer, {
  max_lr: 0.1,
  total_steps: 1000 // e.g., 10 epochs × 100 batches
});
for (const batch of dataloader) {
  // train step
  scheduler.step();
}

// Cosine annealing with custom pct_start
const scheduler = new torch.optim.OneCycleLR(optimizer, {
  max_lr: 0.1,
  total_steps: 1000,
  pct_start: 0.4, // Spend 40% ascending, 60% descending
  anneal_strategy: 'cos' // Cosine annealing
});

// Linear annealing with longer ascent phase
const scheduler = new torch.optim.OneCycleLR(optimizer, {
  max_lr: 0.1,
  total_steps: 5000,
  pct_start: 0.5, // Spend half the time ascending
  anneal_strategy: 'linear', // Linear annealing
  div_factor: 10 // Start at 0.01 (0.1/10)
});

// Without momentum cycling (just lr cycling)
const scheduler = new torch.optim.OneCycleLR(optimizer, {
  max_lr: 0.1,
  total_steps: 1000,
  cycle_momentum: false // Only cycle lr, not momentum
});