torch.optim.lr_scheduler.CosineAnnealingWarmRestarts
class CosineAnnealingWarmRestarts extends LRScheduler

new CosineAnnealingWarmRestarts(optimizer: Optimizer, options: {
/** Number of iterations for the first restart */
T_0: number;
/** Factor increasing T_i after a restart (default: 1) */
T_mult?: number;
/** Minimum learning rate (default: 0) */
eta_min?: number;
/** The index of last epoch (default: -1) */
last_epoch?: number;
/** Whether to print a message for each update (default: false) */
verbose?: boolean;
})
Constructor Parameters
optimizer (Optimizer) – Wrapped optimizer
options (object) – Scheduler options; fields listed below
T_0 (number) – Number of iterations for the first restart
T_mult (number) – Factor to increase T_i after a restart
eta_min (number) – Minimum learning rate
T_cur (number) – Current position in the cycle
T_i (number) – Current cycle period
CosineAnnealingWarmRestarts scheduler: Cosine annealing with periodic warm restarts.
CosineAnnealingWarmRestarts implements SGDR (Stochastic Gradient Descent with Warm Restarts): within each cycle the learning rate follows a cosine annealing curve from its initial value down to eta_min, then is reset ("restarted") back to the initial value. The length of each cycle grows by a factor of T_mult. The periodic resets help the optimizer escape local minima and can improve generalization compared to monotonic decay.
Key benefits vs CosineAnnealingLR:
- Periodic restarts allow exploring different regions of the loss landscape
- Often improves generalization, especially once training is long enough to include several restarts
- Can reach lower final loss values
When to use:
- You want periodic lr resets to help escape local minima
- You have time for longer training and can benefit from restarts
- Research/competition scenarios where final accuracy matters most
Algorithm:
- Restart period T_i = T_0 * (T_mult ^ i) where i is restart number
- Each period uses cosine annealing from base_lr down to eta_min
- After period ends, restart: lr → base_lr, continue next period
- Periodic resets: Learning rate resets at each restart for exploration.
- T_mult multiplier: Controls how period grows (T_mult=2 → doubling, T_mult=1 → constant).
- Paper: "SGDR: Stochastic Gradient Descent with Warm Restarts" (Loshchilov & Hutter, 2016), the same paper behind CosineAnnealingLR.
- Longer training: Often needs more epochs to show benefit vs CosineAnnealingLR.
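The learning-rate math described in the Algorithm section above can be sketched as a standalone TypeScript function. This is an illustration of the schedule only, not the library's implementation; the helper name is hypothetical:

```typescript
// Standalone sketch of the SGDR learning-rate math (hypothetical helper,
// not part of the library). Returns the lr for a given epoch.
function warmRestartLr(
  epoch: number,
  baseLr: number,
  T0: number,
  tMult: number = 1,
  etaMin: number = 0
): number {
  let tI = T0;      // current cycle length T_i
  let tCur = epoch; // position within the current cycle T_cur
  if (tMult === 1) {
    tCur = epoch % T0; // constant period: simple modulo
  } else {
    while (tCur >= tI) { // subtract completed cycles to find the current one
      tCur -= tI;
      tI *= tMult;
    }
  }
  // Cosine annealing from baseLr down to etaMin within the cycle
  return etaMin + (baseLr - etaMin) * (1 + Math.cos(Math.PI * tCur / tI)) / 2;
}
```

For example, with T_0 = 10, T_mult = 2, and a base lr of 0.1, the lr is 0.1 at epoch 0, anneals toward 0, and jumps back to 0.1 at epoch 10; the second cycle then runs for 20 epochs.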
Examples
// Warm restarts every 10 epochs, doubling the period each time
const scheduler = new torch.optim.CosineAnnealingWarmRestarts(optimizer, {
  T_0: 10,   // Initial period: 10 epochs
  T_mult: 2  // Next period: 20 epochs, then 40, etc.
});

// Shorter, less aggressive restarts with a constant period
const scheduler2 = new torch.optim.CosineAnnealingWarmRestarts(optimizer, {
  T_0: 5,        // Restart every 5 epochs
  T_mult: 1,     // Constant period (5 epochs each cycle)
  eta_min: 1e-4  // Minimum learning rate
});
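When planning T_0 and T_mult against a fixed training budget, it helps to compute where the restarts will actually land. A small hypothetical helper (not a library API) for that calculation:

```typescript
// Hypothetical helper (not part of the library): epochs at which the lr
// restarts, given T_0, T_mult, and the total number of training epochs.
function restartEpochs(T0: number, tMult: number, totalEpochs: number): number[] {
  const restarts: number[] = [];
  let period = T0; // length of the current cycle
  let epoch = T0;  // epoch at which the current cycle ends
  while (epoch <= totalEpochs) {
    restarts.push(epoch);
    period *= tMult; // next cycle is T_mult times longer
    epoch += period;
  }
  return restarts;
}
```

With the first example above (T_0 = 10, T_mult = 2) over 100 epochs, restarts land at epochs 10, 30, and 70; the run then ends mid-cycle unless the total epoch count is chosen to line up with a cycle boundary.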