torch.optim.lr_scheduler.CosineAnnealingWarmRestarts
class CosineAnnealingWarmRestarts extends LRScheduler

new CosineAnnealingWarmRestarts(optimizer: Optimizer, options: {
/** Number of iterations for the first restart */
T_0: number;
/** Factor increasing T_i after a restart (default: 1) */
T_mult?: number;
/** Minimum learning rate (default: 0) */
eta_min?: number;
/** The index of last epoch (default: -1) */
last_epoch?: number;
/** Whether to print a message for each update (default: false) */
verbose?: boolean;
})
Constructor Parameters
optimizer (Optimizer) – Wrapped optimizer
options (object) – Scheduler options; fields listed below
T_0 (number) – Number of iterations for the first restart
T_mult (number) – Factor to increase T_i after a restart
eta_min (number) – Minimum learning rate
T_cur (number) – Current position in the cycle
T_i (number) – Current cycle period
CosineAnnealingWarmRestarts scheduler: Cosine annealing with periodic warm restarts.
CosineAnnealingWarmRestarts implements SGDR (Stochastic Gradient Descent with Warm Restarts): within each cycle the learning rate follows a cosine annealing curve from its initial value down to eta_min, then is reset ("restarted") back to the initial value. The length of each cycle grows by a factor of T_mult. The periodic resets help the optimizer escape local minima and can improve generalization compared to monotonic decay.
Key benefits vs CosineAnnealingLR:
- Periodic restarts allow exploring different regions of the loss landscape
- Often improves generalization, especially once training is long enough to include several restarts
- Can reach lower final loss values
When to use:
- You want periodic lr resets to help escape local minima
- You have time for longer training and can benefit from restarts
- Research/competition scenarios where final accuracy matters most
Algorithm:
- Restart period T_i = T_0 * (T_mult ^ i) where i is restart number
- Each period uses cosine annealing from base_lr down to eta_min
- After period ends, restart: lr → base_lr, continue next period
- Periodic resets: Learning rate resets at each restart for exploration.
- T_mult multiplier: Controls how period grows (T_mult=2 → doubling, T_mult=1 → constant).
- Paper: "SGDR: Stochastic Gradient Descent with Warm Restarts" (Loshchilov & Hutter, 2016), the same paper behind CosineAnnealingLR.
- Longer training: Often needs more epochs to show benefit vs CosineAnnealingLR.
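The learning-rate math described in the Algorithm section above can be sketched as a standalone TypeScript function. This is an illustration of the schedule only, not the library's implementation; the helper name is hypothetical:

```typescript
// Standalone sketch of the SGDR learning-rate math (hypothetical helper,
// not part of the library). Returns the lr for a given epoch.
function warmRestartLr(
  epoch: number,
  baseLr: number,
  T0: number,
  tMult: number = 1,
  etaMin: number = 0
): number {
  let tI = T0;      // current cycle length T_i
  let tCur = epoch; // position within the current cycle T_cur
  if (tMult === 1) {
    tCur = epoch % T0; // constant period: simple modulo
  } else {
    while (tCur >= tI) { // subtract completed cycles to find the current one
      tCur -= tI;
      tI *= tMult;
    }
  }
  // Cosine annealing from baseLr down to etaMin within the cycle
  return etaMin + (baseLr - etaMin) * (1 + Math.cos(Math.PI * tCur / tI)) / 2;
}
```

For example, with T_0 = 10, T_mult = 2, and a base lr of 0.1, the lr is 0.1 at epoch 0, anneals toward 0, and jumps back to 0.1 at epoch 10; the second cycle then runs for 20 epochs.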
Examples
// Warm restarts every 10 epochs, doubling the period each time
const scheduler = new torch.optim.CosineAnnealingWarmRestarts(optimizer, {
  T_0: 10,   // Initial period: 10 epochs
  T_mult: 2  // Next period: 20 epochs, then 40, etc.
});

// Shorter, less aggressive restarts with a constant period
const scheduler2 = new torch.optim.CosineAnnealingWarmRestarts(optimizer, {
  T_0: 5,        // Restart every 5 epochs
  T_mult: 1,     // Constant period (5 epochs each cycle)
  eta_min: 1e-4  // Minimum learning rate
});
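When planning T_0 and T_mult against a fixed training budget, it helps to compute where the restarts will actually land. A small hypothetical helper (not a library API) for that calculation:

```typescript
// Hypothetical helper (not part of the library): epochs at which the lr
// restarts, given T_0, T_mult, and the total number of training epochs.
function restartEpochs(T0: number, tMult: number, totalEpochs: number): number[] {
  const restarts: number[] = [];
  let period = T0; // length of the current cycle
  let epoch = T0;  // epoch at which the current cycle ends
  while (epoch <= totalEpochs) {
    restarts.push(epoch);
    period *= tMult; // next cycle is T_mult times longer
    epoch += period;
  }
  return restarts;
}
```

With the first example above (T_0 = 10, T_mult = 2) over 100 epochs, restarts land at epochs 10, 30, and 70; the run then ends mid-cycle unless the total epoch count is chosen to line up with a cycle boundary.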