torch.optim.lr_scheduler.MultiStepLR
class MultiStepLR extends LRScheduler

new MultiStepLR(optimizer: Optimizer, options: {
/** List of epoch indices. Must be increasing. */
milestones: number[];
/** Multiplicative factor of learning rate decay (default: 0.1) */
gamma?: number;
/** The index of last epoch (default: -1) */
last_epoch?: number;
/** Whether to print a message for each update (default: false) */
verbose?: boolean;
})
Constructor Parameters
optimizer (Optimizer) - Wrapped optimizer
options (object) - Scheduler options:
milestones (number[]) - List of epoch indices to decay LR. Must be increasing.
gamma (number) - Multiplicative factor of learning rate decay (default: 0.1)
last_epoch (number) - The index of last epoch (default: -1)
verbose (boolean) - Whether to print a message for each update (default: false)
MultiStepLR scheduler: Decay learning rate by gamma at specified milestone epochs.
MultiStepLR is a more flexible version of StepLR: instead of decaying every N epochs, you specify exactly which epochs to decay at. This gives fine-grained control over the learning rate schedule, which is useful when you know in advance when the learning rate should drop.
Key differences from StepLR:
- StepLR: Regular intervals (every step_size epochs)
- MultiStepLR: Custom intervals (at specific milestone epochs)
- MultiStepLR provides more control at cost of manual specification
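The relationship between the two can be sketched directly: a StepLR schedule with step size N is just a MultiStepLR whose milestones are the multiples of N. The helper below is illustrative only, not part of the torch.optim API:

```typescript
// StepLR's regular intervals are a special case of MultiStepLR:
// step_size = N over totalEpochs epochs corresponds to
// milestones [N, 2N, 3N, ...] below totalEpochs.
// (Illustrative helper; not part of the torch.optim API.)
function stepLRMilestones(stepSize: number, totalEpochs: number): number[] {
  const milestones: number[] = [];
  for (let m = stepSize; m < totalEpochs; m += stepSize) {
    milestones.push(m);
  }
  return milestones;
}

console.log(stepLRMilestones(30, 100)); // [30, 60, 90]
```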
When to use MultiStepLR:
- When decay epochs are not evenly spaced
- Fine-grained control over learning rate decay schedule
- Multi-phase training with different decay schedules per phase
- When you know good decay points from prior experiments
- Classic vision models: often trained with fixed milestones like [30, 60, 90]
Trade-offs:
- More manual work to specify milestones vs StepLR (automatic intervals)
- Rigid schedule (like StepLR) - doesn't adapt to training progress
- Cumulative decay: decays stack (η is multiplied by gamma at each milestone)
Algorithm: Multiplies learning rate by gamma at each milestone epoch:
- If epoch is in milestones: η_t *= gamma
- Example: milestones=[30, 80], gamma=0.1
- Epochs 0-29: lr = η_0
- Epoch 30: lr = 0.1 * η_0 (first decay)
- Epochs 31-79: lr = 0.1 * η_0
- Epoch 80: lr = 0.01 * η_0 (second decay)
- Epochs 81+: lr = 0.01 * η_0
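The schedule above has a simple closed form: the learning rate at any epoch is the base rate times gamma raised to the number of milestones already passed. A minimal TypeScript sketch (the `multiStepLR` function is an illustrative helper, not part of the API):

```typescript
// Closed-form MultiStepLR: lr at a given epoch equals the base lr
// times gamma raised to the number of milestones at or before that epoch.
// (Illustrative helper; not part of the torch.optim API.)
function multiStepLR(
  baseLr: number,
  milestones: number[],
  gamma: number,
  epoch: number
): number {
  // Count milestones already reached by the current epoch.
  const decays = milestones.filter((m) => epoch >= m).length;
  return baseLr * Math.pow(gamma, decays);
}

// Reproduces the example above: milestones = [30, 80], gamma = 0.1, η_0 = 0.1
console.log(multiStepLR(0.1, [30, 80], 0.1, 29)); // 0.1
console.log(multiStepLR(0.1, [30, 80], 0.1, 30)); // ≈ 0.01  (0.1 * 0.1)
console.log(multiStepLR(0.1, [30, 80], 0.1, 80)); // ≈ 0.001 (0.1 * 0.1 * 0.1)
```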
- Explicit milestones: Clear control over exactly when decay happens.
- Order matters: Milestones must be in increasing order.
- Cumulative decay: Multiple milestones compound (e.g., [30, 60] → 0.1 × 0.1 = 0.01 at epoch 60).
- Common pattern: [30, 60, 90] or [30, 80] work well for many vision tasks.
- Comparison: MultiStepLR is StepLR with custom intervals instead of regular spacing.
- Rigid schedule: Doesn't adapt to actual training progress.
- Validation agnostic: Doesn't use validation metrics (see ReduceOnPlateau).
- Popular in CV: Standard choice for computer vision before cosine annealing became common.
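The stateful behavior behind these points can be sketched as a tiny class that mirrors the assumed `step()` semantics (an epoch counter that starts at -1 and a multiplicative decay whenever the counter lands on a milestone). This is a sketch of the presumed mechanics, not the real class, which wraps an Optimizer:

```typescript
// Minimal stateful sketch of MultiStepLR's assumed step() semantics.
// (Illustrative only; the real scheduler updates the wrapped optimizer's
// param groups rather than returning the lr.)
class MultiStepLRSketch {
  private epoch: number = -1;
  private lr: number;
  private milestones: Set<number>;
  private gamma: number;

  constructor(baseLr: number, milestones: Set<number>, gamma: number) {
    this.lr = baseLr;
    this.milestones = milestones;
    this.gamma = gamma;
  }

  // Advance one epoch; decay the lr if this epoch is a milestone.
  step(): number {
    this.epoch += 1;
    if (this.milestones.has(this.epoch)) {
      this.lr *= this.gamma;
    }
    return this.lr;
  }
}

// Milestones at epochs 2 and 4 with gamma = 0.5:
const sketch = new MultiStepLRSketch(1.0, new Set([2, 4]), 0.5);
for (let epoch = 0; epoch < 5; epoch++) {
  console.log(epoch, sketch.step()); // lr: 1, 1, 0.5, 0.5, 0.25
}
```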
Examples
// Decay at specific epochs: 30, 80
const scheduler = new torch.optim.MultiStepLR(optimizer, {
milestones: [30, 80],
gamma: 0.1
});
// Decay schedule: lr becomes 0.1 * η_0 at epoch 30, 0.01 * η_0 at epoch 80

// Different decay pattern: decay more frequently early, less later
const scheduler = new torch.optim.MultiStepLR(optimizer, {
milestones: [10, 20, 40, 80],
gamma: 0.5 // Less aggressive decay
});
// lr relative to initial: 0.5x, 0.25x, 0.125x, 0.0625x after each milestone

// Resume from checkpoint with correct milestones
const checkpoint = load_checkpoint('model.pth');
const scheduler = new torch.optim.MultiStepLR(optimizer, {
milestones: [30, 60, 90],
gamma: 0.1,
last_epoch: checkpoint.epoch - 1
});

// Multi-phase training with different milestones
// Phase 1: Quick training with frequent decay
let scheduler = new torch.optim.MultiStepLR(optimizer, {
milestones: [5, 10, 15],
gamma: 0.5,
last_epoch: -1
});
for (let epoch = 0; epoch < 20; epoch++) { train(); scheduler.step(); }
// Phase 2: Fine-tuning with sparser decay
scheduler = new torch.optim.MultiStepLR(optimizer, {
milestones: [35, 55], // Absolute epoch indices: decays at epochs 35 and 55 of the continued counter
gamma: 0.5,
last_epoch: 19 // Continue from where we left off
});
for (let epoch = 20; epoch < 60; epoch++) { train(); scheduler.step(); }