torch.optim.lr_scheduler.MultiStepLR
class MultiStepLR extends LRScheduler

new MultiStepLR(optimizer: Optimizer, options: {
/** List of epoch indices. Must be increasing. */
milestones: number[];
/** Multiplicative factor of learning rate decay (default: 0.1) */
gamma?: number;
/** The index of last epoch (default: -1) */
last_epoch?: number;
/** Whether to print a message for each update (default: false) */
verbose?: boolean;
})
Constructor Parameters
optimizer (Optimizer) - Wrapped optimizer
options (object) - Scheduler options:
milestones (number[]) - List of epoch indices to decay LR. Must be increasing.
gamma (number) - Multiplicative factor of learning rate decay (default: 0.1)
last_epoch (number) - The index of last epoch (default: -1)
verbose (boolean) - Whether to print a message for each update (default: false)
MultiStepLR scheduler: Decay learning rate by gamma at specified milestone epochs.
MultiStepLR is a more flexible version of StepLR: instead of decaying every N epochs, you specify exactly which epochs to decay at. This gives fine-grained control over the learning rate schedule, which is useful when you know in advance when the learning rate should drop.
Key differences from StepLR:
- StepLR: Regular intervals (every step_size epochs)
- MultiStepLR: Custom intervals (at specific milestone epochs)
- MultiStepLR provides more control at cost of manual specification
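The relationship between the two can be sketched directly: a StepLR schedule with step size N is just a MultiStepLR whose milestones are the multiples of N. The helper below is illustrative only, not part of the torch.optim API:

```typescript
// StepLR's regular intervals are a special case of MultiStepLR:
// step_size = N over totalEpochs epochs corresponds to
// milestones [N, 2N, 3N, ...] below totalEpochs.
// (Illustrative helper; not part of the torch.optim API.)
function stepLRMilestones(stepSize: number, totalEpochs: number): number[] {
  const milestones: number[] = [];
  for (let m = stepSize; m < totalEpochs; m += stepSize) {
    milestones.push(m);
  }
  return milestones;
}

console.log(stepLRMilestones(30, 100)); // [30, 60, 90]
```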
When to use MultiStepLR:
- When decay epochs are not evenly spaced
- Fine-grained control over learning rate decay schedule
- Multi-phase training with different decay schedules per phase
- When you know good decay points from prior experiments
- Classic vision models: often trained with fixed milestones like [30, 60, 90]
Trade-offs:
- More manual work to specify milestones vs StepLR (automatic intervals)
- Rigid schedule (like StepLR) - doesn't adapt to training progress
- Cumulative decay: decays stack (η is multiplied by gamma at each milestone)
Algorithm: Multiplies learning rate by gamma at each milestone epoch:
- If epoch is in milestones: η_t *= gamma
- Example: milestones=[30, 80], gamma=0.1
- Epochs 0-29: lr = η_0
- Epoch 30: lr = 0.1 * η_0 (first decay)
- Epochs 31-79: lr = 0.1 * η_0
- Epoch 80: lr = 0.01 * η_0 (second decay)
- Epochs 81+: lr = 0.01 * η_0
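The schedule above has a simple closed form: the learning rate at any epoch is the base rate times gamma raised to the number of milestones already passed. A minimal TypeScript sketch (the `multiStepLR` function is an illustrative helper, not part of the API):

```typescript
// Closed-form MultiStepLR: lr at a given epoch equals the base lr
// times gamma raised to the number of milestones at or before that epoch.
// (Illustrative helper; not part of the torch.optim API.)
function multiStepLR(
  baseLr: number,
  milestones: number[],
  gamma: number,
  epoch: number
): number {
  // Count milestones already reached by the current epoch.
  const decays = milestones.filter((m) => epoch >= m).length;
  return baseLr * Math.pow(gamma, decays);
}

// Reproduces the example above: milestones = [30, 80], gamma = 0.1, η_0 = 0.1
console.log(multiStepLR(0.1, [30, 80], 0.1, 29)); // 0.1
console.log(multiStepLR(0.1, [30, 80], 0.1, 30)); // ≈ 0.01  (0.1 * 0.1)
console.log(multiStepLR(0.1, [30, 80], 0.1, 80)); // ≈ 0.001 (0.1 * 0.1 * 0.1)
```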
- Explicit milestones: Clear control over exactly when decay happens.
- Order matters: Milestones must be in increasing order.
- Cumulative decay: Multiple milestones compound (e.g., [30, 60] → 0.1 × 0.1 = 0.01 at epoch 60).
- Common pattern: [30, 60, 90] or [30, 80] work well for many vision tasks.
- Comparison: MultiStepLR is StepLR with custom intervals instead of regular spacing.
- Rigid schedule: Doesn't adapt to actual training progress.
- Validation agnostic: Doesn't use validation metrics (see ReduceOnPlateau).
- Popular in CV: Standard choice for computer vision before cosine annealing became common.
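The stateful behavior behind these points can be sketched as a tiny class that mirrors the assumed `step()` semantics (an epoch counter that starts at -1 and a multiplicative decay whenever the counter lands on a milestone). This is a sketch of the presumed mechanics, not the real class, which wraps an Optimizer:

```typescript
// Minimal stateful sketch of MultiStepLR's assumed step() semantics.
// (Illustrative only; the real scheduler updates the wrapped optimizer's
// param groups rather than returning the lr.)
class MultiStepLRSketch {
  private epoch: number = -1;
  private lr: number;
  private milestones: Set<number>;
  private gamma: number;

  constructor(baseLr: number, milestones: Set<number>, gamma: number) {
    this.lr = baseLr;
    this.milestones = milestones;
    this.gamma = gamma;
  }

  // Advance one epoch; decay the lr if this epoch is a milestone.
  step(): number {
    this.epoch += 1;
    if (this.milestones.has(this.epoch)) {
      this.lr *= this.gamma;
    }
    return this.lr;
  }
}

// Milestones at epochs 2 and 4 with gamma = 0.5:
const sketch = new MultiStepLRSketch(1.0, new Set([2, 4]), 0.5);
for (let epoch = 0; epoch < 5; epoch++) {
  console.log(epoch, sketch.step()); // lr: 1, 1, 0.5, 0.5, 0.25
}
```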
Examples
// Decay at specific epochs: 30, 80
const scheduler = new torch.optim.MultiStepLR(optimizer, {
milestones: [30, 80],
gamma: 0.1
});
// Decay schedule: lr becomes 0.1 * η_0 at epoch 30, 0.01 * η_0 at epoch 80

// Different decay pattern: decay more frequently early, less later
const scheduler = new torch.optim.MultiStepLR(optimizer, {
milestones: [10, 20, 40, 80],
gamma: 0.5 // Less aggressive decay
});
// lr relative to initial: 0.5x, 0.25x, 0.125x, 0.0625x after each milestone

// Resume from checkpoint with correct milestones
const checkpoint = load_checkpoint('model.pth');
const scheduler = new torch.optim.MultiStepLR(optimizer, {
milestones: [30, 60, 90],
gamma: 0.1,
last_epoch: checkpoint.epoch - 1
});

// Multi-phase training with different milestones
// Phase 1: Quick training with frequent decay
let scheduler = new torch.optim.MultiStepLR(optimizer, {
milestones: [5, 10, 15],
gamma: 0.5,
last_epoch: -1
});
for (let epoch = 0; epoch < 20; epoch++) { train(); scheduler.step(); }
// Phase 2: Fine-tuning with sparser decay
scheduler = new torch.optim.MultiStepLR(optimizer, {
milestones: [35, 55], // Absolute epoch indices: decays at epochs 35 and 55 of the continued counter
gamma: 0.5,
last_epoch: 19 // Continue from where we left off
});
for (let epoch = 20; epoch < 60; epoch++) { train(); scheduler.step(); }