torch.optim.lr_scheduler.LinearLR
class LinearLR extends LRScheduler

new LinearLR(optimizer: Optimizer, options: {
  /** The number to multiply LR at the start (default: 1/3) */
  start_factor?: number;
  /** The number to multiply LR at the end (default: 1.0) */
  end_factor?: number;
  /** Number of iterations for linear change (default: 5) */
  total_iters?: number;
  /** The index of last epoch (default: -1) */
  last_epoch?: number;
  /** Whether to print a message for each update (default: false) */
  verbose?: boolean;
} = {})
Constructor Parameters
optimizer (Optimizer) - Wrapped optimizer
options (object, optional) - Scheduler options:
  start_factor (number) - Starting multiplicative factor (default: 1/3)
  end_factor (number) - Ending multiplicative factor (default: 1.0)
  total_iters (number) - Number of iterations for linear change (default: 5)
  last_epoch (number) - The index of last epoch (default: -1)
  verbose (boolean) - Whether to print a message for each update (default: false)
LinearLR scheduler: Linear interpolation of learning rate multiplier.
LinearLR linearly interpolates the learning-rate multiplier from start_factor to end_factor over total_iters iterations. Its most common use is as a warmup phase at the beginning of training, linearly ramping the learning rate from a small value (e.g., 1/3 of base_lr) up to its full value.
Primary use cases:
- Warmup phase: Ramp up from small lr to full lr over first N epochs
- Linear decay: Linearly decay from full lr to zero
- Chaining: Often combined with CosineAnnealingLR (warmup then cosine)
Why use warmup?
- Prevents optimization instability at the start of training
- Allows model to adjust to initial random state gracefully
- Often improves final convergence and generalization
- Especially important for transformers (standard practice)
When to use LinearLR:
- First phase of training (before main schedule)
- Transformer models (standard: warmup for ~10% of training)
- When main schedule is CosineAnnealingLR or StepLR
- Learning rate scheduling for supervised learning
Trade-offs:
- Simple linear interpolation (no curve fitting)
- Typically used as one phase in composite schedule
- Alone (without chaining) linear decay is less common than step/cosine
- Works best when total_iters is small relative to total training epochs
Algorithm: Linearly interpolates multiplier from start_factor to end_factor:
- factor_t = start_factor + (end_factor - start_factor) * min(t, total_iters) / total_iters
- η_t = base_lr * factor_t
- After total_iters, the factor is clamped at end_factor, so the learning rate stays at end_factor * base_lr
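As a quick numerical check of the interpolation above, the factor and resulting learning rate can be computed directly. This is a standalone sketch; `linearFactor` is an illustrative helper reproducing the formula, not part of the library API:

```typescript
// Illustrative helper reproducing the LinearLR factor formula (not a library API).
function linearFactor(
  t: number,
  startFactor: number = 1 / 3,
  endFactor: number = 1.0,
  totalIters: number = 5
): number {
  // The interpolation is clamped: after total_iters the factor stays at end_factor.
  const progress = Math.min(t, totalIters) / totalIters;
  return startFactor + (endFactor - startFactor) * progress;
}

const baseLr = 0.06;
for (const t of [0, 1, 5, 10]) {
  console.log(`epoch ${t}: lr = ${(baseLr * linearFactor(t)).toFixed(4)}`);
}
// epoch 0: factor = 1/3, lr = 0.02; epoch 5 and beyond: factor = 1.0, lr = 0.06
```

Note that the `min(t, total_iters)` clamp is what keeps the learning rate flat at end_factor * base_lr once the linear phase ends.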
- Warmup standard: LinearLR for warmup is standard in modern transformer training.
- Warmup benefits: Stabilizes early training and can improve final accuracy by a point or two.
- Typical warmup: 10% of total training epochs works well empirically.
- Chaining: Usually chained with another scheduler (CosineAnnealingLR) for full schedule.
- Alone uncommon: Pure linear decay is less common than step or cosine decay.
- Warmup amount: start_factor=0.1 for aggressive warmup, 1/3 for moderate warmup.
- Parameter groups: Works with different learning rates per parameter group.
- Composable: Designed to be first phase in SequentialLR or ChainedScheduler.
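Because the scheduler multiplies each parameter group's own base learning rate by the same shared factor, groups keep their relative ratios throughout warmup. A standalone sketch of that behavior (the `ParamGroup` shape and `scaledLrs` helper are illustrative, not the library's types):

```typescript
// Each group's lr is its own base_lr times the shared factor (illustrative sketch).
interface ParamGroup {
  name: string;
  baseLr: number;
}

function scaledLrs(groups: ParamGroup[], factor: number): number[] {
  return groups.map((g) => g.baseLr * factor);
}

const groups: ParamGroup[] = [
  { name: "backbone", baseLr: 0.01 },
  { name: "head", baseLr: 0.1 },
];

// Halfway through a warmup from 0.1 to 1.0, the shared factor is 0.55.
console.log(scaledLrs(groups, 0.55)); // head stays 10x the backbone lr
```

This is why LinearLR composes cleanly with per-group learning rates: warmup rescales all groups uniformly rather than collapsing them to a single value.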
Examples
// Warmup: linear increase from 1/3 to 1.0 over 5 epochs
const scheduler = new torch.optim.LinearLR(optimizer, {
start_factor: 1/3,
end_factor: 1.0,
total_iters: 5
});
for (let epoch = 0; epoch < 100; epoch++) {
train();
validate();
scheduler.step();
}
// After epoch 5, learning rate stays at base_lr

// Standard warmup for transformers: 10% of total epochs
const total_epochs = 100;
const warmup_epochs = Math.floor(total_epochs * 0.1);
const warmup = new torch.optim.LinearLR(optimizer, {
start_factor: 0.1, // Start at 10% of base_lr
end_factor: 1.0, // Reach full base_lr
total_iters: warmup_epochs // e.g., 10 epochs
});

// Warmup + Cosine annealing (common for transformers)
const warmup = new torch.optim.LinearLR(optimizer, {
start_factor: 0.1,
total_iters: 10
});
const cosine = new torch.optim.CosineAnnealingLR(optimizer, {
T_max: 90 // Remaining 90 epochs
});
const scheduler = new torch.optim.SequentialLR(
optimizer,
[warmup, cosine],
[10] // Switch to cosine after 10 epochs
);

// Linear decay (opposite of warmup)
const scheduler = new torch.optim.LinearLR(optimizer, {
start_factor: 1.0, // Start at full lr
end_factor: 0.0, // Decay to zero
total_iters: 50 // Over 50 epochs
});
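To see what the warmup-plus-cosine chaining produces epoch by epoch, the combined schedule can be traced with plain functions. This standalone sketch reimplements both phases directly; `warmupCosineLr` is illustrative, not a library API:

```typescript
// Piecewise schedule: linear warmup for the first `warmupIters` epochs,
// then cosine annealing over the remaining `tMax` epochs (illustrative sketch).
function warmupCosineLr(
  epoch: number,
  baseLr: number,
  startFactor = 0.1,
  warmupIters = 10,
  tMax = 90,
  etaMin = 0
): number {
  if (epoch < warmupIters) {
    // Linear ramp from startFactor * baseLr up to baseLr.
    const factor = startFactor + (1 - startFactor) * (epoch / warmupIters);
    return baseLr * factor;
  }
  // Cosine decay from baseLr down toward etaMin.
  const t = epoch - warmupIters; // progress within the cosine phase
  return etaMin + ((baseLr - etaMin) * (1 + Math.cos((Math.PI * t) / tMax))) / 2;
}

const baseLr = 0.1;
console.log(warmupCosineLr(0, baseLr));   // ≈ 0.01 (10% of base_lr)
console.log(warmupCosineLr(10, baseLr));  // 0.1 (full base_lr as cosine phase starts)
console.log(warmupCosineLr(100, baseLr)); // ≈ 0 (end of cosine phase)
```

The two phases meet exactly at epoch 10 (the warmup ends at factor 1.0 and the cosine phase starts at full base_lr), which is the continuity the SequentialLR milestone `[10]` is arranging.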