torch.optim.lr_scheduler.LinearLR
class LinearLR extends LRScheduler

new LinearLR(optimizer: Optimizer, options: {
  /** The number to multiply LR at the start (default: 1/3) */
  start_factor?: number;
  /** The number to multiply LR at the end (default: 1.0) */
  end_factor?: number;
  /** Number of iterations for linear change (default: 5) */
  total_iters?: number;
  /** The index of last epoch (default: -1) */
  last_epoch?: number;
  /** Whether to print a message for each update (default: false) */
  verbose?: boolean;
} = {})
Constructor Parameters
optimizer (Optimizer) - Wrapped optimizer
options (object, optional) - Scheduler options:
  start_factor (number) - Starting multiplicative factor (default: 1/3)
  end_factor (number) - Ending multiplicative factor (default: 1.0)
  total_iters (number) - Number of iterations for linear change (default: 5)
  last_epoch (number) - The index of last epoch (default: -1)
  verbose (boolean) - Whether to print a message for each update (default: false)
LinearLR scheduler: Linear interpolation of learning rate multiplier.
LinearLR linearly interpolates the learning-rate multiplier from start_factor to end_factor over total_iters iterations. Its most common use is as a warmup phase at the beginning of training, linearly ramping the learning rate from a small value (e.g., 1/3 of base_lr) up to its full value.
Primary use cases:
- Warmup phase: Ramp up from small lr to full lr over first N epochs
- Linear decay: Linearly decay from full lr to zero
- Chaining: Often combined with CosineAnnealingLR (warmup then cosine)
Why use warmup?
- Prevents optimization instability at the start of training
- Allows model to adjust to initial random state gracefully
- Often improves final convergence and generalization
- Especially important for transformers (standard practice)
When to use LinearLR:
- First phase of training (before main schedule)
- Transformer models (standard: warmup for ~10% of training)
- When main schedule is CosineAnnealingLR or StepLR
- Learning rate scheduling for supervised learning
Trade-offs:
- Simple linear interpolation (no curve fitting)
- Typically used as one phase in composite schedule
- Alone (without chaining) linear decay is less common than step/cosine
- Works best when total_iters is small relative to total training epochs
Algorithm: Linearly interpolates multiplier from start_factor to end_factor:
- factor_t = start_factor + (end_factor - start_factor) * min(t, total_iters) / total_iters
- η_t = base_lr * factor_t
- After total_iters, the factor is clamped at end_factor, so the learning rate stays at end_factor * base_lr
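As a quick numerical check of the interpolation above, the factor and resulting learning rate can be computed directly. This is a standalone sketch; `linearFactor` is an illustrative helper reproducing the formula, not part of the library API:

```typescript
// Illustrative helper reproducing the LinearLR factor formula (not a library API).
function linearFactor(
  t: number,
  startFactor: number = 1 / 3,
  endFactor: number = 1.0,
  totalIters: number = 5
): number {
  // The interpolation is clamped: after total_iters the factor stays at end_factor.
  const progress = Math.min(t, totalIters) / totalIters;
  return startFactor + (endFactor - startFactor) * progress;
}

const baseLr = 0.06;
for (const t of [0, 1, 5, 10]) {
  console.log(`epoch ${t}: lr = ${(baseLr * linearFactor(t)).toFixed(4)}`);
}
// epoch 0: factor = 1/3, lr = 0.02; epoch 5 and beyond: factor = 1.0, lr = 0.06
```

Note that the `min(t, total_iters)` clamp is what keeps the learning rate flat at end_factor * base_lr once the linear phase ends.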
- Warmup standard: LinearLR for warmup is standard in modern transformer training.
- Warmup benefits: Stabilizes early training and can improve final accuracy by a point or two.
- Typical warmup: 10% of total training epochs works well empirically.
- Chaining: Usually chained with another scheduler (CosineAnnealingLR) for full schedule.
- Alone uncommon: Pure linear decay is less common than step or cosine decay.
- Warmup amount: start_factor=0.1 for aggressive warmup, 1/3 for moderate warmup.
- Parameter groups: Works with different learning rates per parameter group.
- Composable: Designed to be first phase in SequentialLR or ChainedScheduler.
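Because the scheduler multiplies each parameter group's own base learning rate by the same shared factor, groups keep their relative ratios throughout warmup. A standalone sketch of that behavior (the `ParamGroup` shape and `scaledLrs` helper are illustrative, not the library's types):

```typescript
// Each group's lr is its own base_lr times the shared factor (illustrative sketch).
interface ParamGroup {
  name: string;
  baseLr: number;
}

function scaledLrs(groups: ParamGroup[], factor: number): number[] {
  return groups.map((g) => g.baseLr * factor);
}

const groups: ParamGroup[] = [
  { name: "backbone", baseLr: 0.01 },
  { name: "head", baseLr: 0.1 },
];

// Halfway through a warmup from 0.1 to 1.0, the shared factor is 0.55.
console.log(scaledLrs(groups, 0.55)); // head stays 10x the backbone lr
```

This is why LinearLR composes cleanly with per-group learning rates: warmup rescales all groups uniformly rather than collapsing them to a single value.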
Examples
// Warmup: linear increase from 1/3 to 1.0 over 5 epochs
const scheduler = new torch.optim.LinearLR(optimizer, {
start_factor: 1/3,
end_factor: 1.0,
total_iters: 5
});
for (let epoch = 0; epoch < 100; epoch++) {
train();
validate();
scheduler.step();
}
// After epoch 5, learning rate stays at base_lr

// Standard warmup for transformers: 10% of total epochs
const total_epochs = 100;
const warmup_epochs = Math.floor(total_epochs * 0.1);
const warmup = new torch.optim.LinearLR(optimizer, {
start_factor: 0.1, // Start at 10% of base_lr
end_factor: 1.0, // Reach full base_lr
total_iters: warmup_epochs // e.g., 10 epochs
});

// Warmup + Cosine annealing (common for transformers)
const warmup = new torch.optim.LinearLR(optimizer, {
start_factor: 0.1,
total_iters: 10
});
const cosine = new torch.optim.CosineAnnealingLR(optimizer, {
T_max: 90 // Remaining 90 epochs
});
const scheduler = new torch.optim.SequentialLR(
optimizer,
[warmup, cosine],
[10] // Switch to cosine after 10 epochs
);

// Linear decay (opposite of warmup)
const scheduler = new torch.optim.LinearLR(optimizer, {
start_factor: 1.0, // Start at full lr
end_factor: 0.0, // Decay to zero
total_iters: 50 // Over 50 epochs
});
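To see what the warmup-plus-cosine chaining produces epoch by epoch, the combined schedule can be traced with plain functions. This standalone sketch reimplements both phases directly; `warmupCosineLr` is illustrative, not a library API:

```typescript
// Piecewise schedule: linear warmup for the first `warmupIters` epochs,
// then cosine annealing over the remaining `tMax` epochs (illustrative sketch).
function warmupCosineLr(
  epoch: number,
  baseLr: number,
  startFactor = 0.1,
  warmupIters = 10,
  tMax = 90,
  etaMin = 0
): number {
  if (epoch < warmupIters) {
    // Linear ramp from startFactor * baseLr up to baseLr.
    const factor = startFactor + (1 - startFactor) * (epoch / warmupIters);
    return baseLr * factor;
  }
  // Cosine decay from baseLr down toward etaMin.
  const t = epoch - warmupIters; // progress within the cosine phase
  return etaMin + ((baseLr - etaMin) * (1 + Math.cos((Math.PI * t) / tMax))) / 2;
}

const baseLr = 0.1;
console.log(warmupCosineLr(0, baseLr));   // ≈ 0.01 (10% of base_lr)
console.log(warmupCosineLr(10, baseLr));  // 0.1 (full base_lr as cosine phase starts)
console.log(warmupCosineLr(100, baseLr)); // ≈ 0 (end of cosine phase)
```

The two phases meet exactly at epoch 10 (the warmup ends at factor 1.0 and the cosine phase starts at full base_lr), which is the continuity the SequentialLR milestone `[10]` is arranging.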