torch.optim.lr_scheduler.PolynomialLR
class PolynomialLR extends LRScheduler

new PolynomialLR(optimizer: Optimizer, options: {
  /** The number of steps for polynomial decay (default: 5) */
  total_iters?: number;
  /** The power of the polynomial (default: 1.0, linear) */
  power?: number;
  /** The index of last epoch (default: -1) */
  last_epoch?: number;
  /** Whether to print a message for each update (default: false) */
  verbose?: boolean;
} = {})
Constructor Parameters
optimizer (Optimizer) – Wrapped optimizer
options (object, optional) – Scheduler options
  total_iters (number) – Total number of training iterations
  power (number) – The power of the polynomial
PolynomialLR scheduler: polynomial decay schedule over a fixed period.
PolynomialLR decays the learning rate using a polynomial function. The degree of the polynomial (power parameter) controls the shape of the decay, giving smooth, flexible schedules ranging from linear (power=1) to sharply front-loaded (power=2 and above).
Comparison to other schedules:
- LinearLR: power=1 (linear)
- PolynomialLR with power=2: quadratic curve (steep initial drop that flattens toward the end)
- CosineAnnealingLR: cosine curve (smooth but specific shape)
- StepLR: step-wise (discrete jumps)
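For intuition, a standalone sketch of the four curve shapes (plain TypeScript with no library dependency; the cosine and step formulas are the standard textbook forms, shown here only for comparison):

// Decay factor (multiplier on the base LR) at iteration t out of T.
const T = 100;
const linear = (t: number) => 1 - t / T;                             // LinearLR-style
const poly = (t: number, p: number) => (1 - t / T) ** p;             // PolynomialLR
const cosine = (t: number) => 0.5 * (1 + Math.cos(Math.PI * t / T)); // CosineAnnealingLR shape (eta_min = 0)
const step = (t: number) => 0.1 ** Math.floor(t / 30);               // StepLR shape (step_size=30, gamma=0.1)

for (const t of [0, 25, 50, 75, 100]) {
  console.log(
    `t=${t}: linear=${linear(t).toFixed(3)} poly2=${poly(t, 2).toFixed(3)} ` +
    `cosine=${cosine(t).toFixed(3)} step=${step(t).toFixed(3)}`
  );
}
// poly2 sits below linear everywhere: power>1 front-loads the decay,
// while cosine stays high early and drops fastest mid-schedule.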
When to use PolynomialLR:
- When you want smooth decay over a fixed period
- Need flexible control over decay curve shape (via power parameter)
- Fine-tuning where gradual reduction works better than abrupt drops
- As an alternative to CosineAnnealingLR (both are smooth, with different curve shapes)
Trade-offs:
- Smooth decay (like cosine) vs step-wise (like StepLR)
- Requires knowing total_iters in advance
- Power parameter tuning affects convergence
- Less common than CosineAnnealingLR in modern practice
Algorithm: Decays the learning rate using a polynomial formula (see the sketch after the notes below):
- η_t = η_min + (η_base - η_min) * ((1 - t / T)^p)
- where t is current iteration, T is total_iters, p is power
- power=1: linear decay
- power=2: quadratic decay (steeper early, flatter near the end)
- power=3: cubic decay (even steeper early decay, longer flat tail)
- Power controls shape: power=1 is linear; power>1 front-loads the decay toward the start.
- Fixed duration: Requires specifying total_iters (like CosineAnnealingLR).
- Smooth decay: Continuous smooth function (unlike StepLR).
- Less common: CosineAnnealingLR is more popular than PolynomialLR in practice.
- Comparison: PolynomialLR is more flexible than CosineAnnealingLR (custom power).
- Empirical: power=1 or 2 usually sufficient, rarely need higher powers.
- Minimum lr: Defaults to 0, consider setting eta_min for fine-tuning.
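A minimal sketch of the formula above in plain TypeScript (no library dependency; the eta_min term follows the formula as written, with its default of 0):

/** eta_t = eta_min + (eta_base - eta_min) * (1 - t/T)^p */
function polynomialLR(etaBase: number, t: number, T: number, p: number, etaMin = 0): number {
  const clamped = Math.min(t, T); // hold at eta_min once t >= total_iters
  return etaMin + (etaBase - etaMin) * (1 - clamped / T) ** p;
}

// Base LR 0.1, T=100: at the halfway point, power=1 gives half the base LR,
// power=2 gives a quarter (quadratic decay is already further along).
console.log(polynomialLR(0.1, 50, 100, 1)); // 0.05
console.log(polynomialLR(0.1, 50, 100, 2)); // 0.025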
Examples
// Linear decay (power=1) over 100 epochs
const scheduler = new torch.optim.PolynomialLR(optimizer, {
  total_iters: 100,
  power: 1.0
});

// Quadratic decay (power=2): steep early decay, gentle finish
const scheduler = new torch.optim.PolynomialLR(optimizer, {
  total_iters: 100,
  power: 2.0
});
// Decays steeply in early epochs, flattening toward the end

// Comparison: different power values
const linear = new torch.optim.PolynomialLR(optimizer, { total_iters: 100, power: 1.0 });
const quadratic = new torch.optim.PolynomialLR(optimizer, { total_iters: 100, power: 2.0 });
const cubic = new torch.optim.PolynomialLR(optimizer, { total_iters: 100, power: 3.0 });
// Higher power: steeper early decay, flatter tail
// Lower power: more uniform decay
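A typical training-loop wiring, shown as a sketch: the SGD constructor and the per-epoch scheduler.step() call mirror PyTorch conventions and are assumptions about this binding, and model/trainOneEpoch are hypothetical placeholders:

// Hypothetical wiring: model and trainOneEpoch are placeholders.
const optimizer = new torch.optim.SGD(model.parameters(), { lr: 0.1 });
const scheduler = new torch.optim.PolynomialLR(optimizer, { total_iters: 100, power: 2.0 });

for (let epoch = 0; epoch < 100; epoch++) {
  trainOneEpoch(model, optimizer); // forward/backward/optimizer.step() per batch
  scheduler.step();                // advance the polynomial decay once per epoch
}
// After total_iters epochs the LR stays at the minimum (0 by default).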