Skip to main content
torch.js has not been released yet.
torch.js logotorch.js logotorch.js
PlaygroundContact
Login
Documentation
IntroductionType SafetyTensor ExpressionsTensor IndexingEinsumEinopsAutogradTraining a ModelProfiling & MemoryPyTorch MigrationBest PracticesRuntimesPerformancePyTorch CompatibilityBenchmarksDType Coverage
AnnealStrategyChainedSchedulerConstantLRCosineAnnealingLRCosineAnnealingWarmRestartsCyclicLRExponentialLRget_last_lrget_last_lrget_last_lrget_last_lrget_last_lrget_last_lrget_lrget_lrget_lrLambdaLRLinearLRload_state_dictload_state_dictload_state_dictload_state_dictload_state_dictload_state_dictLRLambdaLRSchedulerLRSchedulerOptionsMultiplicativeLRMultiStepLROneCycleLRPlateauModePolynomialLRprint_lrPrintLrOptionsReduceLROnPlateauScaleFnScaleModeSchedulerStateDictSequentialLRstate_dictstate_dictstate_dictstate_dictstate_dictstate_dictstepstepstepstepstepstepstepStepLRStepOptions
AdadeltaAdadeltaOptionsAdafactorAdafactorOptionsAdagradAdagradOptionsAdamAdamaxAdamaxOptionsAdamOptionsAdamWAdamWOptionsadd_param_groupASGDASGDOptionsget_averaged_paramLBFGSLBFGSOptionsLBFGSStepOptionsload_state_dictLoadStateDictPostHookLoadStateDictPreHookMuonMuonOptionsNAdamNAdamOptionsOptimizerOptimizerStateOptimizerStateDictParamGroupRAdamRAdamOptionsregister_load_state_dict_post_hookregister_load_state_dict_pre_hookregister_state_dict_post_hookregister_state_dict_pre_hookregister_step_post_hookregister_step_pre_hookRemovableHandleRMSpropRMSpropOptionsRpropRpropOptionsSGDSGDOptionsSparseAdamSparseAdamOptionsstate_dictstate_dict_asyncStateDictPostHookStateDictPreHookstepstepStepPostHookStepPreHookstepWithClosurezero_grad
absacosacoshAdaptivePool1dShapeAdaptivePool2dShapeaddaddbmmAddbmmOptionsaddcdivAddcdivOptionsaddcmulAddcmulOptionsaddmmAddmmOptionsaddmvAddmvOptionsaddrAddrOptionsadjointallallcloseAllcloseOptionsAlphaBetaOptionsamaxaminaminmaxAminmaxOptionsangleanyapplyOutarangeare_deterministic_algorithms_enabledargmaxargminargsortargwhereas_stridedas_tensorasinasinhAssertNoShapeErrorAssertNotErrorAsStridedOptionsAtat_error_index_out_of_boundsatanatan2atanhatleast_1datleast_2datleast_3dAtShapeautocast_decrement_nestingautocast_increment_nestingautograd_gradient_mismatch_errorautograd_not_registered_errorAutogradConfigAutogradDeviceAutogradDTypeAutogradEntryAutogradHandleAutogradHandleImplAxesRecordBackwardFnbaddbmmBaddbmmOptionsbartlett_windowBaseKernelConfigbatch_dimensions_do_not_match_errorbernoulliBernoulliOptionsBinaryBackwardFnBinaryBroadcastResultBinaryDTypeBinaryKernelConfigCPUBinaryKernelCPUBinaryOpConfigBinaryOpNamesBinaryOpSchemaBinaryOptionsbincountBincountOptionsbitwise_andbitwise_left_shiftbitwise_notbitwise_orbitwise_right_shiftbitwise_xorblackman_windowblock_diagbmmBooleanDTypeRulebroadcast_error_incompatible_dimensionsbroadcast_shapesbroadcast_tensorsbroadcast_toBroadcastShapeBroadcastShapeRulebroadcastShapesbucketizeBucketizeOptionsBufferUsagebuildEinopsErrorbuildErrorMessagecanBroadcastTocartesian_prodcatCatOptionsCatShapeCauchyOptionscdistCdistOptionsceilceluCeluFunctionalOptionschain_matmulCheckShapeErrorCholeskyShapechunkchunk_error_dim_out_of_rangeChunkOptionsclampClampOptionsclear_autocast_cacheclearEinopsCacheclearEinsumCacheclonecolumn_stackcombinationsCombinationsOptionscompiled_with_cxx11_abicomplexconjconj_physicalcontiguousConv1dShapeConv2dShapeConv3dShapeConvTranspose2dShapecopysigncorrcoefcoscoshcount_nonzeroCountNonzeroOptionscovcoverage_reportcoverageReportCoverageReportCovOptionsCPUForwardFnCPUKernelConfigCPUKernelEntryCPUOnlyResultCPUTensorDatacreateCumExtremeResultcreateTorchCreationOpSchemaCumExtremeResultcummaxcummincumprodCumShapecumsumcumulative_trapezoidCumulativeOptionsCumulativeOptionsWithDimdeg2raddetachDeterministicOptionsDetShapeDevicedevice_error_requiresDeviceBufferDeviceCapabilitiesDeviceCheckedResultDeviceConfigDeviceContextDeviceEntryDeviceHandleDeviceInputDeviceOptionsDeviceRegistryDeviceTypediagdiag_embedDiagEmbedOptionsdiagflatDiagflatOptionsDiagFlatOptionsdiagonal_scatterDiagonalOptionsDiagonalScatterOptionsDiagOptionsDiagShapediffDiffOptionsdigammadimension_error_out_of_rangeDispatchConfigdistDistOptionsdivdotDotShapeRuleDoubleDoubleDimdropoutDropoutFunctionalOptionsdsplitdstackDTypedtype_already_registered_errordtype_components_mismatch_errordtype_not_found_errorDTypeComponentsDTypeConfigDTypeCoverageReportDTypeDisplayConfigDTypeEntryDTypeHandleDTypeHandleImplDTypeInfoDTypeRegistryDTypeRuleDTypeSerializationConfigDynamicShapeEigShapeeinops_error_ambiguous_decompositioneinops_error_anonymous_in_outputeinops_error_dimension_mismatcheinops_error_invalid_patterneinops_error_reduce_undefined_outputeinops_error_repeat_missing_sizeeinops_error_undefined_axiseinsumeinsum_error_dimension_mismatcheinsum_error_index_out_of_rangeeinsum_error_invalid_equationeinsum_error_invalid_sublist_elementeinsum_error_operand_count_mismatcheinsum_error_subscript_rank_mismatcheinsum_error_unknown_output_indexEinsumOptionsEinsumOutputShapeEllipsiseluelu_EluFunctionalOptionsembedding_bag_error_requires_2d_inputemptyempty_cacheempty_likeeqequalerferfcerfinvexpexp2expandexpand_asexpand_error_incompatibleExpandShapeexpm1ExponentialOptionseyeEyeOptionsfftFFTOptionsfindKernelWithPredicatefindSimilarPatternsflattenFlattenOptionsFlattenShapeflipflip_error_dim_out_of_rangefliplrFlipShapeflipudfloat_powerFloatDTypeRulefloorfloor_dividefmaxfminfmodformatEquationErrorformatShapefracfrexpfrombufferfullfull_likefunction_already_registered_errorFunctionConfigFunctionEntryFunctionHandlegathergather_error_dim_out_of_rangeGatherShapegcdgegeluGeometricOptionsget_autocast_cpu_dtypeget_autocast_gpu_dtypeget_autocast_ipu_dtypeget_autocast_xla_dtypeget_default_deviceget_default_dtypeget_deterministic_debug_modeget_device_configget_device_contextget_device_moduleget_dtype_infoget_file_pathget_float32_matmul_precisionget_num_interop_threadsget_num_threadsget_op_infoget_printoptionsget_real_dtypeget_rng_stategetAutogradgetDTypegetEinopsCacheSizegetEinsumCacheSizegetFunctiongetKernelgetMethodgetOpInfoGetOpKindGetOpSchemagetScalarKernelgluGluFunctionalOptionsGradContextGradFnGradientsForgtHalfHalfDimhamming_windowhann_windowhardshrinkhardsigmoidhardswishhardtanhhardtanh_HardtanhFunctionalOptionshas_autogradhas_devicehas_dtypehas_kernelhasAutogradhasDTypehasFunctionhasKernelhasMethodhasScalarKernelHasShapeErrorheavisidehistcHistcOptionshistogramHistogramOptionsHistogramResulthsplithstackhypoti0IdentityShapeifftimagindex_addindex_copyindex_fillindex_putindex_reduceindex_selectindex_select_error_dim_out_of_rangeIndexPutOptionsIndexSelectShapeIndexSpecIndicesOptionsIndicesSpecinitialize_deviceInputsForInsertDiminvalid_config_errorinverseInverseShapeirfftis_anomaly_check_nan_enabledis_anomaly_enabledis_autocast_cache_enabledis_autocast_cpu_enabledis_autocast_ipu_enabledis_autocast_xla_enabledis_complexis_complex_dtypeis_cpu_only_modeis_deterministic_algorithms_warn_only_enabledis_floating_pointis_floating_point_dtypeis_inference_mode_enabledis_nonzerois_tensoris_warn_always_enabledis_webgpu_availableIs2DIsAtLeast1DIsBinaryOpIsBinaryOpNameiscloseIscloseOptionsisfiniteisinisinfisnanisneginfisposinfisrealIsReductionOpIsReductionOpNameIsRegistryErrorIsShapeErroristftISTFTOptionsIsUnaryOpIsUnaryOpNameitem_error_not_scalarItemResultkaiser_windowKaiserWindowOptionskernel_not_registered_errorkernel_signature_mismatch_errorKernelConfigKernelConfigWebGPUKernelEntryKernelHandleKernelInfoKernelPredicateKernelRegistryKernelWebGPUkronkthvalueKthvalueOptionslcmldexpleleaky_reluleaky_relu_LeakyReluFunctionalOptionslerplevenshteinDistancelgammalinalg_error_not_square_matrixlinalg_error_requires_2dlinalg_error_requires_at_least_2dlinearlinspacelist_custom_deviceslist_custom_dtypeslist_deviceslist_dtypeslist_functionslist_kernelslist_methodslist_opslistCustomDTypeslistDTypeslistFunctionslistKernelsListKernelsOptionslistMethodslistOpsListOpsOptionsloglog_softmaxlog10log1plog2logaddexplogaddexp2logcumsumexplogical_andlogical_notlogical_orlogical_xorLogitOptionsLogNormalOptionsLogOptionslogsigmoidlogspacelogsumexpLogsumexpOptionsltLUShapeLuSolveOptionsmasked_fillmasked_selectmasked_select_asyncMaskSpecmatmulmatmul_error_inner_dimensions_do_not_matchMatmul2DShapeMatmulShapeMatmulShapeRuleMatrixTransposeShapemaxmaximummeanmedianmemory_statsmemory_summarymeshgridmethod_already_registered_errormethod_dtype_not_supported_errorMethodConfigMethodEntryMethodHandleminminimummishmmMMShapeRulemodemovedimmsortmulmultinomialmultinomial_asyncMultinomialAsyncOptionsMultinomialOptionsMultiplyBymvMVShapeRulenan_to_numnanmeannanmediannanquantileNanReductionOptionsnansumNanToNumOptionsnarrownarrow_copynarrow_error_length_exceeds_boundsnarrow_error_start_out_of_boundsNarrowShapeneneedsBroadcastnegNegativeDimnextafternonzeroNonzeroOptionsnormnormalNormalOptionsNormOptionsnumelonesones_likeop_kind_mismatch_errorop_not_found_errorOpCoverageEntryOpInfoOpKindOpNameOpSchemaOpSchemasouterOuterShapepackPackShapepermutepermute_error_dimension_count_mismatchPermuteShapepoissonpolarPool1dShapePool2dShapePool3dShapepositivepowpreluPrintOptionsprodprofiler_allow_cudagraph_cupti_lazy_reinit_cuda12promote_typesPromoteDTypeRulePutOptionsquantileQuantileOptionsrad2degrandrand_likerandintrandint_likeRandintLikeOptionsRandintOptionsrandnrandn_likeRandomLikeOptionsRandomOptionsrandpermRangeSpecRankravelrealrearrangeRearrangeOptionsRearrangeShapereciprocalreduceReduceOperationReduceOptionsReduceShapeReductionKernelConfigCPUReductionKernelCPUReductionOpNamesReductionOpSchemaReductionOptionsReductionShapeRuleregister_backwardregister_deviceregister_dtyperegister_forwardregister_functionregister_methodregister_scalar_forwardregisterAutogradRegisterBackwardOptionsregisterBinaryOpregisterDTypeRegisterDTypeOptionsRegisteredDTyperegisterFunctionRegisterFunctionOptionsregisterKernelRegisterKernelOptionsregisterMethodRegisterMethodOptionsregisterScalarKernelregisterUnaryOpregistration_failed_errorrelurelu_relu6ReluFunctionalOptionsremainderRemoveDimrepeatrepeat_interleaveRepeatInterleaveOptionsRepeatOptionsRepeatShapeReplaceDimrequireWebGPUreset_peak_memory_statsreshapeReshapeShaperesult_typerfftrollRollOptionsrot90Rot90Optionsroundrrelurrelu_RreluFunctionalOptionsrsqrtSafeExpandShapeSameDTypeRuleSameShapeRuleSaveForBackwardScalarCPUForwardFnScalarCPUKernelConfigScalarKernelEntryScalarKernelHandleScalarWebGPUKernelConfigScaleDimscatterscatter_addscatter_add_scatter_error_dim_out_of_rangescatter_reducescatter_reduce_ScatterReduceOptionsScatterShapesearchsortedSearchSortedOptionsselectselect_error_index_out_of_boundsselect_scatterSelectShapeseluset_default_deviceset_default_tensor_typeset_deterministic_debug_modeset_float32_matmul_precisionset_printoptionsset_warn_alwaysSetupContextFnShapeShapeCheckedResultShapedTensorShapeErrorMessageShapeOpSchemaShapeRulesigmoidsignsignbitsilusinsincsinhSizeOptionsslice_error_out_of_boundsslice_scatterSliceOptionsSliceScatterOptionsSliceShapeSliceSpecsoftmaxsoftmax_error_dim_out_of_rangeSoftmaxShapesoftminSoftminFunctionalOptionssoftplusSoftplusFunctionalOptionssoftshrinksoftsignsortSortOptionssplitsplit_error_dim_out_of_rangeSplitOptionssqrtsquaresqueezeSqueezeOptionsSqueezeShapestackStackOptionsStackShapestdstd_meanStdVarMeanOptionsStdVarOptionsstftSTFTOptionsStrideOptionssubSublistSublistElementSubscriptIndexsumSVDShapeswapaxessym_floatsym_intsym_notttaketake_along_dimTakeAlongDimOptionstantanhtanhshrinktensortensor_splitTensorCreatorTensorDatatensordotTensordotOptionsTensorLikeTensorMetaTensorOptionsTensorStoragethresholdthreshold_tileTileShapeToOptionstopkTopkOptionsTorchtraceTraceShapetransposetranspose_dims_error_out_of_rangetranspose_error_requires_2d_tensorTransposeDimsShapeTransposeDimsShapeCheckedTransposeShapetrapezoidTrapezoidOptionsTriangularOptionstriltril_indicesTriOptionsTripletriutriu_indicestrue_dividetruncTupleOfLengthTypedArrayTypedArrayForTypedStorageTypeOptionsUnaryBackwardFnUnaryDTypeUnaryKernelConfigCPUUnaryKernelCPUUnaryOpConfigUnaryOpFnUnaryOpNamesUnaryOpParamsUnaryOpSchemaUnaryOptionsunbindunbind_error_dim_out_of_rangeUnbindOptionsunflattenUniformOptionsuniqueunique_consecutiveUniqueConsecutiveOptionsUniqueOptionsunpackUnpackShapeunravel_indexunregister_deviceunsqueezeUnsqueezeOptionsUnsqueezeShapeuse_deterministic_algorithmsValidateBatchedSquareMatrixValidateChunkDimValidatedEinsumShapevalidateDeviceValidateDeviceValidatedRearrangeShapeValidatedReduceShapeValidatedRepeatShapevalidateDTypeValidateEinsumValidateOperandCountValidateRanksValidateScalarValidateSplitDimValidateSquareMatrixValidateUnbindDimValueOptionsvar_var_meanvdotviewview_as_complexview_as_realvmapvsplitvstackWebGPUKernelConfigWebGPUOnlyResultWebGPUTensorDatawhereWindowOptionsxlogyzeroszeros_like
torch.js· 2026
LegalTerms of UsePrivacy Policy
/
/
  1. docs
  2. torch.js
  3. torch
  4. optim
  5. lr_scheduler
  6. CosineAnnealingLR

torch.optim.lr_scheduler.CosineAnnealingLR

class CosineAnnealingLR extends LRScheduler
new CosineAnnealingLR(optimizer: Optimizer, options: { /** Maximum number of iterations */ T_max: number; /** Minimum learning rate (default: 0) */ eta_min?: number; /** The index of last epoch (default: -1) */ last_epoch?: number; /** Whether to print a message for each update (default: false) */ verbose?: boolean; })

Constructor Parameters

optimizerOptimizer
Wrapped optimizer
options{ /** Maximum number of iterations */ T_max: number; /** Minimum learning rate (default: 0) */ eta_min?: number; /** The index of last epoch (default: -1) */ last_epoch?: number; /** Whether to print a message for each update (default: false) */ verbose?: boolean; }
Scheduler options
T_max(number)
– Maximum number of iterations
eta_min(number)
– Minimum learning rate

CosineAnnealingLR scheduler: Anneals learning rate using cosine curve to zero.

CosineAnnealingLR sets the learning rate of each parameter group using a cosine annealing schedule. The learning rate decreases smoothly from initial value to a minimum value following a cosine curve over T_max epochs, then remains constant. This smooth decay is empirically better than step-wise decay for many tasks.

Key advantages over StepLR:

  • Smooth continuous decay instead of abrupt drops
  • Often achieves better final accuracy (flattens loss landscape)
  • Reduces risk of missing good local minima at decay boundaries
  • Based on theoretical motivations from local loss landscape geometry

When to use CosineAnnealingLR:

  • Modern deep learning (recommended as default for many tasks)
  • Transformer models (especially with warmup)
  • When you want smooth learning rate decay
  • When final model quality is more important than convergence speed
  • Good baseline: use CosineAnnealingLR unless you have strong reasons otherwise

Trade-offs:

  • Learning rate decays continuously, can be slower initially
  • Requires specifying T_max (total training epochs)
  • Doesn't adapt to actual training progress (metric-agnostic)
  • Compare to ReduceOnPlateau for metric-based adaptation
  • Compare to CosineAnnealingWarmRestarts for periodic restarts

Algorithm: Learning rate decays from η_base to η_min following a cosine curve:

  • η_t = η_min + (η_base - η_min) / 2 * (1 + cos(π * t / T_max))
  • t is current epoch (0 to T_max)
  • cos goes from 1 (at t=0) to -1 (at t=T_max)
  • Learning rate goes from η_base (at t=0) to η_min (at t=T_max)
ηt=ηmin⁡+ηbase−ηmin⁡2(1+cos⁡(πtTmax⁡))where t∈[0,Tmax⁡],ηt is the learning rate at epoch t\begin{aligned} \eta_t = \eta_{\min} + \frac{\eta_{\text{base}} - \eta_{\min}}{2} \left(1 + \cos\left(\frac{\pi t}{T_{\max}}\right)\right) \\ \text{where } t \in [0, T_{\max}], \eta_t \text{ is the learning rate at epoch } t \end{aligned}ηt​=ηmin​+2ηbase​−ηmin​​(1+cos(Tmax​πt​))where t∈[0,Tmax​],ηt​ is the learning rate at epoch t​
  • Smooth decay: Unlike StepLR's abrupt drops, cosine decay is continuous and smooth.
  • Good default: CosineAnnealingLR is recommended as default schedule before trying alternatives.
  • T_max critical: Set T_max to your planned total training epochs. Too small → lr becomes constant early.
  • eta_min important: Default eta_min=0 means learning stops at T_max. Consider eta_min 0 for fine-tuning.
  • Empirically strong: Cosine schedules show better generalization than step decay in modern architectures.
  • Paper: SGDR: Stochastic Gradient Descent with Warm Restarts (Loshchilov & Hutter, 2016) motivates cosine annealing.
  • Warm restarts: CosineAnnealingWarmRestarts offers periodic restarts - useful for escaping local minima.
  • Warmup common: Typically combined with warmup (LinearLR) for transformer training: warmup → cosine decay.
  • Not adaptive: Doesn't adapt to training progress. ReduceOnPlateau does adapt but is metric-aware.
  • Multiple parameters: Works with parameter groups, decays each group's lr by same cosine schedule.

Examples

// Standard CosineAnnealingLR for 100 epochs
const scheduler = new torch.optim.CosineAnnealingLR(optimizer, { T_max: 100 });
for (let epoch = 0; epoch < 100; epoch++) {
  train();
  validate();
  scheduler.step();
}
// CosineAnnealingLR with minimum learning rate
const scheduler = new torch.optim.CosineAnnealingLR(optimizer, {
  T_max: 100,
  eta_min: 0.0001  // Don't decay below 1e-4
});

// Without eta_min, lr decays to exactly 0 at T_max
// eta_min prevents optimization from completely stopping
// CosineAnnealingLR with warmup (chain with LinearLR)
const warmup = new torch.optim.LinearLR(optimizer, { total_iters: 10 });
const cosine = new torch.optim.CosineAnnealingLR(optimizer, { T_max: 90 });
const scheduler = new torch.optim.SequentialLR(optimizer, [warmup, cosine], [10]);

// Common in transformer training: warm up for 10 epochs, then cosine decay
// Resume training with CosineAnnealingLR
const checkpoint = load_checkpoint('model.pth');
const scheduler = new torch.optim.CosineAnnealingLR(optimizer, {
  T_max: 100,
  eta_min: 0.0001,
  last_epoch: checkpoint.epoch - 1  // Resume at correct position
});
// Comparison: different T_max values affect schedule shape
const scheduler_short = new torch.optim.CosineAnnealingLR(optimizer, { T_max: 50 });  // Fast decay
const scheduler_long = new torch.optim.CosineAnnealingLR(optimizer, { T_max: 200 }); // Slow decay

// Longer T_max means more gradual decay, giving optimizer longer to converge
// Shorter T_max decays faster to minimum, good if eta_min is not too small

See Also

  • PyTorch torch.optim.lr_scheduler.CosineAnnealingLR
Previous
ConstantLR
Next
CosineAnnealingWarmRestarts