Skip to main content
torch.jstorch.jstorch.js
Getting StartedPlaygroundContact
Login
torch.jstorch.jstorch.js
Documentation
IntroductionType SafetyTensor IndexingEinsumEinopsAutogradTraining a ModelProfiling & MemoryPyTorch MigrationBest PracticesRuntimesPerformance
ActivationOptionsAdaptiveAvgPool1dAdaptiveAvgPool2dAdaptiveAvgPool3dAdaptiveLogSoftmaxWithLossAdaptiveMaxPool1dAdaptiveMaxPool2dAdaptiveMaxPool3dadd_moduleAlphaDropoutappendappendapplyAvgPool1dAvgPool1dOptionsAvgPool2dAvgPool2dOptionsAvgPool3dAvgPool3dOptionsBackwardHookBackwardPreHookBatchNorm1dBatchNorm2dBatchNorm3dBatchNormOptionsBCELossBCEWithLogitsLossBilinearBufferBufferOptionsBufferRegistrationHookbufferscallCELUCELUOptionsChannelShufflechildrenCircularPad1dCircularPad2dCircularPad3dclearConstantPad1dConstantPad2dConstantPad3dConv1dConv2dConv3dConvOptionsConvTranspose1dConvTranspose2dConvTranspose3dConvTransposeOptionsCosineEmbeddingLossCosineEmbeddingLossOptionsCosineSimilarityCosineSimilarityOptionscreatecreateCrossEntropyLossCTCLossdecodedecodedeleteDropoutDropout1dDropout2dDropout3dDropoutOptionsELUELUOptionsEmbeddingEmbeddingBagencodeencodeentriesentriesevalextendFeatureAlphaDropoutFlattenFlattenOptionsFoldFoldOptionsforwardforwardforwardforwardforwardforwardforwardforwardforward_with_targetForwardHookForwardPreHookFractionalMaxPool2dFractionalMaxPool3dfrom_pretrainedfrom_pretrainedGaussianNLLLossGELUGELUOptionsgenerate_square_subsequent_maskgetgetgetgetgetget_bufferget_parameterget_submoduleGLUGLUOptionsGroupNormGroupNormOptionsGRUGRUCellHardshrinkHardshrinkOptionsHardsigmoidHardswishHardtanhHardtanhOptionshashasHingeEmbeddingLossHingeEmbeddingLossOptionsHuberLossHuberLossOptionsIdentityInstanceNorm1dInstanceNorm2dInstanceNorm3dInstanceNormOptionsis_uninitialized_bufferis_uninitialized_parameteriterator]iterator]iterator]keyskeysKLDivLossL1LossL1LossOptionsLayerNormLayerNormOptionsLazyBatchNorm1dLazyBatchNorm2dLazyBatchNorm3dLazyConv1dLazyConv2dLazyConv3dLazyConvOptionsLazyConvTranspose1dLazyConvTranspose2dLazyConvTranspose3dLazyConvTransposeOptionsLazyInstanceNorm1dLazyInstanceNorm2dLazyInstanceNorm3dLazyLinearLeakyReLULeakyReLUOptionsLinearLinearOptionsload_state_dictLocalResponseNormLocalResponseNormOptionslog_probLogSigmoidLogSoftmaxLogSoftmaxOptionsLPPool1dLPPool1dOptionsLPPool2dLPPool2dOptionsLPPool3dLPPool3dOptionsLSTMLSTMCellLSTMCellOptionsMarginRankingLossMarginRankingLossOptionsmaterializematerializematerialize_uninitializedmaterialize_uninitializedMaxPool1dMaxPool1dOptionsMaxPool2dMaxPool2dOptionsMaxPool3dMaxPool3dOptionsMaxUnpool1dMaxUnpool1dOptionsMaxUnpool2dMaxUnpool2dOptionsMaxUnpool3dMaxUnpool3dOptionsMishModuleModuleBuffersModuleChildrenModuleDictModuleListModuleParametersModuleRegistrationHookmodulesMSELossMSELossOptionsmultihead_attnMultiheadAttentionMultiheadAttentionOptionsMultiLabelMarginLossMultiLabelMarginLossOptionsMultiLabelSoftMarginLossMultiMarginLossnamed_buffersnamed_childrennamed_modulesnamed_parametersNLLLossnum_parametersPairwiseDistancePairwiseDistanceOptionsParameterParameterDictParameterListParameterOptionsParameterRegistrationHookparametersPixelShufflePixelUnshufflePoissonNLLLosspopPReLUPReLUOptionsReflectionPad1dReflectionPad2dReflectionPad3dregister_backward_hookregister_bufferregister_forward_hookregister_forward_pre_hookregister_full_backward_hookregister_full_backward_pre_hookregister_module_backward_hookregister_module_buffer_registration_hookregister_module_forward_hookregister_module_forward_pre_hookregister_module_full_backward_hookregister_module_full_backward_pre_hookregister_module_module_registration_hookregister_module_parameter_registration_hookregister_parameterReLUReLU6RemovableHandleremoveReplicationPad1dReplicationPad2dReplicationPad3dRMSNormRNNRNNBaseRNNBaseOptionsRNNCellRNNCellOptionsRReLURReLUOptionsrunSELUSequentialsetsetsetSigmoidSiLUSmoothL1LossSmoothL1LossOptionsSoftMarginLossSoftMarginLossOptionsSoftmaxSoftmax2dSoftmaxOptionsSoftminSoftminOptionsSoftplusSoftplusOptionsSoftshrinkSoftshrinkOptionsSoftsignstate_dictSyncBatchNormTanhTanhshrinkThresholdThresholdOptionstotrainTransformerTransformerDecoderTransformerDecoderLayerTransformerDecoderLayerOptionsTransformerDecoderOptionsTransformerEncoderTransformerEncoderLayerTransformerEncoderLayerOptionsTransformerEncoderOptionsTransformerOptionsTripletMarginLossTripletMarginWithDistanceLossUnflattenUnfoldUnfoldOptionsUninitializedBufferUninitializedOptionsUninitializedParameterupdateUpsampleUpsamplingBilinear2dUpsamplingNearest2dvaluesvalueszero_gradZeroPad1dZeroPad2dZeroPad3d
absacosacoshaddaddbmmAddbmmOptionsaddcdivAddcdivOptionsaddcmulAddcmulOptionsaddmmAddmmOptionsaddmvAddmvOptionsaddrAddrOptionsadjointallallcloseAllcloseOptionsamaxaminaminmaxangleanyapplyOutarangeare_deterministic_algorithms_enabledargmaxargminargsortargwhereas_stridedas_tensorasinasinhAssertNoShapeErrorAtat_error_index_out_of_boundsatanatan2atanhatleast_1datleast_2datleast_3dAtShapeautocast_decrement_nestingautocast_increment_nestingAxesRecordbaddbmmBaddbmmOptionsbatch_dimensions_do_not_match_errorbernoulliBinaryOptionsbincountbitwise_andbitwise_left_shiftbitwise_notbitwise_orbitwise_right_shiftbitwise_xorblock_diagbmmbroadcast_error_incompatible_dimensionsbroadcast_shapesbroadcast_tensorsbroadcast_toBroadcastShapebroadcastShapesbucketizecanBroadcastTocartesian_prodcatCatShapecdistceilchain_matmulCholeskyShapechunkchunk_error_dim_out_of_rangeclampclear_autocast_cacheclonecolumn_stackcombinationscompiled_with_cxx11_abicomplexconjconj_physicalcontiguouscopysigncorrcoefcoscoshcount_nonzerocovCPUTensorDatacreateTorchCumExtremeResultcummaxcummincumprodCumShapecumsumcumulative_trapezoidCumulativeOptionsdeg2raddetachDetShapeDeviceDeviceInputDeviceTypediagdiag_embeddiagflatdiagonal_scatterDiagShapediffdigammadimension_error_out_of_rangedistdivdotdsplitdstackDTypeDynamicShapeEigShapeeinops_error_ambiguous_decompositioneinops_error_anonymous_in_outputeinops_error_dimension_mismatcheinops_error_invalid_patterneinops_error_reduce_undefined_outputeinops_error_repeat_missing_sizeeinops_error_undefined_axiseinsumeinsum_error_dimension_mismatcheinsum_error_index_out_of_rangeeinsum_error_invalid_equationeinsum_error_invalid_sublist_elementeinsum_error_operand_count_mismatcheinsum_error_subscript_rank_mismatcheinsum_error_unknown_output_indexEinsumOutputShapeEllipsiseluembedding_bag_error_requires_2d_inputemptyempty_cacheempty_likeeqequalerferfcerfinvexpexp2expandexpand_asexpand_error_incompatibleExpandShapeexpm1eyeEyeOptionsflattenFlattenShapeflipflip_error_dim_out_of_rangefliplrFlipShapeflipudfloat_powerfloorfloor_dividefmaxfminfmodfracfrexpfrombufferfullfull_likegathergather_error_dim_out_of_rangeGatherShapegcdgegeluget_autocast_cpu_dtypeget_autocast_gpu_dtypeget_autocast_ipu_dtypeget_autocast_xla_dtypeget_default_deviceget_default_dtypeget_deterministic_debug_modeget_device_moduleget_file_pathget_float32_matmul_precisionget_num_interop_threadsget_num_threadsget_printoptionsget_rng_stateGradFngthardsigmoidhardswishHasShapeErrorheavisidehistchistogramHistogramResulthsplithstackhypoti0imagindex_addindex_copyindex_fillindex_putindex_reduceindex_selectindex_select_error_dim_out_of_rangeIndexSelectShapeIndexSpecIndicesSpecinverseInverseShapeis_anomaly_check_nan_enabledis_anomaly_enabledis_autocast_cache_enabledis_autocast_cpu_enabledis_autocast_ipu_enabledis_autocast_xla_enabledis_complexis_complex_dtypeis_cpu_only_modeis_deterministic_algorithms_warn_only_enabledis_floating_pointis_floating_point_dtypeis_inference_mode_enabledis_nonzerois_tensoris_warn_always_enabledis_webgpu_availableIs2DIsAtLeast1DiscloseIscloseOptionsisfiniteisinisinfisnanisneginfisposinfisrealIsShapeErroritem_error_not_scalarItemResultkronkthvalueKthvalueOptionslcmldexpleleaky_relulerplgammalinalg_error_not_square_matrixlinalg_error_requires_2dlinalg_error_requires_at_least_2dlinspaceloglog10log1plog2logaddexplogaddexp2logcumsumexplogical_andlogical_notlogical_orlogical_xorlogitlogspacelogsumexpltLUShapemasked_selectmasked_select_asyncMaskSpecmatmulmatmul_error_inner_dimensions_do_not_matchMatmul2DShapeMatmulShapemaxmaximummeanmedianmemory_statsmemory_summarymeshgridminminimummmmodemovedimmsortmulmultinomialmultinomial_asyncmvnan_to_numnanmeannanmediannanquantilenansumnarrownarrow_copynarrow_error_length_exceeds_boundsnarrow_error_start_out_of_boundsNarrowShapeneneedsBroadcastnegNegativeDimnextafternonzeronormnormalNormOptionsnumelonesones_likeouterpackPackShapepermutepermute_error_dimension_count_mismatchPermuteShapepoissonpolarpositivepowPrintOptionsprodprofiler_allow_cudagraph_cupti_lazy_reinit_cuda12promote_typesquantileQuantileOptionsrad2degrandrand_likerandintrandint_likerandnrandn_likerandpermRangeSpecRankravelrealRearrangeShapereciprocalreduceReduceOperationReduceShapeReductionOptionsreluremainderrepeatrepeat_interleaveRepeatInterleaveOptionsRepeatShaperequireWebGPUreset_peak_memory_statsreshapeReshapeShaperesult_typerollrot90roundrsqrtscatterscatter_addscatter_add_scatter_error_dim_out_of_rangescatter_reducescatter_reduce_ScatterShapesearchsortedselectselect_error_index_out_of_boundsselect_scatterSelectShapeseluset_default_deviceset_default_tensor_typeset_deterministic_debug_modeset_float32_matmul_precisionset_printoptionsset_warn_alwaysShapeShapedTensorsigmoidsignsignbitsilusinsincsinhslice_error_out_of_boundsslice_scatterSliceShapeSliceSpecsoftmax_error_dim_out_of_rangeSoftmaxShapesoftplussoftsignsortSortOptionssplitsplit_error_dim_out_of_rangesqrtsquaresqueezeSqueezeShapestackstdstd_meanStdVarOptionssubSublistSublistElementSubscriptIndexsumSVDShapeswapaxessym_floatsym_intsym_notttaketake_along_dimtantanhtensortensor_splitTensorCreatorTensorDatatensordotTensorOptionsTensorStoragetileTileShapetopkTopkOptionsTorchtraceTraceShapetransposetranspose_dims_error_out_of_rangetranspose_error_requires_2d_tensorTransposeDimsShapeTransposeDimsShapeCheckedTransposeShapetrapezoidtriltril_indicestriutriu_indicestruncTypedArrayTypedStorageUnaryOptionsunbindunbind_error_dim_out_of_rangeunflattenuniqueunique_consecutiveunpackUnpackShapeunravel_indexunsqueezeUnsqueezeShapeuse_deterministic_algorithmsValidateBatchedSquareMatrixValidateChunkDimValidatedEinsumShapevalidateDeviceValidatedRearrangeShapeValidatedReduceShapeValidatedRepeatShapevalidateDTypeValidateEinsumValidateOperandCountValidateRanksValidateScalarValidateSplitDimValidateSquareMatrixValidateUnbindDimvar_var_meanvdotviewview_as_complexview_as_realvmapvsplitvstackWebGPUTensorDatawherexlogyzeroszeros_like
torch.js· 2026
LegalTerms of UsePrivacy Policy
/
/
  1. docs
  2. torch.js
  3. torch
  4. nn
  5. SoftmaxOptions

torch.nn.SoftmaxOptions

Softmax activation function.

Softmax is the standard output activation for multi-class classification. It converts raw logits (unconstrained outputs) into a probability distribution over K classes, where probabilities sum to 1. Each output is in (0, 1), interpretable as the predicted probability of that class. Softmax with cross-entropy loss is the de-facto standard for classification tasks in deep learning.

Core idea: Softmax(x_i) = exp(x_i) / Σ_j exp(x_j). The exponential amplifies relative differences between logits, making the largest logit have the highest probability. The division by the sum ensures outputs are normalized probabilities.

When to use Softmax:

  • Multi-class classification: Output layer for all classification tasks (image, NLP, etc.)
  • Standard practice: Always use softmax output + cross-entropy loss for classification
  • Not for hidden layers: Use ReLU/GELU for hidden layers, softmax only for output
  • Probability outputs: When you need normalized probabilities (not just class scores)
  • Categorical predictions: Any task with discrete, mutually-exclusive classes

Key properties:

  • Probabilistic output: Output sums to 1, all values in (0, 1), interpretable as probabilities
  • Differentiable: Smooth everywhere, enables gradient-based learning
  • Relative magnitudes matter: Only relative differences between logits matter, not absolute values
  • Temperature scaling: Softmax can be "heated" or "cooled" to control sharpness via division before exp
  • Dimension-specific: Applied along the class dimension (usually dim=-1), preserving batch structure

Algorithm: Forward: σ(x)_i = exp(x_i - max(x)) / Σ_j exp(x_j - max(x)) The max subtraction (log-sum-exp trick) prevents overflow in exp() for numerical stability. Backward: Jacobian is J_ij = σ(x)_i * (δ_ij - σ(x)_j) (more complex than simple element-wise ops)

Definition

export interface SoftmaxOptions {
  /** Dimension along which softmax is computed (default: -1) */
  dim?: number;
}
dim(number)optional
– Dimension along which softmax is computed (default: -1)

Examples

// Multi-class classification: ResNet + Softmax
class ResNetClassifier extends torch.nn.Module {
  private backbone: torch.nn.Module;  // Pretrained ResNet
  private fc: torch.nn.Linear;
  private softmax: torch.nn.Softmax;

  constructor() {
    super();
    // ... initialize backbone ...
    this.fc = new torch.nn.Linear(2048, 1000);  // 1000 classes (ImageNet)
    this.softmax = new torch.nn.Softmax(-1);     // Apply along class dimension
  }

  forward(x: torch.Tensor): torch.Tensor {
    x = this.backbone.forward(x);
    x = x.flatten(1);
    x = this.fc.forward(x);
    return this.softmax.forward(x);  // Probabilities: shape [batch, 1000]
  }
}
// Then use with CrossEntropyLoss
// NLP: Language model or text classifier
const batch_size = 32, seq_len = 128, vocab_size = 50000;
const logits = torch.randn([batch_size, seq_len, vocab_size]);

const softmax = new torch.nn.Softmax(-1);  // Softmax over vocabulary
const probs = softmax.forward(logits);  // Shape: [32, 128, 50000]

// Each position has probability distribution over vocabulary
// probs[i, j, :].sum() == 1 for all i, j
// Temperature scaling for calibration/confidence control
const logits = torch.randn([10, 5]);
const temperature = 2.0;  // Higher = softer (lower confidence), lower = sharper (higher confidence)

const softmax = new torch.nn.Softmax(-1);
const probs_hot = softmax.forward(logits.div(0.5));      // Sharp probabilities
const probs_cold = softmax.forward(logits.div(temperature)); // Soft probabilities
// Lower temperature → max prob closer to 1, others closer to 0 (high confidence)
// Higher temperature → more uniform distribution (low confidence)
Previous
Softmax2d
Next
Softmin