torch.nn.functional.multi_head_attention_forward
function multi_head_attention_forward(query: Tensor, key: Tensor, value: Tensor, embed_dim_to_check: number, num_heads: number, in_proj_weight: Tensor | null, in_proj_bias: Tensor | null, bias_k: Tensor | null, bias_v: Tensor | null, add_zero_attn: boolean, dropout_p: number, out_proj_weight: Tensor, out_proj_bias: Tensor | null, options?: MultiHeadAttentionFunctionalOptions): [Tensor, Tensor | null]

Multi-head attention forward pass.
This implements scaled dot-product attention with multiple heads:

Attention(Q, K, V) = softmax(Q * K^T / sqrt(d_k)) * V
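The formula above can be sketched in plain Python for a single head (a minimal illustration of the math only; the actual function operates on Tensors and adds projections, heads, and dropout):

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of floats
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    # Q: (L, d_k), K: (S, d_k), V: (S, d_v), as nested lists
    d_k = len(K[0])
    out = []
    for q in Q:
        # scaled dot product of this query row against every key row
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)  # one weight per source position, sums to 1
        # output row = attention-weighted sum of value rows
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

Each output row is a convex combination of the value rows, so it always lies within the range spanned by V along each dimension.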
Parameters
query: Tensor - Query tensor of shape (L, N, E), where L is the target sequence length, N is the batch size, and E is the embedding dimension
key: Tensor - Key tensor of shape (S, N, E), where S is the source sequence length
value: Tensor - Value tensor of shape (S, N, E)
embed_dim_to_check: number - Expected embedding dimension E, checked against the inputs
num_heads: number - Number of attention heads; E must be divisible by num_heads
in_proj_weight: Tensor | null - Packed input projection weight for query, key, and value, of shape (3*E, E)
in_proj_bias: Tensor | null - Packed input projection bias for query, key, and value, of shape (3*E)
bias_k: Tensor | null - Optional bias appended to the key sequence
bias_v: Tensor | null - Optional bias appended to the value sequence
add_zero_attn: boolean - If true, append a row of zeros to the key and value sequences, allowing attention to an all-zero position
dropout_p: number - Dropout probability applied to the attention weights
out_proj_weight: Tensor - Output projection weight of shape (E, E)
out_proj_bias: Tensor | null - Output projection bias of shape (E)
options: MultiHeadAttentionFunctionalOptions (optional) - Additional settings (e.g. attention masks, training mode)

Returns
[Tensor, Tensor | null] - The attention output of shape (L, N, E), and the attention weights, or null when weights are not requested
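The parameter shapes above fit together as follows; this is a plain-Python sketch (the helper name `mha_shapes` is invented for illustration, and the per-head layout reflects how PyTorch reshapes internally):

```python
def mha_shapes(L, S, N, E, num_heads):
    """Return the tensor shapes multi_head_attention_forward expects,
    given target length L, source length S, batch size N, embed dim E."""
    if E % num_heads != 0:
        raise ValueError("embedding dimension E must be divisible by num_heads")
    head_dim = E // num_heads
    return {
        "query": (L, N, E),
        "key": (S, N, E),
        "value": (S, N, E),
        # one packed weight projects q, k, and v in a single matmul
        "in_proj_weight": (3 * E, E),
        "in_proj_bias": (3 * E,),
        "out_proj_weight": (E, E),
        "out_proj_bias": (E,),
        # each head attends over a head_dim-sized slice of the embedding
        "per_head_query": (N * num_heads, L, head_dim),
    }

# Example: L=7 target positions, S=9 source positions, batch 2,
# embedding 16 split across 4 heads of size 4 each
shapes = mha_shapes(L=7, S=9, N=2, E=16, num_heads=4)
```

Here `shapes["in_proj_weight"]` is `(48, 16)`: the three (E, E) blocks for query, key, and value are stacked along the first axis.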