torch.nn.functional.grouped_mm
function grouped_mm(input_tensor_list: Tensor[], mat2_tensor_list: Tensor[], options?: GroupedMMFunctionalOptions): Tensor[]
function grouped_mm(input: Tensor, mat2: Tensor, options?: GroupedMMFunctionalOptions): Tensor

Performs grouped (multi-headed) matrix multiplication with optional bias and dtype casting.
Computes multiple independent matrix multiplications on lists of matrices in parallel. Each input matrix is multiplied with its corresponding weight matrix independently, which is useful for multi-head attention, ensemble operations, and grouped computations. Essential for:
- Multi-head attention: Computing attention heads in parallel
- Grouped linear layers: Processing multiple feature groups independently
- Ensemble inference: Running multiple models/heads simultaneously
- Mixed-precision inference: Different dtypes for different heads
- Distributed computation: Processing groups separately then combining
- Efficient batch operations: More flexible than standard batched matmul
- Conditional computation: Different weights for different groups
Operation:
For each i: output[i] = input[i] @ weight[i] + bias[i] (optional)
All operations are independent and can be parallelized. Output dtype can be optionally cast to a different type for memory efficiency.
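The per-group operation above can be sketched in plain TypeScript. This is a reference illustration only: nested number arrays stand in for Tensors, and `matmul` and `groupedMMReference` are illustrative helpers, not part of the library.

```typescript
// Reference sketch of grouped_mm semantics using nested arrays
// (illustrative only -- the real implementation operates on Tensors).
type Matrix = number[][];

function matmul(a: Matrix, b: Matrix): Matrix {
  const p = a.length;
  const k = a[0].length;
  const m = b[0].length;
  const out: Matrix = Array.from({ length: p }, () => new Array(m).fill(0));
  for (let i = 0; i < p; i++)
    for (let j = 0; j < m; j++)
      for (let x = 0; x < k; x++) out[i][j] += a[i][x] * b[x][j];
  return out;
}

function groupedMMReference(
  inputs: Matrix[],
  weights: Matrix[],
  biases?: (number[] | null)[]
): Matrix[] {
  // Each group is an independent matmul; bias entries may be null.
  return inputs.map((input, i) => {
    const out = matmul(input, weights[i]);
    const bias = biases?.[i];
    if (bias) {
      for (const row of out)
        for (let j = 0; j < row.length; j++) row[j] += bias[j];
    }
    return out;
  });
}
```

Because the groups never interact, a real implementation is free to dispatch all of them in parallel.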
- List lengths must match: All three lists (input, weights, bias) must have same length
- Inner dimensions must match: input[i].shape[-1] must equal weights[i].shape[0]
- Optional bias: Biases can be null for some/all operations (sparse application)
- Independent operations: Each matmul is independent and can be parallelized
- Dtype flexibility: Can output different dtype than computation for efficiency
- Gradient propagation: Gradients flow back to all inputs, weights, and biases
- Multi-head friendly: Natural fit for multi-head attention architectures
- List length mismatch: Will error if input and weight lists differ in length
- Bias length mismatch: Bias list length must match input list if provided
- Dimension mismatch: Inner dimensions of input/weight must match for matmul
- dtype casting: out_dtype casting happens after computation, so precision loss is possible
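The length and dimension constraints above can be captured in an upfront validation pass. The sketch below works on plain shape arrays; the function name and error messages are assumptions for illustration, not the library's actual API.

```typescript
// Illustrative validation of grouped_mm constraints, operating on
// plain shape arrays (e.g. [rows, cols]) rather than real Tensors.
function validateGroupedMM(
  inputShapes: number[][],
  weightShapes: number[][],
  biasCount?: number
): void {
  // List lengths must match.
  if (inputShapes.length !== weightShapes.length) {
    throw new Error(
      `list length mismatch: ${inputShapes.length} inputs vs ${weightShapes.length} weights`
    );
  }
  // Bias list, if provided, must match the input list length.
  if (biasCount !== undefined && biasCount !== inputShapes.length) {
    throw new Error(`bias list length mismatch: ${biasCount}`);
  }
  // Inner dimensions must agree for each group: input[i][-1] == weight[i][0].
  for (let i = 0; i < inputShapes.length; i++) {
    const k = inputShapes[i][inputShapes[i].length - 1];
    if (k !== weightShapes[i][0]) {
      throw new Error(`group ${i}: inner dimension ${k} != ${weightShapes[i][0]}`);
    }
  }
}
```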
Parameters
input_tensor_list: Tensor[] - List of n input tensors with shapes:
  - 1D: [k] → matmul with [k, m] → [m]
  - 2D: [p, k] → matmul with [k, m] → [p, m]
  - Higher: [*, k] → matmul with [k, m] → [*, m]
mat2_tensor_list: Tensor[] - List of n weight tensors with shapes:
  - [k, m] for corresponding input shape [*, k] → output [*, m]
  - Must be same length as input_tensor_list
  - Inner dimension k must match input's last dimension
options: GroupedMMFunctionalOptions (optional)
Returns
Tensor[] – List of output tensors with shape [*, m] for corresponding inputs
Examples
// Multi-head attention: process heads independently
const head_size = 64;
const num_heads = 8;
const seq_len = 10;
const batch_size = 32;
// Create inputs and weights for each head
const queries = []; // 8 heads, each [batch*seq, 64]
const key_weights = []; // 8 heads, each [64, 64]
for (let h = 0; h < num_heads; h++) {
queries.push(torch.randn(batch_size * seq_len, head_size));
key_weights.push(torch.randn(head_size, head_size));
}
const head_outputs = torch.nn.functional.grouped_mm(queries, key_weights);
// head_outputs[i]: [batch*seq, 64] for each head

// Grouped linear layer: different weights for different feature groups
const input = [
torch.randn(100, 50), // Group 1: 100 samples, 50 features
torch.randn(100, 75), // Group 2: 100 samples, 75 features
torch.randn(100, 30) // Group 3: 100 samples, 30 features
];
const weights = [
torch.randn(50, 64), // Project group 1 to 64 dims
torch.randn(75, 128), // Project group 2 to 128 dims
torch.randn(30, 32) // Project group 3 to 32 dims
];
const biases = [
torch.randn(64), // Bias for group 1
torch.randn(128), // Bias for group 2
torch.randn(32) // Bias for group 3
];
const outputs = torch.nn.functional.grouped_mm(input, weights, biases);
// outputs[i]: [100, output_dim_i] for each group

// Ensemble inference: different models/weights for same inputs
const feature_dim = 128; // example dimensions
const output_dim = 10;
const input_batch = torch.randn(batch_size, feature_dim);
const model_weights = [
torch.randn(feature_dim, output_dim), // Model 1 weights
torch.randn(feature_dim, output_dim), // Model 2 weights
torch.randn(feature_dim, output_dim) // Model 3 weights
];
// Duplicate input for each model (or use broadcasting)
const inputs = [input_batch, input_batch, input_batch];
const predictions = torch.nn.functional.grouped_mm(inputs, model_weights);
// predictions: [model_output1, model_output2, model_output3]
const ensemble_output = torch.stack(predictions, 0).mean(0);
// Average ensemble predictions

// Mixed-precision: compute in float32, output int8
const inputs = [torch.randn(100, 64), torch.randn(100, 64)];
const weights = [torch.randn(64, 32), torch.randn(64, 32)];
const outputs = torch.nn.functional.grouped_mm(
inputs,
weights,
null, // No bias
'int8' // Cast output to int8 for storage efficiency
);
// outputs: [int8 tensors] - reduced memory footprint

// Grouped convolution: treat as grouped linear for channel operations
const group_size = 32;
const num_groups = 4;
const output_per_group = 16; // example output width per group
const groups = [];
const group_weights = [];
for (let g = 0; g < num_groups; g++) {
groups.push(torch.randn(batch_size, group_size));
group_weights.push(torch.randn(group_size, output_per_group));
}
const group_outputs = torch.nn.functional.grouped_mm(groups, group_weights);
const output = torch.cat(group_outputs, 1); // Concatenate groups
// Grouped operation is more efficient than single large matmul

See Also
- [PyTorch torch.nn.functional documentation](https://pytorch.org/docs/stable/nn.functional.html)
- scaled_grouped_mm - Batched grouped matmul with scaling
- scaled_mm - Scaled matrix multiplication with advanced options
- cat - Concatenate group outputs
- stack - Stack group outputs
- Tensor.matmul - Single matrix multiplication