torch.nn.functional.grouped_mm
function grouped_mm(input_tensor_list: Tensor[], mat2_tensor_list: Tensor[], options?: GroupedMMFunctionalOptions): Tensor[]
function grouped_mm(input: Tensor, mat2: Tensor, options?: GroupedMMFunctionalOptions): Tensor

Performs grouped (multi-headed) matrix multiplication with optional bias and dtype casting.
Computes multiple independent matrix multiplications on lists of matrices in parallel. Each input matrix is multiplied with its corresponding weight matrix independently, which is useful for multi-head attention, ensemble operations, and grouped computations. Essential for:
- Multi-head attention: Computing attention heads in parallel
- Grouped linear layers: Processing multiple feature groups independently
- Ensemble inference: Running multiple models/heads simultaneously
- Mixed-precision inference: Different dtypes for different heads
- Distributed computation: Processing groups separately then combining
- Efficient batch operations: More flexible than standard batched matmul
- Conditional computation: Different weights for different groups
Operation:
For each i: output[i] = input[i] @ weight[i] + bias[i] (optional)
All operations are independent and can be parallelized. Output dtype can be optionally cast to a different type for memory efficiency.
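The per-group operation above can be sketched in plain TypeScript. This is a reference illustration only: nested number arrays stand in for Tensors, and `matmul` and `groupedMMReference` are illustrative helpers, not part of the library.

```typescript
// Reference sketch of grouped_mm semantics using nested arrays
// (illustrative only -- the real implementation operates on Tensors).
type Matrix = number[][];

function matmul(a: Matrix, b: Matrix): Matrix {
  const p = a.length;
  const k = a[0].length;
  const m = b[0].length;
  const out: Matrix = Array.from({ length: p }, () => new Array(m).fill(0));
  for (let i = 0; i < p; i++)
    for (let j = 0; j < m; j++)
      for (let x = 0; x < k; x++) out[i][j] += a[i][x] * b[x][j];
  return out;
}

function groupedMMReference(
  inputs: Matrix[],
  weights: Matrix[],
  biases?: (number[] | null)[]
): Matrix[] {
  // Each group is an independent matmul; bias entries may be null.
  return inputs.map((input, i) => {
    const out = matmul(input, weights[i]);
    const bias = biases?.[i];
    if (bias) {
      for (const row of out)
        for (let j = 0; j < row.length; j++) row[j] += bias[j];
    }
    return out;
  });
}
```

Because the groups never interact, a real implementation is free to dispatch all of them in parallel.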
- List lengths must match: All three lists (input, weights, bias) must have same length
- Inner dimensions must match: input[i].shape[-1] must equal weights[i].shape[0]
- Optional bias: Biases can be null for some/all operations (sparse application)
- Independent operations: Each matmul is independent and can be parallelized
- Dtype flexibility: Can output different dtype than computation for efficiency
- Gradient propagation: Gradients flow back to all inputs, weights, and biases
- Multi-head friendly: Natural fit for multi-head attention architectures
- List length mismatch: Will error if input and weight lists differ in length
- Bias length mismatch: Bias list length must match input list if provided
- Dimension mismatch: Inner dimensions of input/weight must match for matmul
- dtype casting: out_dtype casting happens after computation, so precision loss is possible
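The length and dimension constraints above can be captured in an upfront validation pass. The sketch below works on plain shape arrays; the function name and error messages are assumptions for illustration, not the library's actual API.

```typescript
// Illustrative validation of grouped_mm constraints, operating on
// plain shape arrays (e.g. [rows, cols]) rather than real Tensors.
function validateGroupedMM(
  inputShapes: number[][],
  weightShapes: number[][],
  biasCount?: number
): void {
  // List lengths must match.
  if (inputShapes.length !== weightShapes.length) {
    throw new Error(
      `list length mismatch: ${inputShapes.length} inputs vs ${weightShapes.length} weights`
    );
  }
  // Bias list, if provided, must match the input list length.
  if (biasCount !== undefined && biasCount !== inputShapes.length) {
    throw new Error(`bias list length mismatch: ${biasCount}`);
  }
  // Inner dimensions must agree for each group: input[i][-1] == weight[i][0].
  for (let i = 0; i < inputShapes.length; i++) {
    const k = inputShapes[i][inputShapes[i].length - 1];
    if (k !== weightShapes[i][0]) {
      throw new Error(`group ${i}: inner dimension ${k} != ${weightShapes[i][0]}`);
    }
  }
}
```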
Parameters
input_tensor_list: Tensor[] - List of n input tensors with shapes:
  - 1D: [k] → matmul with [k, m] → [m]
  - 2D: [p, k] → matmul with [k, m] → [p, m]
  - Higher: [*, k] → matmul with [k, m] → [*, m]
mat2_tensor_list: Tensor[] - List of n weight tensors with shapes:
  - [k, m] for corresponding input shape [*, k] → output [*, m]
  - Must be same length as input_tensor_list
  - Inner dimension k must match input's last dimension
options: GroupedMMFunctionalOptions (optional)
Returns
Tensor[] – List of output tensors with shape [*, m] for corresponding inputs
Examples
// Multi-head attention: process heads independently
const head_size = 64;
const num_heads = 8;
const seq_len = 10;
const batch_size = 32;
// Create inputs and weights for each head
const queries = []; // 8 heads, each [batch*seq, 64]
const key_weights = []; // 8 heads, each [64, 64]
for (let h = 0; h < num_heads; h++) {
queries.push(torch.randn(batch_size * seq_len, head_size));
key_weights.push(torch.randn(head_size, head_size));
}
const head_outputs = torch.nn.functional.grouped_mm(queries, key_weights);
// head_outputs[i]: [batch*seq, 64] for each head

// Grouped linear layer: different weights for different feature groups
const input = [
torch.randn(100, 50), // Group 1: 100 samples, 50 features
torch.randn(100, 75), // Group 2: 100 samples, 75 features
torch.randn(100, 30) // Group 3: 100 samples, 30 features
];
const weights = [
torch.randn(50, 64), // Project group 1 to 64 dims
torch.randn(75, 128), // Project group 2 to 128 dims
torch.randn(30, 32) // Project group 3 to 32 dims
];
const biases = [
torch.randn(64), // Bias for group 1
torch.randn(128), // Bias for group 2
torch.randn(32) // Bias for group 3
];
const outputs = torch.nn.functional.grouped_mm(input, weights, biases);
// outputs[i]: [100, output_dim_i] for each group

// Ensemble inference: different models/weights for same inputs
const feature_dim = 128; // example dimensions
const output_dim = 10;
const input_batch = torch.randn(batch_size, feature_dim);
const model_weights = [
torch.randn(feature_dim, output_dim), // Model 1 weights
torch.randn(feature_dim, output_dim), // Model 2 weights
torch.randn(feature_dim, output_dim) // Model 3 weights
];
// Duplicate input for each model (or use broadcasting)
const inputs = [input_batch, input_batch, input_batch];
const predictions = torch.nn.functional.grouped_mm(inputs, model_weights);
// predictions: [model_output1, model_output2, model_output3]
const ensemble_output = torch.stack(predictions, 0).mean(0);
// Average ensemble predictions

// Mixed-precision: compute in float32, output int8
const inputs = [torch.randn(100, 64), torch.randn(100, 64)];
const weights = [torch.randn(64, 32), torch.randn(64, 32)];
const outputs = torch.nn.functional.grouped_mm(
inputs,
weights,
null, // No bias
'int8' // Cast output to int8 for storage efficiency
);
// outputs: [int8 tensors] - reduced memory footprint

// Grouped convolution: treat as grouped linear for channel operations
const group_size = 32;
const num_groups = 4;
const output_per_group = 16; // example output width per group
const groups = [];
const group_weights = [];
for (let g = 0; g < num_groups; g++) {
groups.push(torch.randn(batch_size, group_size));
group_weights.push(torch.randn(group_size, output_per_group));
}
const group_outputs = torch.nn.functional.grouped_mm(groups, group_weights);
const output = torch.cat(group_outputs, 1); // Concatenate groups
// Grouped operation is more efficient than single large matmul

See Also
- [PyTorch torch.nn.functional documentation](https://pytorch.org/docs/stable/nn.functional.html)
- scaled_grouped_mm - Batched grouped matmul with scaling
- scaled_mm - Scaled matrix multiplication with advanced options
- cat - Concatenate group outputs
- stack - Stack group outputs
- Tensor.matmul - Single matrix multiplication