torch.nn.functional.mse_loss
function mse_loss(input: Tensor, target: Tensor): Tensor
function mse_loss(input: Tensor, target: Tensor, size_average: boolean | null, reduce: boolean | null, reduction: 'none' | 'mean' | 'sum', options: MseLossFunctionalOptions): Tensor

Mean Squared Error (MSE) Loss: standard regression loss function.
Computes the average squared difference between predictions and targets. The quadratic penalty weights large errors heavily, making MSE sensitive to outliers but giving smooth, well-behaved gradients for optimization. Commonly used for:
- Regression tasks (predicting continuous values: prices, temperatures, distances)
- Reconstruction tasks (autoencoders, denoising)
- Pixel-level predictions (depth, segmentation, image-to-image)
- Bounding box regression in object detection
- Time series forecasting
- A convenient default loss in many optimization problems, thanks to its smoothness and convexity
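As a concrete illustration of the arithmetic, here is a plain-TypeScript sketch operating on number[] rather than the library's Tensor type (mseLossSketch is a hypothetical helper for illustration, not part of the library):

```typescript
// Sketch of mse_loss on plain arrays; the real function operates on Tensors.
type Reduction = "none" | "mean" | "sum";

function mseLossSketch(
  input: number[],
  target: number[],
  reduction: Reduction = "mean"
): number[] | number {
  if (input.length !== target.length) {
    throw new Error("input and target must have the same length");
  }
  // Element-wise squared errors: (input[i] - target[i])^2
  const perElement = input.map((x, i) => (x - target[i]) ** 2);
  if (reduction === "none") return perElement;
  const sum = perElement.reduce((a, b) => a + b, 0);
  return reduction === "sum" ? sum : sum / perElement.length;
}

// Errors of 0.1, 0.2, 0.2, 0.1 give squared errors
// [0.01, 0.04, 0.04, 0.01] and a mean of 0.025
const loss = mseLossSketch([1.0, 2.0, 3.0, 4.0], [1.1, 2.2, 2.8, 4.1]);
```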
When to use MSE Loss:
- Standard choice for regression (default if unsure)
- When large errors should be heavily penalized
- When a Gaussian noise model for the targets is appropriate (minimizing MSE corresponds to maximum likelihood under Gaussian noise)
- For optimization convenience (smooth, well-behaved gradients)
- When outliers are rare and acceptable to penalize heavily
Trade-offs vs L1 Loss:
- Robustness: MSE penalizes outliers quadratically (sensitive), L1 linearly (robust)
- Smoothness: MSE smooth everywhere (better for optimization), L1 has kink at 0
- Gradient magnitude: MSE gradients grow with error (can explode), L1 constant
- Empirical: MSE usually better if outliers rare; L1/Huber better with outliers
- Computational: Both similar cost, MSE slightly faster
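The difference in outlier sensitivity is easy to see numerically; a minimal plain-TypeScript sketch (helper names are illustrative, not library functions):

```typescript
// Per-element penalty under MSE (quadratic) vs L1 (linear).
function squaredError(pred: number, target: number): number {
  return (pred - target) ** 2;
}
function absoluteError(pred: number, target: number): number {
  return Math.abs(pred - target);
}

// For a residual of 10, the MSE penalty is 10x the L1 penalty:
const msePenalty = squaredError(10, 0); // 100 (quadratic)
const l1Penalty = absoluteError(10, 0); // 10 (linear)
```

A single such outlier in a batch of 32 contributes 100/32 ≈ 3 to the mean MSE, easily dominating dozens of small residuals; under L1 its contribution is only 10/32 ≈ 0.3.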
Key characteristics:
- Most common: Default choice for regression tasks
- Smooth optimization: Well-behaved gradients help convergence
- Outlier sensitive: Quadratic penalty heavily weights large errors and may cause numerical issues (e.g. exploding gradients)
- Gaussian assumption: Corresponds to a Gaussian noise model on targets
- Scale dependent: Sensitive to the magnitude of values, so normalize/standardize targets to a common scale
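One common mitigation for the scale issue is to standardize targets before training; a minimal sketch (the standardize helper is hypothetical, not a library function):

```typescript
// Shift to zero mean and scale to unit variance, so MSE magnitudes
// are comparable regardless of the raw units (meters vs millimeters, etc.).
function standardize(values: number[]): number[] {
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  const variance =
    values.reduce((a, v) => a + (v - mean) ** 2, 0) / values.length;
  const std = Math.sqrt(variance) || 1; // guard against constant inputs
  return values.map((v) => (v - mean) / std);
}

// Targets on a large raw scale end up on a unit scale:
const scaled = standardize([1000, 2000, 3000]); // ≈ [-1.22, 0, 1.22]
```

Remember to apply the inverse transform to model outputs at inference time, and to compute the standardization statistics on the training set only.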
Parameters
- input: Tensor – Predicted values.
- target: Tensor – Ground-truth values, same shape as input.
- size_average: boolean | null – Deprecated; use reduction instead.
- reduce: boolean | null – Deprecated; use reduction instead.
- reduction: 'none' | 'mean' | 'sum' – How to reduce the per-element losses: no reduction, mean, or sum (default: 'mean').
- options: MseLossFunctionalOptions – Options-object form of the arguments above.
Returns
Tensor – Scalar loss value (or a per-element tensor if reduction='none')

Examples
// Simple regression
const predictions = torch.tensor([1.0, 2.0, 3.0, 4.0]);
const targets = torch.tensor([1.1, 2.2, 2.8, 4.1]);
const loss = torch.nn.functional.mse_loss(predictions, targets);
// loss = mean([0.01, 0.04, 0.04, 0.01]) = 0.025

// Neural network regression
const model = new torch.nn.Linear(10, 1);
const optimizer = new torch.optim.SGD(model.parameters(), 0.01);
for (let epoch = 0; epoch < 100; epoch++) {
  const x = torch.randn([32, 10]);
  const y = torch.randn([32, 1]);
  const pred = model.forward(x);
  const loss = torch.nn.functional.mse_loss(pred, y);
  optimizer.zero_grad(); // clear gradients accumulated in the previous step
  loss.backward();
  optimizer.step();
}

// Autoencoder reconstruction loss
const reconstructed = autoencoder.forward(x);
const reconstruction_loss = torch.nn.functional.mse_loss(reconstructed, x);

See Also
- PyTorch torch.nn.functional.mse_loss
- l1_loss - Robust alternative with linear penalty
- smooth_l1_loss - Hybrid of L1 and MSE (best of both worlds)
- huber_loss - Alias for SmoothL1Loss