torch.distributions.InverseGamma
class InverseGamma extends Distribution

new InverseGamma(concentration: number | Tensor, rate: number | Tensor, options?: DistributionOptions)

- readonly concentration (Tensor) – Concentration parameter (alpha, shape parameter).
- readonly rate (Tensor) – Rate parameter (beta, inverse scale).
- readonly arg_constraints (unknown)
- readonly support (unknown)
- readonly has_rsample (unknown)
- readonly mean (Tensor)
- readonly mode (Tensor)
- readonly variance (Tensor)
Inverse Gamma distribution: conjugate prior for variance in normal models.
Parameterized by concentration α and rate β. If X ~ Gamma(α, β), then 1/X ~ InverseGamma(α, β). Support is (0, ∞). Important in Bayesian statistics because it is the conjugate prior for the variance of a normal distribution: the posterior is also InverseGamma with updated parameters. Essential for:
- Bayesian inference for variance and precision parameters (conjugate prior)
- Prior for normal distribution variance (standard choice in Bayesian regression)
- Prior for exponential distribution scale parameters
- Hierarchical Bayesian models with variance components
- Modeling reciprocals of positive quantities
- Empirical Bayes and hyperparameter estimation
- Mixed-effects models with random effect variances
Conjugate Prior Property: If data X₁,...,Xₙ ~ N(μ, σ²) with known μ, and prior σ² ~ InverseGamma(α, β), then posterior σ² | data ~ InverseGamma(α + n/2, β + SS/2) where SS is sum of squared deviations. The parameters update in closed form (Bayesian advantage).
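The closed-form update above can be sketched in plain JavaScript. This is a hypothetical helper for illustration, not part of the torch API:

```javascript
// Hypothetical helper (not part of the torch API): closed-form
// InverseGamma posterior update for normal data with known mean mu.
// Prior: sigma^2 ~ InverseGamma(alpha0, beta0)
// Posterior: InverseGamma(alpha0 + n/2, beta0 + SS/2)
function invGammaPosterior(alpha0, beta0, data, mu) {
  const n = data.length;
  const ss = data.reduce((acc, x) => acc + (x - mu) ** 2, 0);
  return { alpha: alpha0 + n / 2, beta: beta0 + ss / 2 };
}

// Three observations from N(0, sigma^2) with known mu = 0:
const post = invGammaPosterior(2, 1, [1, -2, 3], 0);
console.log(post.alpha); // 2 + 3/2 = 3.5
console.log(post.beta);  // 1 + (1 + 4 + 9)/2 = 8
```

No sampling or optimization is needed: the update is pure arithmetic on the parameters.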
Relationship to Gamma: if X ~ Gamma(α, β), then 1/X ~ InverseGamma(α, β). This reciprocal relationship means sampling is done by drawing from Gamma(α, β) and inverting.
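A minimal sketch of the reciprocal-of-Gamma sampler in plain JavaScript, assuming an integer concentration so the Gamma draw can be built as a sum of exponentials (the torch implementation uses a general Gamma sampler):

```javascript
// Sketch (plain JS, not the torch API): for integer alpha, a
// Gamma(alpha, beta) draw is a sum of alpha Exponential(beta) draws;
// its reciprocal is then an InverseGamma(alpha, beta) draw.
function sampleInverseGamma(alpha, beta) {
  let g = 0;
  for (let i = 0; i < alpha; i++) {
    g += -Math.log(Math.random()) / beta; // Exponential(rate beta) via inverse CDF
  }
  return 1 / g;
}

// Monte Carlo check: mean of InverseGamma(3, 2) is beta/(alpha-1) = 1.
const N = 200000;
let total = 0;
for (let i = 0; i < N; i++) total += sampleInverseGamma(3, 2);
console.log(total / N); // close to 1.0
```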
- Conjugate for normal variance: Posterior is also InverseGamma (closed-form update)
- Mean undefined if α ≤ 1: Mean only exists for α > 1
- Variance undefined if α ≤ 2: Variance only exists for α > 2
- Mode decreases with α: Mode = β/(α+1), so lower α gives a higher mode
- Reciprocal of Gamma: Sampling via Gamma reciprocal
- Right-skewed: Longer right tail (heavier for small α)
- Heavy tails: Assigns probability to very large values (useful as a prior for variance)
- α, β must be positive: α ≤ 0 or β ≤ 0 causes errors
- Mean undefined for α ≤ 1: Only exists when α > 1
- Variance undefined for α ≤ 2: Variance doesn't exist unless α > 2
- Weakly informative: Very small α (e.g., 0.001) imposes minimal constraints
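The moment formulas behind these existence conditions can be checked in plain JavaScript (a hypothetical helper for illustration, not the torch API):

```javascript
// Closed-form moments of InverseGamma(alpha, beta):
//   mean     = beta / (alpha - 1)                    for alpha > 1
//   mode     = beta / (alpha + 1)                    always defined
//   variance = beta^2 / ((alpha - 1)^2 (alpha - 2))  for alpha > 2
function invGammaMoments(alpha, beta) {
  return {
    mean: alpha > 1 ? beta / (alpha - 1) : NaN,  // undefined for alpha <= 1
    mode: beta / (alpha + 1),
    variance: alpha > 2
      ? (beta * beta) / ((alpha - 1) ** 2 * (alpha - 2))
      : NaN,                                     // undefined for alpha <= 2
  };
}

console.log(invGammaMoments(3, 2));          // { mean: 1, mode: 0.5, variance: 1 }
console.log(invGammaMoments(2, 1).variance); // NaN: variance requires alpha > 2
```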
Examples
// Simple inverse gamma: α=2, β=1
const ig = new torch.distributions.InverseGamma(2, 1);
const samples = ig.sample([1000]);
const mean = ig.mean; // 1.0 = β/(α-1)
const variance = ig.variance; // not finite: variance only exists for α > 2
// Bayesian inference: conjugate prior for normal variance
// Prior: σ² ~ InverseGamma(α₀, β₀)
const alpha0 = 2;
const beta0 = 1;
const prior = new torch.distributions.InverseGamma(alpha0, beta0);
// Observe data and update posterior
const n = 50; // sample size
const ss = 25; // sum of squared deviations
const alpha_post = alpha0 + n / 2; // α₀ + n/2
const beta_post = beta0 + ss / 2; // β₀ + SS/2
const posterior = new torch.distributions.InverseGamma(alpha_post, beta_post);
// Posterior parameters updated in closed form
// Prior for regression variance in linear models
// Standard weakly-informative prior: InverseGamma(0.001, 0.001)
const weak_prior = new torch.distributions.InverseGamma(0.001, 0.001);
const weak_samples = weak_prior.sample([100]);
// Very dispersed prior, lets the data dominate
// Batched distributions with different parameters
const alphas = torch.tensor([0.5, 1.0, 2.0, 5.0]);
const betas = torch.tensor([1.0, 1.0, 1.0, 1.0]);
const dist = new torch.distributions.InverseGamma(alphas, betas);
const samples = dist.sample(); // [4] shaped samples
// Smaller α → heavier right tail
// Comparing tail behavior with different concentration parameters
const light = new torch.distributions.InverseGamma(5, 1); // light tail
const heavy = new torch.distributions.InverseGamma(0.5, 1); // heavy tail
const x = torch.tensor([10]);
const light_prob = light.log_prob(x); // very negative: little mass this far out
const heavy_prob = heavy.log_prob(x); // much larger log-probability
// Posterior for hierarchical variance component
// Multi-level model: within-group variance hierarchical prior
const hier_prior = new torch.distributions.InverseGamma(3, 2);
const hier_var = hier_prior.sample([10]); // 10 group-level variances