Gamma Model Tutorial#

The Gamma model is for continuous positive outcomes with right-skewed distributions.

When to Use#

Use the Gamma model when:

  • Outcome is continuous and strictly positive

  • Data is right-skewed (long right tail)

  • Variance increases with the mean (for the Gamma, in proportion to \(\mu^2\))

  • Examples: insurance claims, healthcare costs, time durations, income
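
The checklist above can be turned into a quick diagnostic. The following is a minimal sketch, not part of the library: `describe_outcome` is a hypothetical helper, and the simulated `y` is only a stand-in for real cost data.

```python
import numpy as np

def describe_outcome(y):
    """Summarize the properties that make an outcome a Gamma candidate."""
    y = np.asarray(y, dtype=float)
    centered = y - y.mean()
    # Sample skewness: third central moment over the cubed standard deviation
    skewness = (centered**3).mean() / y.std()**3
    return {
        "all_positive": bool((y > 0).all()),   # Gamma support is y > 0
        "skewness": float(skewness),           # > 0 means a long right tail
        "cv": float(y.std() / y.mean()),       # coefficient of variation
    }

# Simulated stand-in for cost data (assumed values, for illustration only)
rng = np.random.default_rng(1)
y = rng.gamma(shape=2.0, scale=1500.0, size=5000)

summary = describe_outcome(y)
print(summary)
```

A strictly positive outcome with clearly positive skewness is a reasonable candidate; zeros or negative values rule the Gamma out.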

Mathematical Setup#

Data Generating Process#

\[Y \sim \text{Gamma}(k, \mu/k)\]

where: \(\mu = \exp(\alpha(X) + \beta(X) \cdot T)\)

And:

  • \(k\) is the shape parameter (controls variance)

  • \(\mu\) is the mean: \(E[Y] = \mu\)

  • \(\text{Var}[Y] = \mu^2 / k\)
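
As a numerical check of this parameterization, the sketch below draws from \(\text{Gamma}(k, \mu/k)\) with NumPy, whose sampler takes a shape and a scale, and compares the sample moments with \(\mu\) and \(\mu^2/k\). The values of `k` and `mu` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

k = 2.0    # shape parameter (illustrative value)
mu = 5.0   # target mean (illustrative value)

# NumPy's gamma sampler is parameterized by (shape, scale);
# choosing scale = mu / k gives E[Y] = shape * scale = mu.
y = rng.gamma(shape=k, scale=mu / k, size=200_000)

sample_mean = y.mean()
sample_var = y.var()
theory_var = mu**2 / k

print(f"mean: sample={sample_mean:.3f}, theory={mu:.3f}")
print(f"var:  sample={sample_var:.3f}, theory={theory_var:.3f}")
```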

Estimand#

\[\mu^* = E[\beta(X)]\]

This is the average effect of \(T\) on the log-mean of \(Y\), taken over the covariate distribution.

Loss Function#

\[L(Y, T, \theta) = Y/\mu + \log(\mu)\]

This is the Gamma deviance loss, up to a constant scale factor and additive terms that do not depend on \(\mu\).
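
To see why this loss targets the mean, note that for a single observation it is minimized at \(\mu = Y\). The sketch below checks this on a grid; `gamma_loss` is a hypothetical helper written for illustration, not a library function.

```python
import numpy as np

def gamma_loss(y, mu):
    """Per-observation Gamma loss Y/mu + log(mu), with constants dropped."""
    return y / mu + np.log(mu)

y_obs = 3.0                            # a single observed outcome (illustrative)
grid = np.linspace(0.5, 10.0, 1901)    # candidate values of mu, spacing 0.005
losses = gamma_loss(y_obs, grid)

# The grid point with the smallest loss should sit at (or next to) y_obs
mu_hat = grid[np.argmin(losses)]
print(f"grid minimizer: {mu_hat:.3f}, observed y: {y_obs:.3f}")
```

Setting the derivative \(-Y/\mu^2 + 1/\mu\) to zero gives the same answer analytically.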

Influence Score Components#

| Component | Formula |
|---|---|
| Residual | \(r = 1 - Y/\mu\) |
| Hessian weight \(W\) | \(Y/\mu\) |
| Score \(\nabla\ell\) | \(r \cdot [1, T]\) |

Note: The Hessian depends on \(\theta\) through \(\mu = \exp(\alpha + \beta T)\).
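
These components follow from differentiating \(Y/\mu + \log(\mu)\) with respect to \((\alpha, \beta)\) when \(\mu = \exp(\alpha + \beta T)\). The sketch below computes them for one observation and checks them against finite differences. The function names and numeric values are assumptions for illustration and do not reflect the library's internals.

```python
import numpy as np

def gamma_components(y, t, alpha, beta):
    """Residual, Hessian weight, and score for one observation."""
    mu = np.exp(alpha + beta * t)
    r = 1.0 - y / mu                  # residual: derivative of loss w.r.t. alpha + beta*t
    w = y / mu                        # Hessian weight: second derivative w.r.t. alpha + beta*t
    score = r * np.array([1.0, t])    # gradient w.r.t. (alpha, beta)
    return r, w, score

def loss(y, t, alpha, beta):
    """Gamma loss Y/mu + log(mu) written in terms of the linear index."""
    eta = alpha + beta * t
    return y * np.exp(-eta) + eta

y, t, alpha, beta = 4.0, 0.7, 1.0, 0.5   # illustrative values
r, w, score = gamma_components(y, t, alpha, beta)

eps = 1e-4
# Central differences for the gradient in alpha and in beta
num_grad = np.array([
    (loss(y, t, alpha + eps, beta) - loss(y, t, alpha - eps, beta)) / (2 * eps),
    (loss(y, t, alpha, beta + eps) - loss(y, t, alpha, beta - eps)) / (2 * eps),
])
# Second difference in alpha, which should match the Hessian weight W
num_w = (loss(y, t, alpha + eps, beta) - 2 * loss(y, t, alpha, beta)
         + loss(y, t, alpha - eps, beta)) / eps**2

print("score:", score, "numerical:", num_grad)
print("W:", w, "numerical:", num_w)
```

Because \(W = Y/\mu\) changes with \(\alpha\) and \(\beta\), the weight has to be re-evaluated at the fitted parameters rather than treated as a constant.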

Complete Example#

import numpy as np
from deep_inference import structural_dml

# Generate synthetic data
np.random.seed(42)
n = 2000
X = np.random.randn(n, 10)
T = np.random.randn(n)

# True parameters
alpha_true = 2.0 + 0.3 * X[:, 0]
beta_true = 0.5 + 0.2 * X[:, 0]  # Heterogeneous effect
mu_true = beta_true.mean()

# Generate Gamma outcomes
mu = np.exp(alpha_true + beta_true * T)
shape = 2.0
Y = np.random.gamma(shape, mu / shape, size=n)

print(f"True mu* = {mu_true:.6f}")

# Run inference
result = structural_dml(
    Y=Y, T=T, X=X,
    family='gamma',
    hidden_dims=[64, 32],
    epochs=100,
    n_folds=50,
    lr=0.01
)

print(result.summary())

Expected Results#

From Eval 01: Parameter Recovery:

| Family | Corr(α) | Corr(β) | Status |
|---|---|---|---|
| gamma | 0.993 | 0.990 | PASS |

The influence function correction produces valid confidence intervals. See Validation for full results.

Real-World Applications#

Healthcare Costs#

Estimate the effect of a treatment on medical expenditure:

# Y = medical costs ($)
# T = treatment indicator
# X = (age, comorbidities, insurance type, ...)
# Target: E[beta(X)] = average effect on log-cost

result = structural_dml(Y, T, X, family='gamma')

Insurance Claims#

Estimate how policy features affect claim amounts:

# Y = claim amount
# T = deductible level
# X = (policyholder demographics, ...)
# Target: E[beta(X)] = average semi-elasticity (an elasticity only if T is the log deductible)

result = structural_dml(Y, T, X, family='gamma')

Key Takeaways#

  1. Right-skewed positive data: Gamma is ideal when outcomes are strictly positive and variance increases with mean

  2. Hessian depends on theta: Requires three-way splitting (automatic in structural_dml)

  3. Log-link interpretation: \(\beta\) is the effect on the log-mean, so a one-unit increase in \(T\) multiplies the expected outcome by \(\exp(\beta)\)

  4. Shape parameter: Higher shape means lower variance relative to the mean, since \(\text{Var}[Y]/\mu^2 = 1/k\)
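
The last two takeaways can be made concrete with a short calculation. This sketch converts a log-mean effect into a multiplicative and percentage effect, and shows how the coefficient of variation \(1/\sqrt{k}\) shrinks as the shape grows. The values of `beta` and `k` are illustrative assumptions.

```python
import numpy as np

# Log-link interpretation: a one-unit increase in T multiplies the mean by exp(beta)
beta = 0.5                         # illustrative effect on the log-mean
ratio = np.exp(beta)               # multiplicative effect on the mean
pct_change = 100.0 * (ratio - 1.0)
print(f"beta={beta}: mean multiplied by {ratio:.3f} ({pct_change:.1f}% increase)")

# Shape parameter: Var[Y] = mu^2 / k, so sd/mean = 1/sqrt(k) regardless of mu
shapes = [0.5, 2.0, 8.0]           # illustrative shape values
cvs = [1.0 / np.sqrt(k) for k in shapes]
for k, cv in zip(shapes, cvs):
    print(f"k={k}: coefficient of variation = {cv:.3f}")
```

Note that exponentiating an average of \(\beta(X)\) is not the same as averaging \(\exp(\beta(X))\) when effects are heterogeneous, so the multiplicative reading applies most directly at a given \(X\).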