Negative Binomial Model Tutorial#
The Negative Binomial model handles overdispersed count data.
When to Use#
Use the Negative Binomial model when:
Outcome is a non-negative integer (0, 1, 2, …)
Variance exceeds the mean (overdispersion)
Poisson model underfits the variance
Examples: doctor visits with heavy users, insurance claims
Mathematical Setup#
Data Generating Process#
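\(Y \mid X, T \sim \text{NegBin}(\mu, \alpha)\)
Where: \(\mu = \exp(\alpha(X) + \beta(X) \cdot T)\)
The conditional variance is \(\text{Var}(Y \mid X, T) = \mu + \alpha \mu^2\), where the scalar \(\alpha\) is the overdispersion parameter. It is distinct from the baseline function \(\alpha(X)\) that appears in the mean.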
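The mean-variance relationship above can be checked by simulation. The sketch below assumes the overdispersion parameter maps to NumPy's shape parameter as `r = 1 / alpha`; the helper name `simulate_negbin` is hypothetical and is not part of `deep_inference`.

```python
import numpy as np

def simulate_negbin(mu, alpha, size, rng):
    """Draw counts with mean mu and variance mu + alpha * mu**2."""
    r = 1.0 / alpha      # NumPy's shape parameter ("n")
    p = r / (r + mu)     # NumPy's success probability
    return rng.negative_binomial(r, p, size=size)

rng = np.random.default_rng(0)
mu, alpha = 3.0, 0.5
y = simulate_negbin(mu, alpha, size=200_000, rng=rng)

expected_var = mu + alpha * mu**2   # 3 + 0.5 * 9 = 7.5
print(f"sample mean     = {y.mean():.3f} (expected {mu})")
print(f"sample variance = {y.var():.3f} (expected {expected_var})")
```

With a fixed mean, the sample variance should land close to \(\mu + \alpha\mu^2\) and well above the Poisson value of \(\mu\).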
Estimand#
The target is \(\mu^* = E[\beta(X)]\), the average treatment effect on the log-rate (the same estimand as in the Poisson model).
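Because the effect lives on the log-rate scale, it can be read as a semi-elasticity. The sketch below converts a hypothetical average effect of 0.3 into a rate ratio and a percent change; the value 0.3 is illustrative only. Note that exponentiating an average of \(\beta(X)\) is not the same as averaging \(\exp(\beta(X))\), so this is an approximate reading.

```python
import numpy as np

beta_bar = 0.3  # hypothetical average effect on the log-rate

# A one-unit increase in T multiplies the expected count by exp(beta).
rate_ratio = np.exp(beta_bar)
pct_change = 100.0 * (rate_ratio - 1.0)

print(f"rate ratio     = {rate_ratio:.3f}")
print(f"percent change = {pct_change:.1f}%")
```

An effect of 0.3 therefore corresponds to roughly a 35% higher expected count per unit of treatment.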
Loss Function#
Modified Poisson-like loss accounting for overdispersion.
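The loss is not written out here. One candidate that agrees with the residual and Hessian weight listed under Influence Score Components is the NB2 negative log-likelihood with the scalar overdispersion \(\alpha\) held fixed. This is an assumption for illustration, not a statement of what the library implements. Writing \(\eta\) for the linear index (a symbol introduced here):

```latex
\begin{aligned}
\eta &= \alpha(X) + \beta(X)\,T, \qquad \mu = e^{\eta}, \\
\ell(\eta) &= -\,Y \log\frac{\alpha\mu}{1+\alpha\mu}
              + \frac{1}{\alpha}\log(1+\alpha\mu) + \text{const}, \\
\frac{\partial \ell}{\partial \eta} &= -\,\frac{Y-\mu}{1+\alpha\mu} = -r, \\
\mathbb{E}\!\left[\frac{\partial^2 \ell}{\partial \eta^2}\,\middle|\,X,T\right]
  &= \frac{\mu}{1+\alpha\mu} = W.
\end{aligned}
```

Setting \(\alpha \to 0\) recovers the Poisson gradient \(-(Y - \mu)\) and weight \(\mu\).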
Influence Score Components#
| Component | Formula |
|---|---|
| Residual \(r\) | \((Y - \mu) / (1 + \alpha\mu)\) |
| Hessian weight \(W\) | \(\mu / (1 + \alpha\mu)\) |
| Score \(\nabla\ell\) | \(-r \cdot [1, \tilde{T}]\) |
The overdispersion \(\alpha\) downweights high-mean observations.
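The three components in the table can be computed directly with NumPy. This is a minimal sketch: the function name `negbin_score_components` is hypothetical, \(\tilde{T}\) is taken as a given array (how the library constructs it is not specified here), and \(\alpha\) is treated as a known scalar.

```python
import numpy as np

def negbin_score_components(y, t_tilde, mu, alpha):
    """Return residual r, Hessian weight W, and per-observation score."""
    denom = 1.0 + alpha * mu
    resid = (y - mu) / denom                   # r = (Y - mu) / (1 + alpha*mu)
    weight = mu / denom                        # W = mu / (1 + alpha*mu)
    design = np.column_stack([np.ones_like(t_tilde), t_tilde])
    score = -resid[:, None] * design           # -r * [1, T_tilde]
    return resid, weight, score

# Toy inputs chosen by hand for illustration
y = np.array([0.0, 3.0, 10.0])
t_tilde = np.array([-1.0, 0.0, 1.0])
mu = np.array([1.0, 2.0, 8.0])

resid, weight, score = negbin_score_components(y, t_tilde, mu, alpha=0.5)
print("residual:", resid)
print("weight:  ", weight)
print("score:\n", score)
```

The score has one row per observation and two columns, matching the intercept and treatment entries of \([1, \tilde{T}]\).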
Complete Example#
import numpy as np
from deep_inference import structural_dml
# Generate overdispersed count data
np.random.seed(42)
n = 2000
X = np.random.randn(n, 10)
T = np.random.randn(n)
# True structural functions
alpha_true = 1.0 + 0.2 * X[:, 0]
beta_true = 0.3 + 0.1 * X[:, 0]
mu = np.exp(alpha_true + beta_true * T)
# Add overdispersion via gamma-Poisson mixture
r = 2.0  # NumPy shape parameter; overdispersion alpha = 1/r = 0.5
p = r / (r + mu)
Y = np.random.negative_binomial(r, p).astype(float)
# Target parameter mu* = E[beta(X)] (not the conditional mean mu above)
mu_true = beta_true.mean()
print(f"True mu* = {mu_true:.6f}")
print(f"Mean count = {Y.mean():.2f}")
print(f"Variance = {Y.var():.2f}")
print(f"Variance/Mean ratio = {Y.var()/Y.mean():.2f}")
# Run inference
result = structural_dml(
    Y=Y, T=T, X=X,
    family='negbin',
    hidden_dims=[64, 32],
    epochs=100,
    n_folds=50,
    lr=0.01
)
print(result.summary())
Poisson vs Negative Binomial#
When to Use Each#
| Condition | Model |
|---|---|
| Var(Y) \(\approx\) Mean(Y) | Poisson |
| Var(Y) > Mean(Y) | Negative Binomial |
| Var(Y) < Mean(Y) | Underdispersion (rare) |
Diagnostic Check#
# Simple overdispersion test
mean_y = Y.mean()
var_y = Y.var()
dispersion_ratio = var_y / mean_y
print(f"Mean: {mean_y:.2f}")
print(f"Variance: {var_y:.2f}")
print(f"Dispersion ratio: {dispersion_ratio:.2f}")
if dispersion_ratio > 1.5:
    print("Overdispersion detected -> use NegBin")
    result = structural_dml(Y, T, X, family='negbin')
elif dispersion_ratio < 0.8:
    print("Underdispersion detected -> consider alternatives")
else:
    print("Approximately equidispersed -> Poisson OK")
    result = structural_dml(Y, T, X, family='poisson')
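One caveat on the ratio used above: it is a marginal quantity, so it also picks up variation in the mean across covariates. The sketch below simulates purely Poisson data whose mean depends on a single covariate and compares the marginal ratio with a Pearson-type ratio computed around the conditional mean. The data-generating values are illustrative, and the true mean is used where a fitted mean would be needed in practice.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=50_000)
mu = np.exp(1.0 + 0.5 * x)     # conditional mean varies with x
y = rng.poisson(mu)            # no overdispersion given x

# Marginal check: inflated by the spread of mu across observations
marginal_ratio = y.var() / y.mean()

# Conditional check: squared Pearson residuals average to about 1
pearson_ratio = np.mean((y - mu) ** 2 / mu)

print(f"marginal Var/Mean ratio: {marginal_ratio:.2f}")
print(f"Pearson dispersion:      {pearson_ratio:.2f}")
```

Here the marginal ratio exceeds the 1.5 threshold even though the data are conditionally Poisson, so a large marginal ratio is suggestive rather than conclusive.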
Real-World Applications#
Healthcare Utilization#
# Y = number of doctor visits
# T = insurance status
# X = (age, chronic conditions, income, ...)
# Target: E[beta(X)] = average insurance effect on utilization
# Why NegBin: Some patients are heavy users (many visits),
# creating overdispersion in visit counts
result = structural_dml(Y, T, X, family='negbin')
Insurance Claims#
# Y = number of claims filed
# T = deductible amount
# X = (policy type, customer age, history, ...)
# Target: E[beta(X)] = average deductible effect on claim frequency
# Why NegBin: Claim counts often show clustering
# (some customers file many claims, most file few)
result = structural_dml(Y, T, X, family='negbin')
Species Counts#
# Y = number of species observed
# T = habitat protection level
# X = (area size, climate, elevation, ...)
# Target: E[beta(X)] = average protection effect on biodiversity
# Why NegBin: Ecological counts are typically overdispersed
result = structural_dml(Y, T, X, family='negbin')
Overdispersion Parameter#
The overdispersion parameter \(\alpha\) controls how much extra variance exists:
\(\alpha = 0\): Reduces to Poisson
\(\alpha = 0.5\): Moderate overdispersion
\(\alpha = 1.0\): Strong overdispersion
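A rough value of \(\alpha\) can be backed out from the variance formula \(\text{Var}(Y) = \mu + \alpha\mu^2\) by the method of moments. The sketch below uses a hypothetical helper `moment_alpha` and assumes a common mean for all observations; with covariates the marginal estimate will tend to overstate \(\alpha\).

```python
import numpy as np

def moment_alpha(y):
    """Method-of-moments estimate: alpha = (var - mean) / mean**2, floored at 0."""
    m = y.mean()
    v = y.var(ddof=1)
    return max((v - m) / m**2, 0.0)

rng = np.random.default_rng(1)
alpha_sim, mu_sim = 0.5, 4.0
r = 1.0 / alpha_sim                      # NumPy shape parameter
y_nb = rng.negative_binomial(r, r / (r + mu_sim), size=100_000)
y_pois = rng.poisson(mu_sim, size=100_000)

alpha_hat_nb = moment_alpha(y_nb)
alpha_hat_pois = moment_alpha(y_pois)
print(f"NegBin sample:  alpha_hat = {alpha_hat_nb:.3f} (simulated with 0.5)")
print(f"Poisson sample: alpha_hat = {alpha_hat_pois:.3f} (expected near 0)")
```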
Effect on Inference#
Higher overdispersion means:
Less information per observation
Wider confidence intervals
Weight function \(W = \mu/(1 + \alpha\mu)\) approaches \(1/\alpha\) for large \(\mu\)
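The limiting behaviour of the weight function can be seen numerically. The sketch below evaluates \(W = \mu/(1 + \alpha\mu)\) on an illustrative grid of means; `hessian_weight` is a hypothetical helper name.

```python
import numpy as np

def hessian_weight(mu, alpha):
    """W = mu / (1 + alpha * mu); equals mu when alpha = 0 (Poisson)."""
    return mu / (1.0 + alpha * mu)

mu_grid = np.array([0.1, 1.0, 10.0, 100.0, 1000.0])
for alpha in (0.0, 0.5, 1.0):
    w = hessian_weight(mu_grid, alpha)
    print(f"alpha={alpha}: {np.round(w, 3)}")
```

With \(\alpha = 0\) the weight grows without bound in \(\mu\), while for \(\alpha > 0\) it levels off at \(1/\alpha\), so high-mean observations contribute a bounded amount of information.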
Key Takeaways#
Check dispersion first: Compare the variance with the mean before choosing a model
Overdispersion is common: Real count data usually show Var > Mean
Same interpretation as Poisson: With a log link, the effect is a semi-elasticity
Downweighting: High-count observations get relatively less weight than in Poisson
Robust to misspecification: NegBin is a safer default than Poisson for overdispersed count data