# Negative Binomial Model Tutorial

The Negative Binomial model handles overdispersed count data.

## When to Use

Use the Negative Binomial model when:
- Outcome is a non-negative integer (0, 1, 2, ...)
- Variance exceeds the mean (overdispersion)
- A Poisson fit understates the variance, which makes its standard errors too small
- Examples: doctor visits with heavy users, insurance claims

## Mathematical Setup

### Data Generating Process

$$Y \sim \text{NegBin}(\mu, r)$$

Where:
$$\mu = \exp(\alpha(X) + \beta(X) \cdot T)$$

The variance is $\text{Var}(Y) = \mu + \alpha \mu^2$, where $\alpha = 1/r$ is the scalar overdispersion parameter. It is distinct from the intercept function $\alpha(X)$ in the mean equation.
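The variance formula can be checked by simulation. The sketch below assumes the mean-dispersion parameterization with $\alpha = 1/r$ and maps $(\mu, r)$ to NumPy's `(n, p)` arguments via $p = r/(r+\mu)$; the variable names and the chosen values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

mu = 4.0          # mean of the count distribution
r = 2.0           # NegBin dispersion (size) parameter
alpha = 1.0 / r   # overdispersion parameter, assuming alpha = 1/r

# NumPy parameterizes by (n, p); p = r / (r + mu) gives mean mu.
p = r / (r + mu)
y = rng.negative_binomial(r, p, size=200_000)

theoretical_var = mu + alpha * mu**2   # 4 + 0.5 * 16 = 12
print(f"sample mean     = {y.mean():.3f}")
print(f"sample variance = {y.var():.3f}")
print(f"mu + alpha*mu^2 = {theoretical_var:.3f}")
```

With a large sample, the sample mean and variance should land close to $\mu$ and $\mu + \alpha\mu^2$ respectively.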

### Estimand

$$\mu^* = E[\beta(X)]$$

This is the average effect of a one-unit change in $T$ on the log of the expected count, the same estimand as in the Poisson model.
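Because the model uses a log link, an effect on the log-rate converts to a multiplicative effect on the expected count. A minimal sketch of that conversion, using made-up values for $\beta(X)$ rather than output from the library:

```python
import numpy as np

# Hypothetical per-unit effects beta(X_i) on the log-rate (illustrative values).
beta_x = np.array([0.1, 0.2, 0.3, 0.4, 0.5])

mu_star = beta_x.mean()          # estimand: E[beta(X)]
rate_ratio = np.exp(mu_star)     # multiplicative effect of +1 unit of T
pct_change = 100 * (rate_ratio - 1)

print(f"mu* = {mu_star:.3f}")
print(f"rate ratio exp(mu*) = {rate_ratio:.3f}")
print(f"about {pct_change:.1f}% change in expected count per unit of T")

# exp(E[beta(X)]) differs from E[exp(beta(X))] when beta varies with X.
mean_rate_ratio = np.exp(beta_x).mean()
print(f"E[exp(beta(X))] = {mean_rate_ratio:.3f}")
```

The last line is a reminder that exponentiating the average log-rate effect is not the same as averaging the individual rate ratios.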

### Loss Function

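$$L(Y, T, \theta) = \left(Y + \tfrac{1}{\alpha}\right) \log(1 + \alpha\mu) - Y \log \mu$$

This is the negative binomial negative log-likelihood for fixed $\alpha$, up to terms that do not depend on $\mu$. As $\alpha \to 0$ it reduces to the Poisson loss $\mu - Y \log \mu$, and its derivative with respect to $\log \mu$ is $-(Y - \mu)/(1 + \alpha\mu)$.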

### Influence Score Components

| Component | Formula |
|-----------|---------|
| Residual $r$ | $(Y - \mu) / (1 + \alpha\mu)$ |
| Hessian weight $W$ | $\mu / (1 + \alpha\mu)$ |
| Score $\nabla\ell$ | $-r \cdot [1, \tilde{T}]$ |

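Dividing by $1 + \alpha\mu$ shrinks both the residual and the weight for high-mean observations relative to Poisson, where the residual is $Y - \mu$ and $W = \mu$. In this table $r$ denotes the residual, not the dispersion parameter in $\text{NegBin}(\mu, r)$.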
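The table entries can be checked numerically. The sketch below assumes the loss is the negative binomial negative log-likelihood with fixed $\alpha$, written in terms of the linear index $\eta = \log\mu$; under that assumption the derivative of the loss with respect to $\eta$ is minus the residual, and the expected second derivative is $W$. The function names are illustrative and not part of `deep_inference`.

```python
import numpy as np

def negbin_loss(y, eta, alpha):
    """NegBin negative log-likelihood in eta = log(mu), dropping terms free of mu."""
    mu = np.exp(eta)
    return (y + 1.0 / alpha) * np.log1p(alpha * mu) - y * eta

def residual(y, mu, alpha):
    return (y - mu) / (1.0 + alpha * mu)

def hessian_weight(mu, alpha):
    return mu / (1.0 + alpha * mu)

def second_difference(f, x, h):
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h**2

y, eta, alpha, h = 7.0, 1.2, 0.5, 1e-3
mu = np.exp(eta)

# Central finite difference of the loss with respect to eta.
grad_fd = (negbin_loss(y, eta + h, alpha) - negbin_loss(y, eta - h, alpha)) / (2 * h)

# The second derivative is linear in y, so its expectation equals its value at y = mu.
hess_fd = second_difference(lambda e: negbin_loss(mu, e, alpha), eta, h)

print(f"finite-difference gradient = {grad_fd:.6f}")
print(f"-residual                  = {-residual(y, mu, alpha):.6f}")
print(f"finite-difference Hessian  = {hess_fd:.6f}")
print(f"Hessian weight W           = {hessian_weight(mu, alpha):.6f}")
```

Each pair of printed values should agree up to finite-difference error.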

## Complete Example

```python
import numpy as np
from deep_inference import structural_dml

# Generate overdispersed count data
np.random.seed(42)
n = 2000
X = np.random.randn(n, 10)
T = np.random.randn(n)

# True structural functions
alpha_true = 1.0 + 0.2 * X[:, 0]
beta_true = 0.3 + 0.1 * X[:, 0]
mu = np.exp(alpha_true + beta_true * T)

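# Draw from a negative binomial, which is equivalent to a gamma-Poisson mixture
r = 2.0  # dispersion parameter; overdispersion alpha = 1/r = 0.5
p = r / (r + mu)  # NumPy's success probability, chosen so that E[Y | X, T] = mu
Y = np.random.negative_binomial(r, p).astype(float)
mu_true = beta_true.mean()  # sample analogue of the estimand E[beta(X)]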

print(f"True mu* = {mu_true:.6f}")
print(f"Mean count = {Y.mean():.2f}")
print(f"Variance = {Y.var():.2f}")
print(f"Variance/Mean ratio = {Y.var()/Y.mean():.2f}")

# Run inference
result = structural_dml(
    Y=Y, T=T, X=X,
    family='negbin',
    hidden_dims=[64, 32],
    epochs=100,
    n_folds=50,
    lr=0.01
)

print(result.summary())
```

## Poisson vs Negative Binomial

### When to Use Each

| Condition | Model |
|-----------|-------|
| Var(Y) $\approx$ Mean(Y) | Poisson |
| Var(Y) > Mean(Y) | Negative Binomial |
| Var(Y) < Mean(Y) | Neither (underdispersion is rare; consider alternatives) |

### Diagnostic Check

```python
# Crude overdispersion check on the marginal distribution of Y.
# The 1.5 and 0.8 cutoffs are rules of thumb, not formal tests.
# Assumes Y, T, X and structural_dml from the complete example above.
mean_y = Y.mean()
var_y = Y.var()
dispersion_ratio = var_y / mean_y

print(f"Mean: {mean_y:.2f}")
print(f"Variance: {var_y:.2f}")
print(f"Dispersion ratio: {dispersion_ratio:.2f}")

if dispersion_ratio > 1.5:
    print("Overdispersion detected -> use NegBin")
    result = structural_dml(Y=Y, T=T, X=X, family='negbin')
elif dispersion_ratio < 0.8:
    print("Underdispersion detected -> consider alternatives")
    result = None  # neither family is fitted in this branch
else:
    print("Approximately equidispersed -> Poisson OK")
    result = structural_dml(Y=Y, T=T, X=X, family='poisson')
```
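The marginal variance-to-mean ratio mixes two sources of spread: genuine overdispersion and variation in the mean driven by $X$ and $T$. A complementary check conditions on fitted means. The sketch below fits a small Poisson regression by Newton's method in NumPy and then runs a regression-based overdispersion test in the style of Cameron and Trivedi, in which the slope estimates $\alpha$. The helper names (`fit_poisson`, `overdispersion_test`), the simple homoskedastic standard error, and the simulated data are illustrative assumptions, not part of `deep_inference`.

```python
import numpy as np

def fit_poisson(Z, y, n_iter=25):
    """Poisson regression with log link via Newton's method; returns fitted means.

    Assumes column 0 of Z is the intercept.
    """
    coef = np.zeros(Z.shape[1])
    coef[0] = np.log(y.mean())          # start from the intercept-only fit
    for _ in range(n_iter):
        mu = np.exp(Z @ coef)
        grad = Z.T @ (y - mu)
        hess = Z.T @ (Z * mu[:, None])
        coef = coef + np.linalg.solve(hess, grad)
    return np.exp(Z @ coef)

def overdispersion_test(y, mu_hat):
    """Regress ((y - mu)^2 - y) / mu on mu without intercept; slope estimates alpha."""
    z = ((y - mu_hat) ** 2 - y) / mu_hat
    alpha_hat = (mu_hat @ z) / (mu_hat @ mu_hat)
    resid = z - alpha_hat * mu_hat
    se = np.sqrt(resid @ resid / (len(y) - 1) / (mu_hat @ mu_hat))
    return alpha_hat, alpha_hat / se

rng = np.random.default_rng(0)
n = 4000
x = rng.normal(size=n)
t = rng.normal(size=n)
mu = np.exp(1.0 + 0.2 * x + (0.3 + 0.1 * x) * t)

# Design matrix matching the simulated mean: intercept, x, t, x*t.
Z = np.column_stack([np.ones(n), x, t, x * t])

r = 2.0  # NegBin dispersion, so the true alpha is 1/r = 0.5
y_nb = rng.negative_binomial(r, r / (r + mu)).astype(float)
y_pois = rng.poisson(mu).astype(float)

for name, y in [("negbin", y_nb), ("poisson", y_pois)]:
    alpha_hat, t_stat = overdispersion_test(y, fit_poisson(Z, y))
    print(f"{name}: alpha_hat = {alpha_hat:.3f}, t = {t_stat:.2f}")
```

A slope clearly above zero points to NegBin-style overdispersion, while a slope near zero is consistent with Poisson. In practice the fitted means would come from whatever mean model is being used, not from a hand-built design matrix.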

## Real-World Applications

### Healthcare Utilization

```python
# Y = number of doctor visits
# T = insurance status
# X = (age, chronic conditions, income, ...)
# Target: E[beta(X)] = average insurance effect on utilization

# Why NegBin: Some patients are heavy users (many visits),
# creating overdispersion in visit counts
result = structural_dml(Y, T, X, family='negbin')
```

### Insurance Claims

```python
# Y = number of claims filed
# T = deductible amount
# X = (policy type, customer age, history, ...)
# Target: E[beta(X)] = average deductible effect on claim frequency

# Why NegBin: Claim counts often show clustering
# (some customers file many claims, most file few)
result = structural_dml(Y, T, X, family='negbin')
```

### Species Counts

```python
# Y = number of species observed
# T = habitat protection level
# X = (area size, climate, elevation, ...)
# Target: E[beta(X)] = average protection effect on biodiversity

# Why NegBin: Ecological counts are typically overdispersed
result = structural_dml(Y, T, X, family='negbin')
```

## Overdispersion Parameter

The overdispersion parameter $\alpha$ controls how much extra variance exists:

- $\alpha = 0$: Reduces to Poisson
- $\alpha = 0.5$: Moderate overdispersion
- $\alpha = 1.0$: Strong overdispersion
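To make these labels concrete, note that the variance formula implies a variance-to-mean ratio of $1 + \alpha\mu$. A short calculation at a mean of $\mu = 5$, a value chosen only for the example:

```python
mu = 5.0  # illustrative mean count

variances = {}
for alpha in (0.0, 0.5, 1.0):
    variances[alpha] = mu + alpha * mu**2
    ratio = variances[alpha] / mu          # equals 1 + alpha * mu
    print(f"alpha = {alpha:.1f}: variance = {variances[alpha]:5.1f}, "
          f"variance/mean = {ratio:.1f}")
```

At $\mu = 5$ the variance is 5, 17.5, and 30 for $\alpha = 0$, $0.5$, and $1.0$. Because the ratio grows with $\mu$, the same $\alpha$ implies more visible overdispersion in data with larger counts.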

### Effect on Inference

Higher overdispersion means:
- Less information per observation
- Wider confidence intervals
- The weight $W = \mu/(1 + \alpha\mu)$ saturates at $1/\alpha$ for large $\mu$, whereas the Poisson weight $W = \mu$ grows without bound
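The behavior of the weight can be seen directly. The sketch below compares the Poisson weight $W = \mu$ with the NegBin weight $W = \mu/(1+\alpha\mu)$ over a grid of means, with $\alpha = 0.5$ chosen only for illustration:

```python
import numpy as np

alpha = 0.5
mu = np.array([0.1, 1.0, 10.0, 100.0, 1000.0])

w_poisson = mu
w_negbin = mu / (1.0 + alpha * mu)

for m, wp, wn in zip(mu, w_poisson, w_negbin):
    print(f"mu = {m:7.1f}  Poisson W = {wp:7.1f}  "
          f"NegBin W = {wn:.4f}  ratio = {wn / wp:.4f}")

print(f"limit 1/alpha = {1 / alpha:.1f}")
```

The ratio column equals $1/(1 + \alpha\mu)$, which is the fraction of the Poisson information that each observation retains under overdispersion.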

## Key Takeaways

1. **Check dispersion first**: Compare the variance with the mean (for example, via the dispersion ratio) before choosing a model
2. **Overdispersion is common**: Real count data usually shows Var > Mean
3. **Same interpretation as Poisson**: With a log link, $\beta(X)$ is a semi-elasticity: a one-unit increase in $T$ multiplies the expected count by $\exp(\beta(X))$
4. **Downweighting**: High-count observations get relatively less weight than in Poisson
5. **Robust to misspecification**: NegBin is a safer default than Poisson for count data, since it reduces to Poisson when $\alpha = 0$