# Metrics Module

Helper functions for computing inference quality metrics.

## Main Functions

### compute_coverage

```python
from deep_inference import compute_coverage

# Check if true value falls within CI
covered = compute_coverage(mu_true, ci_lower, ci_upper)
```

### compute_se_ratio

```python
from deep_inference import compute_se_ratio

# Compare estimated SE to empirical SE
se_ratio = compute_se_ratio(estimated_se, empirical_se)
```

## Usage Example

```python
from deep_inference import structural_dml
import numpy as np

# Run multiple simulations
results = []
for seed in range(100):
    np.random.seed(seed)
    # Generate data...
    result = structural_dml(Y, T, X, family='linear')
    results.append({
        'mu_hat': result.mu_hat,
        'se': result.se,
        'ci_lower': result.ci_lower,
        'ci_upper': result.ci_upper
    })

# Compute metrics
mu_true = 0.5  # known ground truth
mu_hats = [r['mu_hat'] for r in results]
ses = [r['se'] for r in results]

# Coverage
covered = [(r['ci_lower'] <= mu_true <= r['ci_upper']) for r in results]
coverage = np.mean(covered)
print(f"Coverage: {coverage:.1%}")  # Target: 95%

# SE ratio
empirical_se = np.std(mu_hats)
mean_estimated_se = np.mean(ses)
se_ratio = mean_estimated_se / empirical_se
print(f"SE Ratio: {se_ratio:.2f}")  # Target: 1.0
```

## Key Metrics

| Metric | Formula | Target |
|--------|---------|--------|
| `bias` | $E[\hat\mu] - \mu^*$ | 0 |
| `variance` | $\text{Var}(\hat\mu)$ | - |
| `rmse` | $\sqrt{\text{Bias}^2 + \text{Var}}$ | Small |
| `empirical_se` | $\sqrt{\text{Var}}$ | - |
| `se_ratio` | $\hat{SE} / SE_{emp}$ | 1.0 |
| `coverage` | $P(\mu^* \in CI)$ | 95% |

## Validation Targets

| Metric | Valid Range | Interpretation |
|--------|-------------|----------------|
| Coverage | 93-97% | CI contains true value |
| SE Ratio | 0.9-1.2 | SE is properly calibrated |
| min(lambda) | > 1e-4 | Hessian is well-conditioned |

## Interpreting Results

### Good Results

```
Coverage: 95.0%   [PASS - in 93-97% range]
SE Ratio: 1.02    [PASS - close to 1.0]
RMSE: 0.032       [Low bias and variance]
```

### Warning Signs

```
Coverage: 30%     [FAIL - severe undercoverage]
SE Ratio: 0.27    [FAIL - SE underestimated 4x]
```

Common causes of poor coverage:
- Naive method (no IF correction)
- Too few folds (K < 20)
- Insufficient training epochs
- Model misspecification

## Monte Carlo Validation

For rigorous validation, run Monte Carlo simulations:

```python
import numpy as np
from deep_inference import structural_dml

M = 100  # number of simulations
N = 2000  # sample size
MU_TRUE = 0.5

results = []
for m in range(M):
    np.random.seed(m)

    # Generate data with known DGP
    X = np.random.randn(N, 10)
    T = np.random.randn(N)
    Y = X[:, 0] + MU_TRUE * T + np.random.randn(N)

    result = structural_dml(Y, T, X, family='linear', verbose=False)

    covered = result.ci_lower <= MU_TRUE <= result.ci_upper
    results.append({
        'mu_hat': result.mu_hat,
        'se': result.se,
        'covered': covered
    })

# Summary
coverage = np.mean([r['covered'] for r in results])
se_ratio = np.mean([r['se'] for r in results]) / np.std([r['mu_hat'] for r in results])

print(f"Coverage: {coverage:.1%}")
print(f"SE Ratio: {se_ratio:.2f}")
```