# Validation

Comprehensive eval suite validating every mathematical component of the influence function methodology.

```{toctree}
:maxdepth: 2
:caption: Validation

eval_01
eval_02
eval_03
eval_04
eval_05
eval_06
eval_07
eval_09
verification
```

---

## Eval Suite Overview

The package includes 8 evals in `evals/` validating Theorem 2.

| Eval | Component | Tests | Result | Details |
|------|-----------|-------|--------|---------|
| [01](eval_01.md) | Parameter Recovery θ̂(x) | 12 families × 3 seeds | 12/12 PASS | [→](eval_01.md) |
| [02](eval_02.md) | Autodiff vs Calculus | Score + Hessian | 31/31 PASS | [→](eval_02.md) |
| [03](eval_03.md) | Lambda Estimation Λ̂(x) | 5 methods | 9/9 PASS | [→](eval_03.md) |
| [04](eval_04.md) | Target Jacobian H_θ | Autodiff vs oracle | 92/92 PASS | [→](eval_04.md) |
| [05](eval_05.md) | Influence Function ψ | Assembly + coverage | 4/4 PASS | [→](eval_05.md) |
| [06](eval_06.md) | Frequentist Coverage | Monte Carlo M=50 | PASS | [→](eval_06.md) |
| [07](eval_07.md) | End-to-End | Full workflow | 7/7 PASS | [→](eval_07.md) |
| [09](eval_09.md) | Multinomial Logit | Recovery + Coverage | 98% coverage PASS | [→](eval_09.md) |

**Total: 228+ individual checks, all passing.**

---

## Quick Summary

### Eval 01: Parameter Recovery
Neural networks recover θ(x) = [α(x), β(x)] across all 12 families with Corr(β) > 0.94. [Details →](eval_01.md)

### Eval 02: Autodiff Accuracy
PyTorch autodiff matches calculus formulas to machine precision (error < 1e-14). [Details →](eval_02.md)

### Eval 03: Lambda Estimation
MLP achieves Corr=0.997 with true Λ(x); aggregate ignores heterogeneity (Corr=0.000). [Details →](eval_03.md)

### Eval 04: Target Jacobian
∂H/∂θ computed correctly for all targets and families (92/92 tests). [Details →](eval_04.md)

### Eval 05: Influence Functions
Complete ψ assembly validated with 88% coverage, SE ratio 0.87. [Details →](eval_05.md)

### Eval 06: Frequentist Coverage
Monte Carlo (M=50, n=5000) confirms valid CIs with z-scores ~ N(0,1). [Details →](eval_06.md)

### Eval 07: End-to-End
Full analyst workflow: Oracle vs Bootstrap vs NN comparison shows IF correction is essential. [Details →](eval_07.md)

### Eval 09: Multinomial Logit
Multinomial logit (conditional logit) validated with 98% coverage (M=50, n=8000). Recovery, autodiff, Lambda, and coverage all PASS. [Details →](eval_09.md)

---

## Running Evals

```bash
# Run all evals
python3 -m evals.run_all 2>&1 | tee evals/reports/run_all_$(date +%Y%m%d_%H%M%S).txt

# Run individual evals
python3 -m evals.eval_01_theta
python3 -m evals.eval_02_autodiff
python3 -m evals.eval_03_lambda
python3 -m evals.eval_04_jacobian
python3 -m evals.eval_05_psi
python3 -m evals.eval_06_coverage
python3 -m evals.eval_07_e2e
python3 -m evals.eval_09_multinomial
```

---

## References

- Farrell, Liang, Misra (2021): "Deep Neural Networks for Estimation and Inference" *Econometrica*
- Farrell, Liang, Misra (2025): "Deep Learning for Individual Heterogeneity" *Working Paper*
- [Verification Against FLM2](verification.md) - comparison with original implementation