Validation
A comprehensive eval suite that validates every mathematical component of the influence function methodology.
Eval Suite Overview
The package includes 8 evals in evals/ validating Theorem 2.

| Eval | Component | Tests | Result |
|---|---|---|---|
| 01 | Parameter Recovery θ̂(x) | 12 families × 3 seeds | 12/12 PASS |
| 02 | Autodiff vs Calculus | Score + Hessian | 31/31 PASS |
| 03 | Lambda Estimation Λ̂(x) | 5 methods | 9/9 PASS |
| 04 | Target Jacobian H_θ | Autodiff vs oracle | 92/92 PASS |
| 05 | Influence Function ψ | Assembly + coverage | 4/4 PASS |
| 06 | Frequentist Coverage | Monte Carlo M=50 | PASS |
| 07 | End-to-End | Full workflow | 7/7 PASS |
| 09 | Multinomial Logit | Recovery + Coverage | 98% coverage PASS |

Total: 228+ individual checks, all passing.
Quick Summary
Eval 01: Parameter Recovery
Neural networks recover θ(x) = [α(x), β(x)] across all 12 families with Corr(β) > 0.94.
Eval 02: Autodiff Accuracy
PyTorch autodiff matches the closed-form calculus formulas for the score and Hessian to machine precision (error < 1e-14).
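As an illustration of this kind of check (not the package's own eval code), the sketch below compares PyTorch autodiff with the closed-form score and Hessian of a binary logit loss. The loss, the parameter values, and the helper names `logit_loss`, `analytic_score`, and `analytic_hessian` are assumptions chosen for the example.

```python
import torch

DTYPE = torch.float64  # double precision, so errors near machine epsilon are attainable


def logit_loss(theta, y, t):
    """Negative log-likelihood of one observation with index alpha + beta * t."""
    eta = theta[0] + theta[1] * t
    return torch.log1p(torch.exp(eta)) - y * eta


def analytic_score(theta, y, t):
    """Closed-form gradient: (sigmoid(eta) - y) * [1, t]."""
    p = torch.sigmoid(theta[0] + theta[1] * t)
    return (p - y) * torch.stack([torch.ones_like(t), t])


def analytic_hessian(theta, y, t):
    """Closed-form Hessian: p * (1 - p) * outer([1, t], [1, t])."""
    p = torch.sigmoid(theta[0] + theta[1] * t)
    v = torch.stack([torch.ones_like(t), t])
    return p * (1 - p) * torch.outer(v, v)


# Hypothetical evaluation point (alpha, beta) and a single observation (y, t).
theta = torch.tensor([0.3, -1.2], dtype=DTYPE, requires_grad=True)
y = torch.tensor(1.0, dtype=DTYPE)
t = torch.tensor(0.7, dtype=DTYPE)

# Autodiff score (first derivative) and Hessian (second derivative).
(score_ad,) = torch.autograd.grad(logit_loss(theta, y, t), theta)
hess_ad = torch.autograd.functional.hessian(
    lambda th: logit_loss(th, y, t), theta.detach()
)

# Largest absolute discrepancy between autodiff and the formulas.
theta0 = theta.detach()
score_err = (score_ad - analytic_score(theta0, y, t)).abs().max().item()
hess_err = (hess_ad - analytic_hessian(theta0, y, t)).abs().max().item()
print(f"max score error: {score_err:.2e}, max Hessian error: {hess_err:.2e}")
```

Working in `float64` matters here: in single precision the discrepancy would sit near 1e-7, far above a 1e-14 threshold.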
Eval 03: Lambda Estimation
The MLP estimator achieves Corr = 0.997 with the true Λ(x); the aggregate estimator ignores heterogeneity (Corr = 0.000).
Eval 04: Target Jacobian
∂H/∂θ computed correctly for all targets and families (92/92 tests).
Eval 05: Influence Functions
The complete ψ assembly is validated with 88% coverage and an SE ratio of 0.87.
Eval 06: Frequentist Coverage
A Monte Carlo study (M=50 replications, n=5000) confirms valid confidence intervals, with z-scores approximately N(0,1).
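To make the coverage check concrete, here is a minimal Monte Carlo sketch. It uses a plain sample mean as a stand-in for the package's estimator, so `coverage_experiment` and its parameters are hypothetical and only the logic of the check carries over: repeat the experiment M times, form z-scores, and count how often the 95% interval covers the truth. The SE ratio computed here (mean estimated SE divided by the empirical standard deviation of the estimates) is one common definition and is an assumption, not taken from the eval code.

```python
import random
import statistics


def coverage_experiment(M=50, n=2000, mu=1.0, sigma=2.0, seed=0):
    """Simulate M datasets of size n and check 95% CI coverage for the mean."""
    rng = random.Random(seed)
    estimates, std_errors, z_scores = [], [], []
    for _ in range(M):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        est = statistics.fmean(sample)            # point estimate
        se = statistics.stdev(sample) / n ** 0.5  # estimated standard error
        estimates.append(est)
        std_errors.append(se)
        z_scores.append((est - mu) / se)          # should be roughly N(0, 1)

    # Share of replications whose 95% interval est +/- 1.96 * se contains mu.
    coverage = sum(abs(z) <= 1.96 for z in z_scores) / M
    # Values near 1 mean the estimated SEs match the true sampling variability.
    se_ratio = statistics.fmean(std_errors) / statistics.stdev(estimates)
    return coverage, se_ratio, z_scores


coverage, se_ratio, z_scores = coverage_experiment()
print(f"coverage={coverage:.2f}  SE ratio={se_ratio:.2f}  "
      f"z mean={statistics.fmean(z_scores):.2f}  z sd={statistics.stdev(z_scores):.2f}")
```

With only M=50 replications the coverage estimate is itself noisy (its standard error is about 3 percentage points at a true 95%), so modest deviations from the nominal level are expected.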
Eval 07: End-to-End
Full analyst workflow: comparing the Oracle, Bootstrap, and NN estimates shows that the influence function (IF) correction is essential.
Eval 09: Multinomial Logit
The multinomial (conditional) logit model is validated with 98% coverage (M=50, n=8000). The recovery, autodiff, Lambda, and coverage checks all PASS.
Running Evals
# Run all evals
python3 -m evals.run_all 2>&1 | tee evals/reports/run_all_$(date +%Y%m%d_%H%M%S).txt
# Run individual evals
python3 -m evals.eval_01_theta
python3 -m evals.eval_02_autodiff
python3 -m evals.eval_03_lambda
python3 -m evals.eval_04_jacobian
python3 -m evals.eval_05_psi
python3 -m evals.eval_06_coverage
python3 -m evals.eval_07_e2e
python3 -m evals.eval_09_multinomial
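If only a subset of evals is needed, the individual commands above can be driven from a small script. The following is a sketch, not part of the package: `run_evals` and `failed_evals` are hypothetical helpers, `sys.executable` stands in for `python3`, and the script assumes each eval module exits with a non-zero status when a check fails, which the source does not confirm.

```python
import subprocess
import sys

# Module names copied from the individual commands above.
EVAL_MODULES = [
    "evals.eval_01_theta",
    "evals.eval_02_autodiff",
    "evals.eval_03_lambda",
    "evals.eval_04_jacobian",
    "evals.eval_05_psi",
    "evals.eval_06_coverage",
    "evals.eval_07_e2e",
    "evals.eval_09_multinomial",
]


def run_evals(modules, run=subprocess.run):
    """Run each eval module in order and return {module: exit_code}."""
    results = {}
    for module in modules:
        # Equivalent to `python3 -m <module>` using the current interpreter.
        completed = run([sys.executable, "-m", module])
        results[module] = completed.returncode
    return results


def failed_evals(results):
    """List modules whose exit code was non-zero (assumed to mean failure)."""
    return [module for module, code in results.items() if code != 0]
```

Called from the repository root, `failed_evals(run_evals(EVAL_MODULES[:3]))` would run the first three evals and list any that did not exit cleanly.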
References
Farrell, Liang, Misra (2021): “Deep Neural Networks for Estimation and Inference”, Econometrica
Farrell, Liang, Misra (2025): “Deep Learning for Individual Heterogeneity”, Working Paper
Verification Against FLM2: comparison with the original implementation