# Eval 07: End-to-End Workflow

A full analyst workflow demonstrating a production use case.
## Scenario: Loan Application

A bank wants to understand how interest-rate sensitivity varies across customer segments.
Data-generating process (DGP): heterogeneous logit demand
- Y: Loan acceptance (0/1)
- T: Interest rate offered
- X: Customer characteristics (income, credit score, etc.)
True parameters:
- α*(x) = 0.5 + 0.3·x₁ - 0.2·x₂
- β*(x) = -0.8 + 0.4·x₁ (rate sensitivity)
- μ* = E[β*(X)] = -0.8 (this holds when E[x₁] = 0)
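To make the DGP concrete, here is a hypothetical simulator. It assumes the standard heterogeneous logit form P(Y = 1 | X = x, T = t) = σ(α*(x) + β*(x)·t). It also assumes that x₁, x₂ and the (standardized) offered rate T are drawn independently from N(0, 1). The feature distribution, rate scale and the helper name `simulate_loan_data` are assumptions, not the eval's actual settings. Under these assumptions E[x₁] = 0, so the average of β*(X) is close to -0.8.

```python
import numpy as np

def simulate_loan_data(n=1000, seed=0):
    """Hypothetical simulator for the heterogeneous logit DGP.

    Assumptions (not taken from the eval): x1, x2 ~ N(0, 1) independently,
    and the offered rate T ~ N(0, 1), drawn independently of X.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, 2))
    T = rng.standard_normal(n)
    alpha = 0.5 + 0.3 * X[:, 0] - 0.2 * X[:, 1]   # alpha*(x)
    beta = -0.8 + 0.4 * X[:, 0]                   # beta*(x), rate sensitivity
    p = 1.0 / (1.0 + np.exp(-(alpha + beta * T)))  # logistic acceptance probability
    Y = rng.binomial(1, p)                         # loan acceptance (0/1)
    return X, T, Y, alpha, beta

X, T, Y, alpha, beta = simulate_loan_data()
print(Y.mean(), beta.mean())
```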
## Configuration

| Parameter | Value |
|---|---|
| Sample Size (n) | 1000 |
| Cross-fitting Folds | 20 |
| Epochs | 30 |
| Bootstrap Samples | 200 |
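With n = 1000 and 20 cross-fitting folds, each network is trained on 950 observations and produces out-of-fold predictions for the remaining 50. Every customer therefore gets a fitted α̂(x), β̂(x) from a model that never saw that customer. The following is a minimal sketch of the fold assignment, assuming a random permutation split. The helper `cross_fit_folds` is hypothetical, not the eval's code.

```python
import numpy as np

def cross_fit_folds(n, k=20, seed=0):
    """Yield (train_idx, eval_idx) pairs for K-fold cross-fitting.

    Each observation lands in exactly one evaluation fold, so every
    out-of-fold prediction comes from a model trained without that point.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    for j in range(k):
        eval_idx = folds[j]
        train_idx = np.concatenate([folds[m] for m in range(k) if m != j])
        yield train_idx, eval_idx

sizes = [(len(tr), len(ev)) for tr, ev in cross_fit_folds(1000, k=20)]
print(sizes[0])  # (950, 50)
```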
## Results by Round
### Round A: Oracle Logistic Regression

| Metric | Value |
|---|---|
| μ̂ | -0.812 |
| SE | 0.089 |
| 95% CI | [-0.987, -0.637] |
| Covers μ* | True |
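Up to rounding of μ̂ and SE, the Round A interval matches a normal-approximation (Wald) interval μ̂ ± 1.96·SE. The sketch below assumes that construction; the eval's own interval code is not shown. The helpers `wald_ci` and `covers` are hypothetical.

```python
from statistics import NormalDist

def wald_ci(estimate, se, level=0.95):
    """Normal-approximation (Wald) interval: estimate ± z * se."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return estimate - z * se, estimate + z * se

def covers(ci, truth):
    """True if the closed interval ci = (lo, hi) contains truth."""
    lo, hi = ci
    return lo <= truth <= hi

# Round A inputs (the rounded values reported in the table)
ci_a = wald_ci(-0.812, 0.089)
print([round(v, 3) for v in ci_a], covers(ci_a, -0.8))  # [-0.986, -0.638] True
```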
### Round B: Bootstrap Oracle

| Metric | Value |
|---|---|
| Bootstrap SE | 0.091 |
| Bootstrap CI | [-0.992, -0.641] |
| Covers μ* | True |
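A nonparametric bootstrap resamples customers with replacement and re-runs the estimator on each resample. The bootstrap SE is the standard deviation of the replicates. This sketch uses `n_boot=200` to match the configuration and assumes a percentile interval, which would explain the slightly asymmetric Round B CI. The toy usage bootstraps a sample mean in place of the oracle logistic μ̂. `bootstrap` is a hypothetical helper.

```python
import numpy as np

def bootstrap(data, estimator, n_boot=200, level=0.95, seed=0):
    """Nonparametric bootstrap returning (SE, percentile CI).

    The percentile interval is an assumption; the eval may use another
    bootstrap interval type.
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    reps = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # n row indices drawn with replacement
        reps[b] = estimator(data[idx])
    a = 1.0 - level
    lo, hi = np.quantile(reps, [a / 2, 1.0 - a / 2])
    return reps.std(ddof=1), (lo, hi)

# Toy usage: bootstrap the mean of 1000 draws centred at -0.8.
rng = np.random.default_rng(1)
x = rng.normal(-0.8, 1.0, size=1000)
se, ci = bootstrap(x, np.mean)
print(round(se, 3), ci)
```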
### Round C: Neural Network (Naive)

| Metric | Value |
|---|---|
| μ̂_naive | -0.798 |
| SE_naive | 0.024 |
| CI_naive | [-0.845, -0.751] |
| Covers μ* | False |
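One common reading of "naive" inference, assumed here rather than taken from the eval code, works as follows. You average the network's fitted β̂(xᵢ) and take their sample standard deviation divided by √n as the SE. Because this treats β̂ as if it were the true β, it ignores first-stage estimation error in the network. That is consistent with the much narrower interval above. `naive_mu` is a hypothetical helper.

```python
import numpy as np

def naive_mu(beta_hat):
    """Plug-in estimate of mu = E[beta(X)] with a 'naive' SE.

    Assumption: the SE reflects only variation of beta_hat across customers
    and ignores the estimation error in the fitted network.
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    mu_hat = beta_hat.mean()
    se = beta_hat.std(ddof=1) / np.sqrt(beta_hat.size)
    return mu_hat, se

mu, se = naive_mu([-0.9, -0.8, -0.7])
print(mu, se)
```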
### Round D: Neural Network (Influence Function)

| Metric | Value |
|---|---|
| μ̂_IF | -0.823 |
| SE_IF | 0.087 |
| CI_IF | [-0.994, -0.652] |
| Covers μ* | True |
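The IF correction can be sketched with a standard influence-function (debiasing) construction for a logit model with parameter functions θ(x) = (α(x), β(x)). We assume, without confirmation from the eval code, that Round D computes something equivalent. Each customer contributes ψᵢ = β̂(xᵢ) - [Λ̂(xᵢ)⁻¹ gᵢ]₂. Here gᵢ = (p̂ᵢ - yᵢ)(1, tᵢ) is the score of the negative log-likelihood, and Λ(x) = E[p(1 - p)(1, T)(1, T)ᵀ | X = x]. Then μ̂_IF is the mean of ψᵢ and SE_IF = sd(ψ)/√n. The sketch assumes T is independent of X, so Λ(xᵢ) is estimated by averaging over the observed rates. It also plugs in the true α*, β* as a stand-in for cross-fitted network predictions. The helpers and the simulated data are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def if_estimate(y, t, alpha_hat, beta_hat):
    """Influence-function estimate of mu = E[beta(X)] in a logit model.

    Assumes P(Y=1 | x, t) = sigmoid(alpha(x) + beta(x) * t) and T independent
    of X, so Lambda(x_i) is estimated by averaging over the observed t_j.
    """
    # p_ij: fitted acceptance probability for customer i at rate t_j (n x n)
    p_grid = sigmoid(alpha_hat[:, None] + beta_hat[:, None] * t[None, :])
    w = p_grid * (1.0 - p_grid)
    l11 = w.mean(axis=1)
    l12 = (w * t[None, :]).mean(axis=1)
    l22 = (w * t[None, :] ** 2).mean(axis=1)
    det = l11 * l22 - l12 ** 2
    # Score of the negative log-likelihood w.r.t. (alpha, beta) at observation i
    resid = sigmoid(alpha_hat + beta_hat * t) - y
    g1, g2 = resid, resid * t
    # Second row of the 2x2 inverse [[l22, -l12], [-l12, l11]] / det, times the score
    correction = (-l12 * g1 + l11 * g2) / det
    psi = beta_hat - correction
    return psi.mean(), psi.std(ddof=1) / np.sqrt(psi.size)

rng = np.random.default_rng(0)
n = 1000
X = rng.standard_normal((n, 2))
t = rng.standard_normal(n)
alpha_true = 0.5 + 0.3 * X[:, 0] - 0.2 * X[:, 1]
beta_true = -0.8 + 0.4 * X[:, 0]
y = rng.binomial(1, sigmoid(alpha_true + beta_true * t)).astype(float)
# Stand-in for network output: the true functions (a real run uses cross-fitted predictions)
mu_if, se_if = if_estimate(y, t, alpha_true, beta_true)
print(mu_if, se_if)
```

Unlike the naive SE, sd(ψ) includes the score term. This term carries the uncertainty from fitting θ, which is why the IF-based SE is several times larger.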
### Round E: Oracle vs NN Comparison

| Method | μ̂ | SE | CI Width | Covers μ* |
|---|---|---|---|---|
| Oracle | -0.812 | 0.089 | 0.350 | True |
| Bootstrap | -0.812 | 0.091 | 0.351 | True |
| NN Naive | -0.798 | 0.024 | 0.094 | False |
| NN IF | -0.823 | 0.087 | 0.342 | True |
### Round F: Heterogeneity Recovery

| Metric | Value |
|---|---|
| Corr(α̂, α*) | 0.73 |
| Corr(β̂, β*) | 0.40 |
| θ Bootstrap Coverage | 94% |
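The Round F metrics can be sketched under the following assumptions. Corr is the Pearson correlation between fitted and true parameter values across the n customers, and θ stands for the pair (α, β). "θ Bootstrap Coverage" is taken to mean the share of per-customer percentile bootstrap intervals that contain the true value. Both helper names are hypothetical. The usage line adds N(0, 0.8²) noise to β* as a stand-in for network error, which gives a correlation in the same range as the reported 0.40.

```python
import numpy as np

def heterogeneity_corr(theta_hat, theta_true):
    """Pearson correlation between fitted and true per-customer parameters."""
    return float(np.corrcoef(theta_hat, theta_true)[0, 1])

def pointwise_bootstrap_coverage(boot_draws, theta_true, level=0.95):
    """Share of customers whose percentile bootstrap interval covers the truth.

    boot_draws has shape (B, n): B bootstrap fits of theta(x_i) for n customers.
    """
    a = 1.0 - level
    lo = np.quantile(boot_draws, a / 2, axis=0)
    hi = np.quantile(boot_draws, 1.0 - a / 2, axis=0)
    return float(np.mean((lo <= theta_true) & (theta_true <= hi)))

rng = np.random.default_rng(0)
beta_star = -0.8 + 0.4 * rng.standard_normal(1000)
beta_hat = beta_star + rng.normal(0.0, 0.8, size=1000)  # noisy stand-in for NN output
corr = heterogeneity_corr(beta_hat, beta_star)
print(round(corr, 2))
```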
### Round G: SE Calibration (M=100)

| Metric | Value | Target |
|---|---|---|
| Coverage | 95% | 93-97% |
| SE Ratio | 0.91 | 0.9-1.1 |
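Round G repeats estimation over M = 100 replications. This is a minimal sketch of the two calibration metrics. It assumes Coverage is the share of replications whose 95% Wald interval contains μ*. It also assumes SE Ratio is the mean reported SE divided by the Monte Carlo standard deviation of μ̂, so values below 1 indicate somewhat too-small SEs. The toy check swaps the eval's estimator for a sample mean with a correctly computed SE. `se_calibration` is a hypothetical helper.

```python
import numpy as np

def se_calibration(estimates, ses, truth, z=1.96):
    """Monte Carlo calibration over M replications.

    coverage: share of intervals estimate ± z*se that contain truth.
    se_ratio: mean reported SE / empirical SD of the estimates (assumed definition).
    """
    estimates = np.asarray(estimates, dtype=float)
    ses = np.asarray(ses, dtype=float)
    coverage = np.mean(np.abs(estimates - truth) <= z * ses)
    se_ratio = ses.mean() / estimates.std(ddof=1)
    return float(coverage), float(se_ratio)

# Toy check: M = 100 replications of a sample mean of n = 200 draws.
rng = np.random.default_rng(0)
M, n, truth = 100, 200, -0.8
draws = rng.normal(truth, 1.0, size=(M, n))
est = draws.mean(axis=1)
se = draws.std(axis=1, ddof=1) / np.sqrt(n)
coverage, ratio = se_calibration(est, se, truth)
print(coverage, ratio)
```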
## Summary

| Round | Test | Result |
|---|---|---|
| A | Oracle coverage | PASS |
| B | Bootstrap coverage | PASS |
| C | Naive coverage | FAIL (expected) |
| D | IF coverage | PASS |
| E | Oracle-NN comparison | PASS |
| F | Heterogeneity recovery | PASS |
| G | SE calibration | PASS |
| Total | | 7/7 PASS (Round C's FAIL is the expected outcome) |
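## Key Findings

- Naive NN severely undercovers: its SE (0.024) is about 3.6x smaller than the IF-corrected SE (0.087).
- The IF correction restores valid inference: SE_IF (0.087) is close to the Oracle SE (0.089).
- Heterogeneity is partially recovered: Corr(α̂, α*) = 0.73 and Corr(β̂, β*) = 0.40.
- Oracle and IF-corrected NN agree: both estimates are within 0.025 of μ* and both CIs cover it.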
## Run Command

```bash
python3 -m evals.eval_07_e2e 2>&1 | tee evals/reports/eval_07_$(date +%Y%m%d_%H%M%S).txt

# With SE calibration round
python3 -m evals.eval_07_e2e --round-g 2>&1 | tee evals/reports/eval_07_g_$(date +%Y%m%d_%H%M%S).txt
```