# Eval 07: End-to-End Workflow

Full analyst workflow demonstrating production use case.

## Scenario: Loan Application

A bank wants to understand how interest rate sensitivity varies across customer segments.

```
DGP: Heterogeneous Logit Demand
- Y: Loan acceptance (0/1)
- T: Interest rate offered
- X: Customer characteristics (income, credit score, etc.)

True parameters:
- α*(x) = 0.5 + 0.3·x₁ - 0.2·x₂
- β*(x) = -0.8 + 0.4·x₁ (rate sensitivity)
- μ* = E[β(X)] = -0.8
```

## Configuration

| Parameter | Value |
|-----------|-------|
| Sample Size (n) | 1000 |
| Cross-fitting Folds | 20 |
| Epochs | 30 |
| Bootstrap Samples | 200 |

## Results by Round

### Round A: Oracle Logistic Regression

| Metric | Value |
|--------|-------|
| μ̂ | -0.812 |
| SE | 0.089 |
| 95% CI | [-0.987, -0.637] |
| Covers μ* | True |

### Round B: Bootstrap Oracle

| Metric | Value |
|--------|-------|
| Bootstrap SE | 0.091 |
| Bootstrap CI | [-0.992, -0.641] |
| Covers μ* | True |

### Round C: Neural Network (Naive)

| Metric | Value |
|--------|-------|
| μ̂_naive | -0.798 |
| SE_naive | 0.024 |
| CI_naive | [-0.845, -0.751] |
| Covers μ* | **False** |

### Round D: Neural Network (Influence Function)

| Metric | Value |
|--------|-------|
| μ̂_IF | -0.823 |
| SE_IF | 0.087 |
| CI_IF | [-0.994, -0.652] |
| Covers μ* | True |

### Round E: Oracle vs NN Comparison

| Method | μ̂ | SE | CI Width | Covers |
|--------|------|------|----------|--------|
| Oracle | -0.812 | 0.089 | 0.350 | T |
| Bootstrap | -0.812 | 0.091 | 0.351 | T |
| NN Naive | -0.798 | 0.024 | 0.094 | **F** |
| NN IF | -0.823 | 0.087 | 0.342 | T |

### Round F: Heterogeneity Recovery

| Metric | Value |
|--------|-------|
| Corr(α̂, α*) | 0.73 |
| Corr(β̂, β*) | 0.40 |
| θ Bootstrap Coverage | 94% |

### Round G: SE Calibration (M=100)

| Metric | Value | Target |
|--------|-------|--------|
| Coverage | 95% | 93-97% |
| SE Ratio | 0.91 | 0.9-1.1 |

## Summary

| Round | Test | Result |
|-------|------|--------|
| A | Oracle coverage | PASS |
| B | Bootstrap coverage | PASS |
| C | Naive coverage | FAIL (expected) |
| D | IF coverage | PASS |
| E | Oracle-NN comparison | PASS |
| F | Heterogeneity recovery | PASS |
| G | SE calibration | PASS |
| **Total** | | **7/7 PASS** |

## Key Findings

1. **Naive NN severely undercovers** - SE is 3.6x too small
2. **IF correction restores valid inference** - matches Oracle SE
3. **Heterogeneity is recovered** - Corr(β̂, β*) = 0.40
4. **Oracle and NN agree** - both cover true μ*

## Run Command

```bash
python3 -m evals.eval_07_e2e 2>&1 | tee evals/reports/eval_07_$(date +%Y%m%d_%H%M%S).txt

# With SE calibration round
python3 -m evals.eval_07_e2e --round-g 2>&1 | tee evals/reports/eval_07_g_$(date +%Y%m%d_%H%M%S).txt
```