# Eval 06: Frequentist Coverage
Monte Carlo validation that 95% confidence intervals achieve approximately nominal coverage.
## Configuration

| Parameter | Value |
|---|---|
| Simulations (M) | 50 |
| Sample Size (n) | 5000 |
| Cross-fitting Folds | 20 |
| Epochs | 200 |
| Lambda Method | mlp |
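With n = 5000 and 20 cross-fitting folds, each held-out fold has 250 observations, and the models applied to that fold are trained on the remaining 4750. The sketch below shows one common way to assign folds (a single random permutation cut into equal parts with NumPy). The helper name `cross_fit_folds` is hypothetical, and the eval's actual splitting scheme may differ.

```python
import numpy as np

def cross_fit_folds(n: int, k: int, seed: int = 0):
    """Hypothetical helper: yield (train_idx, test_idx) pairs for K-fold cross-fitting."""
    rng = np.random.default_rng(seed)
    # Assumption: shuffle indices once, then cut them into k nearly equal folds.
    folds = np.array_split(rng.permutation(n), k)
    for j in range(k):
        test_idx = folds[j]
        # Train on every fold except the held-out one.
        train_idx = np.concatenate([folds[i] for i in range(k) if i != j])
        yield train_idx, test_idx

splits = list(cross_fit_folds(n=5000, k=20))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 20 4750 250
```

Every observation lands in exactly one held-out fold, so each unit's nuisance predictions come from models that never saw it.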
## DGP: Canonical Logit

- α*(x) = 1.0 + 0.5·sin(x)
- β*(x) = 0.5 + 0.3·x
- True target: μ* = E[β*(X)] = 0.5 + 0.3·E[X], which equals 0.5 for mean-zero X
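A quick numerical check of the target μ* = E[β*(X)]. The covariate distribution is not specified in this report, so the sketch assumes X ~ N(0, 1) purely for illustration; any mean-zero distribution gives the same target because β* is linear in x.

```python
import numpy as np

def beta_star(x):
    # β*(x) = 0.5 + 0.3·x, as in the DGP above
    return 0.5 + 0.3 * x

rng = np.random.default_rng(0)
# Assumption: X ~ N(0, 1); the eval's actual covariate distribution is not stated.
x = rng.standard_normal(1_000_000)
mu_star_mc = beta_star(x).mean()
print(mu_star_mc)  # close to 0.5 (Monte Carlo error about 0.0003)
```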
## Results

| Metric | Value | Target | Status |
|---|---|---|---|
| Coverage | 88% (44/50) | 85-99% | PASS |
| SE Ratio | 0.87 | 0.7-1.5 | PASS |
| Bias | 0.002 | < 0.1 | PASS |
| z-score Mean | -0.12 | ~0 | PASS |
| z-score Std | 1.08 | ~1 | PASS |
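The summary metrics can be computed from the per-simulation estimates μ̂ and standard errors SE. This is a sketch under stated assumptions: the interval is the 95% Wald interval μ̂ ± 1.96·SE (consistent with the rows in the individual-results table), and the SE ratio is taken to be the mean reported SE divided by the empirical standard deviation of μ̂, which may not be the eval's exact definition. The helper name `coverage_summary` is hypothetical.

```python
import numpy as np

def coverage_summary(mu_hat, se, mu_true, z_crit=1.96):
    """Hypothetical helper: summarise M Monte Carlo replications of a CI procedure."""
    mu_hat = np.asarray(mu_hat, dtype=float)
    se = np.asarray(se, dtype=float)
    lower, upper = mu_hat - z_crit * se, mu_hat + z_crit * se
    covered = (lower <= mu_true) & (mu_true <= upper)
    z = (mu_hat - mu_true) / se
    return {
        "coverage": covered.mean(),
        # Assumed definition: average reported SE over the empirical SD of the estimates.
        "se_ratio": se.mean() / mu_hat.std(ddof=1),
        "bias": mu_hat.mean() - mu_true,
        "z_mean": z.mean(),
        "z_std": z.std(ddof=1),
    }

# Illustration on the five replications listed in the individual-results table.
summary = coverage_summary(
    mu_hat=[0.498, 0.512, 0.487, 0.521, 0.493],
    se=[0.031, 0.029, 0.032, 0.028, 0.030],
    mu_true=0.5,
)
print(summary)
```

Five rows are far too few to reproduce the 50-replication figures above; the call only shows how each metric is formed.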
## Individual Results (First 10)

| Sim | μ̂ | SE | CI Lower | CI Upper | Covered | z-score |
|---|---|---|---|---|---|---|
| 1 | 0.498 | 0.031 | 0.437 | 0.559 | T | -0.06 |
| 2 | 0.512 | 0.029 | 0.455 | 0.569 | T | 0.41 |
| 3 | 0.487 | 0.032 | 0.424 | 0.550 | T | -0.41 |
| 4 | 0.521 | 0.028 | 0.466 | 0.576 | T | 0.75 |
| 5 | 0.493 | 0.030 | 0.434 | 0.552 | T | -0.23 |
| … | … | … | … | … | … | … |
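Each listed row is consistent with a 95% Wald interval, CI = μ̂ ± 1.96·SE, and with z = (μ̂ − μ*)/SE for μ* = 0.5. The check below recomputes the bounds and z-scores of the five rows shown, rounded as in the table; it uses only the values printed here.

```python
rows = [  # (sim, mu_hat, se) copied from the table
    (1, 0.498, 0.031),
    (2, 0.512, 0.029),
    (3, 0.487, 0.032),
    (4, 0.521, 0.028),
    (5, 0.493, 0.030),
]
MU_TRUE, Z_CRIT = 0.5, 1.96

recomputed = []
for sim, mu_hat, se in rows:
    lower = round(mu_hat - Z_CRIT * se, 3)  # CI lower bound
    upper = round(mu_hat + Z_CRIT * se, 3)  # CI upper bound
    z = round((mu_hat - MU_TRUE) / se, 2)   # standardized error
    covered = lower <= MU_TRUE <= upper
    recomputed.append((sim, lower, upper, covered, z))
    print(f"{sim}  [{lower:.3f}, {upper:.3f}]  covered={covered}  z={z:+.2f}")
```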
## Validation Criteria

| Criterion | Threshold | Actual | Status |
|---|---|---|---|
| Coverage in [85%, 99%] | 85-99% | 88% | PASS |
| SE Ratio in [0.7, 1.5] | 0.7-1.5 | 0.87 | PASS |
| \|Bias\| < 0.1 | < 0.1 | 0.002 | PASS |
| \|z_mean\| < 0.5 | < 0.5 | 0.12 | PASS |
| z_std in [0.5, 2.0] | 0.5-2.0 | 1.08 | PASS |
EVAL 06: PASS
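The acceptance rules can be expressed as a single predicate. The thresholds are copied from the Validation Criteria table; the function name `eval06_passes` and the dictionary keys are hypothetical, not the eval module's actual API.

```python
def eval06_passes(metrics: dict) -> dict:
    """Hypothetical helper: apply the Eval 06 acceptance thresholds to summary metrics."""
    checks = {
        "coverage": 0.85 <= metrics["coverage"] <= 0.99,
        "se_ratio": 0.7 <= metrics["se_ratio"] <= 1.5,
        "bias": abs(metrics["bias"]) < 0.1,
        "z_mean": abs(metrics["z_mean"]) < 0.5,
        "z_std": 0.5 <= metrics["z_std"] <= 2.0,
    }
    # The eval passes only if every individual criterion passes.
    checks["overall"] = all(checks.values())
    return checks

# Values from the Results table.
reported = {"coverage": 0.88, "se_ratio": 0.87, "bias": 0.002, "z_mean": -0.12, "z_std": 1.08}
print(eval06_passes(reported)["overall"])  # True
```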
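## Key Findings

- Coverage (88%) is within the [85%, 99%] acceptance band, though below the nominal 95%.
- The SE ratio (0.87) is within the [0.7, 1.5] tolerance.
- z-scores are approximately N(0, 1) (mean -0.12, std 1.08).
- No systematic bias detected (bias 0.002, well under the 0.1 threshold).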
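With only M = 50 replications the coverage estimate is itself noisy. If each replication is treated as an independent Bernoulli trial that covers μ* with the nominal probability 0.95, the Monte Carlo standard error of the coverage estimate is √(0.95·0.05/50) ≈ 0.031, so 88% sits roughly 2.3 standard errors below 95%. The sketch below computes that standard error and the exact binomial probability of observing 44 or fewer covering intervals out of 50 under nominal coverage; it uses only the standard library, and `coverage_mc_check` is a hypothetical name.

```python
import math

def coverage_mc_check(covered: int, m: int, nominal: float = 0.95):
    """Hypothetical helper: MC standard error of a coverage estimate and an exact binomial tail."""
    p_hat = covered / m
    mc_se = math.sqrt(nominal * (1 - nominal) / m)  # SE if true coverage were nominal
    z = (p_hat - nominal) / mc_se
    # P(Binomial(m, nominal) <= covered): chance of this few hits under nominal coverage.
    p_lower = sum(
        math.comb(m, k) * nominal**k * (1 - nominal) ** (m - k) for k in range(covered + 1)
    )
    return p_hat, mc_se, z, p_lower

p_hat, mc_se, z, p_lower = coverage_mc_check(covered=44, m=50)
print(f"coverage={p_hat:.2f}  MC SE={mc_se:.3f}  z={z:.2f}  P(X<=44)={p_lower:.3f}")
```

Increasing M is the direct way to tighten this Monte Carlo uncertainty, since the standard error shrinks as 1/√M.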
## Run Command

```bash
python3 -m evals.eval_06_coverage 2>&1 | tee evals/reports/eval_06_$(date +%Y%m%d_%H%M%S).txt
```