Eval 06: Frequentist Coverage#

Monte Carlo validation that 95% confidence intervals (μ̂ ± 1.96·SE) for μ* achieve close to nominal coverage under repeated sampling.

Configuration#

| Parameter | Value |
|---|---|
| Simulations (M) | 50 |
| Sample Size (n) | 5000 |
| Cross-fitting Folds | 20 |
| Epochs | 200 |
| Lambda Method | mlp |

DGP: Canonical Logit#

α*(x) = 1.0 + 0.5·sin(x)
β*(x) = 0.5 + 0.3·x
True μ* = E[β*(X)] = 0.5 + 0.3·E[X] = 0.5   (holds when E[X] = 0)
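A minimal sketch of how one dataset could be drawn from this DGP. The page gives α*(x), β*(x), and μ* only, so the covariate and treatment distributions, the outcome link, and the name `simulate` are all assumptions made for illustration: X ~ Uniform(-1, 1) (so that E[X] = 0 and μ* = 0.5), T ~ N(0, 1) independent of X, and a binary outcome with P(Y = 1 | X, T) = sigmoid(α*(X) + β*(X)·T).

```python
import numpy as np


def alpha_star(x):
    """Baseline index alpha*(x) = 1.0 + 0.5 * sin(x), as given in the DGP."""
    return 1.0 + 0.5 * np.sin(x)


def beta_star(x):
    """Heterogeneous slope beta*(x) = 0.5 + 0.3 * x, as given in the DGP."""
    return 0.5 + 0.3 * x


def simulate(n, rng):
    """Draw one dataset of size n (hypothetical helper, not the eval's own code)."""
    # ASSUMPTION: X ~ Uniform(-1, 1), which gives E[X] = 0 and hence mu* = 0.5.
    x = rng.uniform(-1.0, 1.0, size=n)
    # ASSUMPTION: treatment T ~ N(0, 1), independent of X.
    t = rng.normal(size=n)
    # ASSUMPTION: logit link, P(Y = 1 | X, T) = sigmoid(alpha*(X) + beta*(X) * T).
    logits = alpha_star(x) + beta_star(x) * t
    p = 1.0 / (1.0 + np.exp(-logits))
    y = rng.binomial(1, p)
    return x, t, y


rng = np.random.default_rng(0)
x, t, y = simulate(5000, rng)

# The sample average of beta*(X) should sit close to the target mu* = 0.5.
print(f"mean beta*(X) = {beta_star(x).mean():.3f}")
```

Any covariate distribution with mean zero would give the same μ*; the uniform choice here is only for concreteness.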

Results#

| Metric | Value | Target | Status |
|---|---|---|---|
| Coverage | 88% (44/50) | 85-99% | PASS |
| SE Ratio | 0.87 | 0.7-1.5 | PASS |
| Bias | 0.002 | < 0.1 | PASS |
| z-score Mean | -0.12 | ~0 | PASS |
| z-score Std | 1.08 | ~1 | PASS |

Individual Results (First 5)#

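| Sim | μ̂ | SE | CI Lower | CI Upper | Covered | z-score |
|---|---|---|---|---|---|---|
| 1 | 0.498 | 0.031 | 0.437 | 0.559 | T | -0.06 |
| 2 | 0.512 | 0.029 | 0.455 | 0.569 | T | 0.41 |
| 3 | 0.487 | 0.032 | 0.424 | 0.550 | T | -0.41 |
| 4 | 0.521 | 0.028 | 0.466 | 0.576 | T | 0.75 |
| 5 | 0.493 | 0.030 | 0.434 | 0.552 | T | -0.23 |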
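The per-simulation columns can be recomputed from μ̂ and SE alone. The sketch below assumes the intervals are two-sided 95% normal intervals, μ̂ ± 1.96·SE, and that the z-score is (μ̂ − μ*)/SE; both assumptions reproduce the listed rows to three decimals, but neither formula is stated on this page. The variable names are illustrative.

```python
import numpy as np

MU_STAR = 0.5   # true target from the DGP section
Z_CRIT = 1.96   # ASSUMPTION: two-sided 95% normal critical value

# mu_hat and SE copied from the rows listed above.
mu_hat = np.array([0.498, 0.512, 0.487, 0.521, 0.493])
se = np.array([0.031, 0.029, 0.032, 0.028, 0.030])

ci_lower = mu_hat - Z_CRIT * se
ci_upper = mu_hat + Z_CRIT * se
covered = (ci_lower <= MU_STAR) & (MU_STAR <= ci_upper)
z = (mu_hat - MU_STAR) / se

for i in range(len(mu_hat)):
    print(
        f"sim {i + 1}: CI = [{ci_lower[i]:.3f}, {ci_upper[i]:.3f}], "
        f"covered = {bool(covered[i])}, z = {z[i]:+.2f}"
    )
```

An interval covers μ* exactly when |z| ≤ 1.96, so the Covered and z-score columns carry the same information in two forms.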

Validation Criteria#

| Criterion | Threshold | Actual | Status |
|---|---|---|---|
| Coverage in [85%, 99%] | 85-99% | 88% | PASS |
| SE Ratio in [0.7, 1.5] | 0.7-1.5 | 0.87 | PASS |
| \|Bias\| < 0.1 | < 0.1 | 0.002 | PASS |
| \|z_mean\| < 0.5 | < 0.5 | 0.12 | PASS |
| z_std in [0.5, 2.0] | 0.5-2.0 | 1.08 | PASS |

EVAL 06: PASS
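The pass/fail logic can be sketched as a summary step followed by threshold checks. The acceptance bands below are copied from the criteria table; everything else is an assumption for illustration. In particular, `summarize` and `check` are hypothetical names rather than functions from the `evals` package, and the SE ratio is taken to be the mean estimated SE divided by the empirical standard deviation of μ̂ across simulations, which this page does not define.

```python
import numpy as np

# Acceptance bands copied from the Validation Criteria table.
THRESHOLDS = {
    "coverage": lambda v: 0.85 <= v <= 0.99,
    "se_ratio": lambda v: 0.7 <= v <= 1.5,
    "bias": lambda v: abs(v) < 0.1,
    "z_mean": lambda v: abs(v) < 0.5,
    "z_std": lambda v: 0.5 <= v <= 2.0,
}


def summarize(mu_hat, se, mu_star, z_crit=1.96):
    """Aggregate M simulation results into the five reported metrics."""
    mu_hat = np.asarray(mu_hat, dtype=float)
    se = np.asarray(se, dtype=float)
    z = (mu_hat - mu_star) / se
    return {
        # mu_star lies inside mu_hat +/- z_crit * se exactly when |z| <= z_crit.
        "coverage": float(np.mean(np.abs(z) <= z_crit)),
        # ASSUMPTION: SE ratio = mean estimated SE / empirical SD of mu_hat.
        "se_ratio": float(se.mean() / mu_hat.std(ddof=1)),
        "bias": float(mu_hat.mean() - mu_star),
        "z_mean": float(z.mean()),
        "z_std": float(z.std(ddof=1)),
    }


def check(metrics):
    """Return a PASS (True) / FAIL (False) flag per criterion."""
    return {name: bool(rule(metrics[name])) for name, rule in THRESHOLDS.items()}


# Values reported in the Results table.
reported = {
    "coverage": 0.88,
    "se_ratio": 0.87,
    "bias": 0.002,
    "z_mean": -0.12,
    "z_std": 1.08,
}
status = check(reported)
print(status)
print("EVAL 06:", "PASS" if all(status.values()) else "FAIL")
```

Keeping the bands in one dictionary means the table and the check cannot drift apart silently when a threshold is changed.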

Key Findings#

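  • Coverage of 88% (44/50) lies inside the 85-99% acceptance band but below the nominal 95%; with M = 50, the Monte Carlo standard error of a coverage estimate is about 3 percentage points

  • SE estimates are acceptably calibrated: the ratio of 0.87 is inside [0.7, 1.5], though a value below 1 would indicate mildly understated standard errors if the ratio is defined as estimated SE over the empirical SD of μ̂

  • z-scores are approximately N(0,1), with mean -0.12 and standard deviation 1.08

  • No systematic bias detected: |bias| = 0.002, small relative to the per-simulation SE of about 0.03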
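Because coverage is estimated from only M = 50 simulations, the estimate itself is noisy. The sketch below quantifies that noise with the binomial standard error and a Wilson score interval for 44 covered intervals out of 50. The Wilson interval is a standard choice for a binomial proportion, not something this eval is stated to use, and the function names are illustrative.

```python
import math


def coverage_mc_se(p, m):
    """Binomial standard error of a coverage estimate based on m simulations."""
    return math.sqrt(p * (1.0 - p) / m)


def wilson_interval(k, m, z=1.96):
    """Wilson score interval for a proportion k/m (95% by default)."""
    p_hat = k / m
    denom = 1.0 + z**2 / m
    center = (p_hat + z**2 / (2 * m)) / denom
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / m + z**2 / (4 * m**2)) / denom
    return center - half, center + half


# Noise expected if the true coverage were exactly the nominal 95%.
se_nominal = coverage_mc_se(0.95, 50)
# Uncertainty around the observed 44/50.
lo, hi = wilson_interval(44, 50)

print(f"MC standard error at nominal coverage: {se_nominal:.3f}")
print(f"Wilson 95% interval for 44/50: [{lo:.3f}, {hi:.3f}]")
```

For 44/50 the interval runs from roughly 0.76 to 0.94, which is why the acceptance band is wide. Since the standard error shrinks like 1/√M, a larger M would allow a tighter band.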

Run Command#

python3 -m evals.eval_06_coverage 2>&1 | tee evals/reports/eval_06_$(date +%Y%m%d_%H%M%S).txt