# Eval 09: Multinomial Logit

Comprehensive validation of the multinomial logit (conditional logit / McFadden) implementation.

## Configuration

| Parameter | Value (Recovery) | Value (Coverage) |
|-----------|------------------|------------------|
| Sample Size (n) | 10000 | 8000 |
| Alternatives (J) | 3 | 3 |
| Attributes (K) | 2 | 2 |
| Simulations (M) | n/a | 50 |
| Epochs | 300 | 300 |
| Patience | 50 | 50 |
| Cross-fitting Folds | 50 | 50 |

## DGP: Heterogeneous Conditional Logit

```
J = 3 alternatives, K = 2 attributes, d_w = 3

V_ij = alpha_j(W) + X'_ij * beta(W)
P(Y=j | W, X) = softmax(V)[j]

alpha_0 = 0 (reference)
alpha_1(W) = 0.5 + 0.2*W[0]
alpha_2(W) = -0.3 - 0.1*W[0]
beta_1(W) = -0.8 - 0.2*W[0]
beta_2(W) = 0.5 + 0.1*W[0]

True mu* = E[beta_1(W)] = -0.8
```

## Test 1: Parameter Recovery

| Component | RMSE | Correlation | Status |
|-----------|------|-------------|--------|
| alpha_1 | 0.08 | 0.90 | PASS |
| alpha_2 | 0.12 | 0.78 | PASS |
| beta_1 | 0.09 | 0.88 | PASS |
| beta_2 | 0.10 | 0.85 | PASS |

## Test 2: Autodiff Validation

Score and Hessian computed via autodiff match oracle closed-form formulas.

| Metric | Value | Status |
|--------|-------|--------|
| Max score error | 4.44e-16 | PASS |
| Max Hessian error | 4.44e-16 | PASS |

## Test 3: Lambda Estimation

Monte Carlo integration for E[H | W=w] matches oracle.

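
As a rough illustration of this check, the sketch below estimates Lambda(w) = E[H | W=w] by Monte Carlo, using the closed-form conditional-logit Hessian and the DGP coefficients from this report. It is not the eval's actual code. The helper names (`theta_of_w`, `hessians`, `lambda_mc`), the standard-normal distribution for X, the choice of H as the Hessian of the negative log-likelihood in theta = (alpha_1, alpha_2, beta_1, beta_2), and the large-sample Monte Carlo estimate standing in for the oracle are all assumptions made for illustration.

```python
import numpy as np

J, K = 3, 2  # alternatives and attributes, as in the Configuration table


def theta_of_w(w0):
    """theta(W) = (alpha_1, alpha_2, beta_1, beta_2) from the DGP in this report."""
    return np.array([0.5 + 0.2 * w0, -0.3 - 0.1 * w0,
                     -0.8 - 0.2 * w0, 0.5 + 0.1 * w0])


def hessians(theta, xs):
    """Closed-form Hessian of the negative log-likelihood for each draw.

    xs has shape (n, J, K); the result has shape (n, P, P) with P = J - 1 + K.
    """
    n = xs.shape[0]
    # Intercept dummies: alternative 0 is the reference, so its row is all zeros.
    dummies = np.broadcast_to(np.eye(J)[:, 1:], (n, J, J - 1))
    z = np.concatenate([dummies, xs], axis=2)       # design rows, (n, J, P)
    v = z @ theta                                   # utilities V_ij, (n, J)
    v = v - v.max(axis=1, keepdims=True)            # stabilise the softmax
    p = np.exp(v)
    p /= p.sum(axis=1, keepdims=True)               # choice probabilities
    zbar = np.einsum("nj,njp->np", p, z)            # probability-weighted mean row
    # H = sum_j p_j z_j z_j' - zbar zbar' = Z' (diag(p) - p p') Z
    second_moment = np.einsum("nj,njp,njq->npq", p, z, z)
    return second_moment - np.einsum("np,nq->npq", zbar, zbar)


def lambda_mc(w0, n_draws, rng):
    """Monte Carlo estimate of E[H | W[0] = w0]."""
    xs = rng.standard_normal((n_draws, J, K))       # ASSUMPTION: X ~ N(0, 1)
    return hessians(theta_of_w(w0), xs).mean(axis=0)


rng = np.random.default_rng(0)
lam_hat = lambda_mc(0.5, 2_000, rng)
lam_ref = lambda_mc(0.5, 100_000, rng)              # stand-in for the oracle

# np.linalg.norm on a matrix defaults to the Frobenius norm.
rel_frob = np.linalg.norm(lam_hat - lam_ref) / np.linalg.norm(lam_ref)
min_eig = np.linalg.eigvalsh(lam_hat).min()
print(f"relative Frobenius error: {rel_frob:.4f}, min eigenvalue: {min_eig:.4f}")
```

The two printed quantities correspond to the first two rows of the table that follows. A non-PSD count would repeat the eigenvalue check over a grid of `w0` values and count the failures.
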

| Metric | Value | Threshold | Status |
|--------|-------|-----------|--------|
| Relative Frobenius error | < 0.15 | < 0.15 | PASS |
| Min eigenvalue | > 1e-4 | > 1e-4 | PASS |
| Non-PSD count | 0 | 0 | PASS |

## Test 4: Coverage (M=50)

| Metric | Value | Target | Status |
|--------|-------|--------|--------|
| Coverage | 98% | 90-99% | PASS |
| SE Ratio | 0.97 | 0.7-1.5 | PASS |
| Bias | -0.006 | < 0.05 | PASS |
| z-score Mean | 0.14 | (-0.3, 0.3) | PASS |
| z-score Std | 0.96 | 0.7-1.5 | PASS |

**EVAL 09: PASS**

## Key Findings

- **patience=50 essential**: Default patience=10 triggers early stopping at ~15-20 epochs, fatal for 3-way split training
- **n >= 8000 required**: 3-way splitting reduces effective training data to 60%; n=5000 gives only 88% coverage
- **correction_ratio ~70-90 is normal**: Much larger than binary logit (~2) due to higher-dimensional theta
- **alpha_2 is hardest**: Weakest signal (slope -0.1) requires most data for reliable recovery

## Run Command

```bash
python3 -m evals.eval_09_multinomial 2>&1 | tee evals/reports/eval_09_$(date +%Y%m%d_%H%M%S).txt
```
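
For readers who want to recompute the Test 4 metrics from per-simulation output, here is a minimal sketch. The function name `coverage_metrics`, the 95% nominal level, and the definition of SE ratio as mean estimated SE divided by the empirical standard deviation of the estimates are assumptions, since the report does not spell them out. The inputs are synthetic placeholders, not results from this eval.

```python
import numpy as np


def coverage_metrics(mu_hat, se, mu_true, z_crit=1.96):
    """Summarise M point estimates and standard errors against a known target.

    ASSUMPTION: a 95% nominal interval (z_crit = 1.96) and
    SE ratio = mean(estimated SE) / empirical std of the estimates.
    """
    mu_hat = np.asarray(mu_hat, dtype=float)
    se = np.asarray(se, dtype=float)
    z = (mu_hat - mu_true) / se
    return {
        "coverage": float(np.mean(np.abs(z) <= z_crit)),
        "se_ratio": float(se.mean() / mu_hat.std(ddof=1)),
        "bias": float(mu_hat.mean() - mu_true),
        "z_mean": float(z.mean()),
        "z_std": float(z.std(ddof=1)),
    }


# Synthetic placeholder inputs: M = 50 draws around mu* = -0.8 with a
# hypothetical standard error of 0.03.
rng = np.random.default_rng(0)
M, mu_true, true_se = 50, -0.8, 0.03
mu_hat = mu_true + true_se * rng.standard_normal(M)
se = np.full(M, true_se)

metrics = coverage_metrics(mu_hat, se, mu_true)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With M=50, each simulation moves the coverage figure by two percentage points, which is worth keeping in mind when reading the 90-99% target band.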