Deep Learning for Individual Heterogeneity#
This paper develops a framework for embedding deep neural networks into structural economic models to capture rich heterogeneity while preserving interpretability.
Core Framework#
Starting point: A parametric structural model
Enriched model: Parameters become functions of observables \(X\)
Second-stage parameter of interest:
Estimation via Structured DNNs#
The parameter functions are estimated by:
Convergence rate (Theorem 1): For smooth parameter functions with \(p\) derivatives and \(d_c\) continuous covariates:
Theorem 1 (FLM 2021): “Under smoothness assumptions, the neural network estimator achieves the minimax optimal rate: $\(\|\hat{\theta}_k - \theta^\star_k\|^2_{L_2} = O_P\left(n^{-\frac{p}{p+d_c}} \log^8 n\right)\)$ where p is the smoothness of θ*(·) and d_c is the dimension of continuous covariates.”
Inference via Influence Functions#
Influence function (Theorem 2):
where:
\(\ell_\theta\) is the gradient of the loss w.r.t. \(\theta\)
\(\Lambda(x) = \mathbb{E}[\ell_{\theta\theta}(Y, T, \theta(x)) \mid X = x]\) is the conditional Hessian
\(H_\theta\) is the Jacobian of \(H\) w.r.t. \(\theta\)
Cross-fitted estimator:
Asymptotic normality: Under rate conditions \(\|\hat{\theta} - \theta^\star\|_{L_2} = o_P(n^{-1/4})\):
Theorem 2 (FLM 2021): “Under rate conditions \(\|\hat{\theta} - \theta^\star\|_{L_2} = o_P(n^{-1/4})\), the cross-fitted estimator satisfies: $\(\sqrt{n}(\hat{\mu} - \mu^\star) \xrightarrow{d} N(0, \Psi)\)\( where \)\Psi = E[\psi_0(W)^2]\( and \)\psi_0$ is the efficient influence function.”
Neyman Orthogonality (FLM 2021): “The influence function ψ satisfies Neyman orthogonality, meaning first-order errors in nuisance estimation have no first-order effect on the target estimator. This is why the bias scales as O(δ²) rather than O(δ).”
Critical Rate Condition: The rate \(n^{-1/4}\) threshold comes from the product rate requirement:
“The product of nuisance estimation errors must satisfy \(\|\hat{\theta} - \theta^\star\| \cdot \|\hat{\Lambda} - \Lambda^\star\| = o_P(n^{-1/2})\)” — FLM (2021), Theorem 2 conditions
Application: Binary Choice with Heterogeneity#
Model:
where \(G(u) = 1/(1 + e^{-u})\) is the logit function.
Average marginal effect:
Optimal personalized pricing: Solve for \(r_{opt}\) via:
where expected profits are:
Key Insight#
Machine learning and economic structure are complements, not substitutes.
ML alone fits data well but extrapolates nonsensically and can’t answer causal questions
Structure alone provides interpretability but misses heterogeneity
Combined: ML learns heterogeneity patterns \(\theta(X)\) while structure ensures valid economics