Deep Learning for Individual Heterogeneity#

This paper develops a framework for embedding deep neural networks into structural economic models to capture rich heterogeneity while preserving interpretability.

Core Framework#

Starting point: A parametric structural model

\[\theta^\star = \arg\min_{\theta \in \Theta} \mathbb{E}[\ell(Y, T, \theta)]\]

Enriched model: Parameters become functions of observables $X$

\[\theta^\star(\cdot) = \arg\min_{\theta \in \mathcal{F}} \mathbb{E}[\ell(Y, T, \theta(X))]\]

Second-stage parameter of interest:

\[\mu^\star = \mathbb{E}[H(X, \theta^\star(X), \tilde{t})]\]

Estimation via Structured DNNs#

The parameter functions are estimated by:

\[\hat{\theta}(\cdot) = \arg\min_{\theta \in \mathcal{F}_{dnn}} \frac{1}{n} \sum_{i=1}^{n} \ell(y_i, t_i, \theta(x_i))\]

Convergence rate (Theorem 1): For smooth parameter functions with $p$ derivatives and $d_c$ continuous covariates:

\[\|\hat{\theta}_k - \theta^\star_k\|^2_{L_2(X)} = O\left(n^{-\frac{p}{p+d_c}} \log^8 n\right)\]

Theorem 1 (FLM 2021): “Under smoothness assumptions, the neural network estimator achieves the minimax optimal rate: $$\|\hat{\theta}_k - \theta^\star_k\|^2_{L_2} = O_P\left(n^{-\frac{p}{p+d_c}} \log^8 n\right)$$ where p is the smoothness of θ*(·) and d_c is the dimension of continuous covariates.”

Inference via Influence Functions#

Influence function (Theorem 2):

\[\psi(y, t, x, \theta, \Lambda) = H(x, \theta(x); \tilde{t}) - H_\theta(x, \theta(x); \tilde{t}) \Lambda(x)^{-1} \ell_\theta(y, t, \theta(x))\]

where:

$\ell_\theta$ is the gradient of the loss w.r.t. $\theta$
$\Lambda(x) = \mathbb{E}[\ell_{\theta\theta}(Y, T, \theta(x)) \mid X = x]$ is the conditional Hessian
$H_\theta$ is the Jacobian of $H$ w.r.t. $\theta$

Cross-fitted estimator:

\[\hat{\mu} = \frac{1}{K} \sum_{k=1}^{K} \frac{1}{|I_k|} \sum_{i \in I_k} \psi(y_i, t_i, \hat{\theta}_k(x_i), \hat{\Lambda}_k(x_i))\]

Asymptotic normality: Under rate conditions $\|\hat{\theta} - \theta^\star\|_{L_2} = o_P(n^{-1/4})$:

\[\sqrt{n}(\hat{\mu} - \mu^\star) \xrightarrow{d} N(0, \Psi)\]

Theorem 2 (FLM 2021): “Under rate conditions $\|\hat{\theta} - \theta^\star\|_{L_2} = o_P(n^{-1/4})$, the cross-fitted estimator satisfies: $$\sqrt{n}(\hat{\mu} - \mu^\star) \xrightarrow{d} N(0, \Psi)$$ where $\Psi = E[\psi_0(W)^2]$ and $\psi_0$ is the efficient influence function.”

Neyman Orthogonality (FLM 2021): “The influence function ψ satisfies Neyman orthogonality, meaning first-order errors in nuisance estimation have no first-order effect on the target estimator. This is why the bias scales as O(δ²) rather than O(δ).”

Critical Rate Condition: The rate $n^{-1/4}$ threshold comes from the product rate requirement:

“The product of nuisance estimation errors must satisfy $\|\hat{\theta} - \theta^\star\| \cdot \|\hat{\Lambda} - \Lambda^\star\| = o_P(n^{-1/2})$” — FLM (2021), Theorem 2 conditions

Application: Binary Choice with Heterogeneity#

Model:

\[P[Y=1 \mid X=x, R=r] = G(\theta_1^\star(x_d, x_a) + \theta_2^\star(x_d) r)\]

where $G(u) = 1/(1 + e^{-u})$ is the logit function.

Average marginal effect:

\[\text{AME}(\tilde{r}) = \mathbb{E}[G(\theta^\star(X)'\tilde{r}_1)(1 - G(\theta^\star(X)'\tilde{r}_1))\theta_2^\star(X)]\]

Optimal personalized pricing: Solve for $r_{opt}$ via:

\[\frac{d\Pi(r)}{dr} = 0\]

where expected profits are:

\[\Pi(r) = L\left[P(r)(M(1-D(r))r - D(r)) + (1-P(r))Mr_0\right]\]

Key Insight#

Machine learning and economic structure are complements, not substitutes.

ML alone fits data well but extrapolates nonsensically and can’t answer causal questions
Structure alone provides interpretability but misses heterogeneity
Combined: ML learns heterogeneity patterns $\theta(X)$ while structure ensures valid economics

Deep Learning for Individual Heterogeneity#

Core Framework#

Estimation via Structured DNNs#

Inference via Influence Functions#

Application: Binary Choice with Heterogeneity#

Key Insight#

Applications and Related Work#

Personalized Pricing (Dube & Misra, 2022)#

Continuous Treatment Inference (Colangelo & Lee, 2026)#