# Deep Learning for Individual Heterogeneity

This paper develops a framework for embedding deep neural networks into structural economic models to capture rich heterogeneity while preserving interpretability.

---

## Core Framework

**Starting point:** A parametric structural model

$$\theta^\star = \arg\min_{\theta \in \Theta} \mathbb{E}[\ell(Y, T, \theta)]$$

**Enriched model:** Parameters become functions of observables $X$

$$\theta^\star(\cdot) = \arg\min_{\theta \in \mathcal{F}} \mathbb{E}[\ell(Y, T, \theta(X))]$$

**Second-stage parameter of interest:**

$$\mu^\star = \mathbb{E}[H(X, \theta^\star(X); \tilde{t})]$$

where $H$ is a known function and $\tilde{t}$ is a fixed treatment value of interest.

---

## Estimation via Structured DNNs

The parameter functions are estimated by:

$$\hat{\theta}(\cdot) = \arg\min_{\theta \in \mathcal{F}_{dnn}} \frac{1}{n} \sum_{i=1}^{n} \ell(y_i, t_i, \theta(x_i))$$
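
As a rough illustration of the objective above, the sketch below evaluates the empirical loss for one candidate $\theta(\cdot)$. It assumes a binary-outcome logit loss with index $\theta_1(x) + \theta_2(x)t$; the helper names (`logit_loss`, `empirical_risk`) and the toy parameter function are hypothetical. The actual minimization over a neural network class is not shown, only the quantity being minimized.

```python
import math

def logit_loss(y, t, theta):
    """Negative log-likelihood of a binary outcome with index theta1 + theta2 * t."""
    u = theta[0] + theta[1] * t
    # log(1 + e^u), computed in a numerically stable way for large |u|
    softplus = max(u, 0.0) + math.log1p(math.exp(-abs(u)))
    return softplus - y * u

def empirical_risk(data, theta_fn):
    """Sample average of the loss, evaluated at theta(x_i) for each observation."""
    return sum(logit_loss(y, t, theta_fn(x)) for y, t, x in data) / len(data)

# Toy data as (y, t, x) triples; the parameter function is illustrative only.
data = [(1, 0.5, 0.2), (0, 1.0, -0.3), (1, 0.0, 1.1), (0, 2.0, 0.4)]
theta_fn = lambda x: (0.5 * x, -1.0)  # returns (theta1(x), theta2(x))

risk = empirical_risk(data, theta_fn)
print(risk)
```

The structure enters only through how $\theta(x)$ is combined with $t$ inside the loss; the map $x \mapsto \theta(x)$ itself is left unrestricted.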

**Convergence rate (Theorem 1):** For smooth parameter functions with $p$ derivatives and $d_c$ continuous covariates, each component $\hat{\theta}_k$ satisfies

$$\|\hat{\theta}_k - \theta^\star_k\|^2_{L_2(X)} = O_P\left(n^{-\frac{p}{p+d_c}} \log^8 n\right)$$

> **Theorem 1 (FLM 2021):** Under smoothness assumptions, the neural network estimator attains the rate displayed above, where $p$ is the smoothness of $\theta^\star(\cdot)$ and $d_c$ is the dimension of the continuous covariates. This is slower than the minimax rate $n^{-2p/(2p+d_c)}$ for this smoothness class, but the squared error is still $o_P(n^{-1/2})$ whenever $p > d_c$, which is the rate needed for inference on $\mu^\star$.
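
The arithmetic behind this rate can be checked with a few lines. The sketch below computes the polynomial exponent $p/(p+d_c)$ and compares it with $1/2$, the exponent the squared error must beat for the $o_P(n^{-1/4})$ condition used in the inference step. The function names are hypothetical, and constants are ignored throughout.

```python
import math

def flm_rate(n, p, d_c):
    """Rate for the squared L2 error, up to constants: n^(-p/(p+d_c)) * log(n)^8."""
    return n ** (-p / (p + d_c)) * math.log(n) ** 8

def polynomial_exponent(p, d_c):
    """Exponent of n in the squared-error rate, ignoring the log factor."""
    return p / (p + d_c)

# The squared error must shrink faster than n^(-1/2), i.e. the exponent must exceed 1/2.
for p, d_c in [(2, 4), (4, 4), (8, 4)]:
    a = polynomial_exponent(p, d_c)
    print(f"p={p}, d_c={d_c}: exponent={a:.3f}, exceeds 1/2: {a > 0.5}")
```

The $\log^8 n$ factor is large at realistic sample sizes, so `flm_rate` is best read as a statement about how the bound scales with $n$, not as a usable finite-sample error bound.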

---

## Inference via Influence Functions

**Influence function (Theorem 2):**

$$\psi(y, t, x, \theta, \Lambda) = H(x, \theta(x); \tilde{t}) - H_\theta(x, \theta(x); \tilde{t}) \Lambda(x)^{-1} \ell_\theta(y, t, \theta(x))$$

where:
- $\ell_\theta$ is the gradient of the loss w.r.t. $\theta$
- $\Lambda(x) = \mathbb{E}[\ell_{\theta\theta}(Y, T, \theta(x)) \mid X = x]$ is the conditional Hessian
- $H_\theta$ is the Jacobian of $H$ w.r.t. $\theta$
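
To make the formula concrete, here is a minimal sketch of $\psi$ for one observation, assuming a logit loss with a two-dimensional $\theta(x) = (\theta_1(x), \theta_2(x))$ and index $\theta_1 + \theta_2 t$. The function name `psi` and its argument layout are hypothetical; $\Lambda(x)$, $H$ and $H_\theta$ are passed in as already-computed values.

```python
import math

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def psi(y, t, theta, lam, H, H_theta):
    """Influence function value H - H_theta * Lambda^{-1} * (loss gradient).

    theta   : (theta1, theta2) evaluated at x
    lam     : 2x2 conditional Hessian Lambda(x), given as ((a, b), (c, d))
    H       : value of H(x, theta(x); t_tilde)
    H_theta : gradient of H with respect to theta, given as (h1, h2)
    """
    # Gradient of the logit loss with respect to theta: (G(u) - y) * (1, t)
    resid = sigmoid(theta[0] + theta[1] * t) - y
    grad = (resid, resid * t)
    # Solve Lambda * z = grad using the closed-form inverse of a 2x2 matrix
    (a, b), (c, d) = lam
    det = a * d - b * c
    z = ((d * grad[0] - b * grad[1]) / det, (-c * grad[0] + a * grad[1]) / det)
    return H - (H_theta[0] * z[0] + H_theta[1] * z[1])

# Example: the target is the slope itself, H = theta2, so H_theta = (0, 1).
value = psi(y=1, t=1.0, theta=(0.0, 0.5), lam=((0.25, 0.10), (0.10, 0.30)),
            H=0.5, H_theta=(0.0, 1.0))
print(value)
```

For a general loss the gradient and Hessian would typically come from automatic differentiation; the hand-coded logit gradient here just keeps the sketch dependency-free.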

**Cross-fitted estimator:**

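$$\hat{\mu} = \frac{1}{K} \sum_{k=1}^{K} \frac{1}{|I_k|} \sum_{i \in I_k} \psi(y_i, t_i, x_i, \hat{\theta}_k, \hat{\Lambda}_k)$$

where the folds $I_1, \dots, I_K$ partition the sample, and $\hat{\theta}_k$ and $\hat{\Lambda}_k$ are estimated using only observations outside fold $I_k$. Here $k$ indexes folds, not components of $\theta$.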

**Asymptotic normality:** Under the rate condition $\|\hat{\theta} - \theta^\star\|_{L_2} = o_P(n^{-1/4})$:

$$\sqrt{n}(\hat{\mu} - \mu^\star) \xrightarrow{d} N(0, \Psi)$$
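
The cross-fitting recipe and the resulting standard error can be sketched generically. In the code below, `fit_nuisance` and `psi` are hypothetical callables supplied by the user, and the variance is estimated by the sample variance of the $\psi$ values, which is one common plug-in choice and is assumed here. The toy check uses squared loss $(y - \theta)^2/2$ with $H = \theta$, for which $\Lambda = 1$ and $\psi$ reduces to $y$.

```python
import math
import random

def cross_fit(data, K, fit_nuisance, psi, seed=0):
    """Cross-fitted estimate of mu and its standard error.

    fit_nuisance(train) -> nuisance estimate fitted WITHOUT the held-out fold
    psi(obs, nuisance)  -> influence function value for one observation
    """
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::K] for k in range(K)]

    fold_means, all_psi = [], []
    for fold in folds:
        held_out = set(fold)
        train = [data[i] for i in idx if i not in held_out]
        nuisance = fit_nuisance(train)
        values = [psi(data[i], nuisance) for i in fold]
        fold_means.append(sum(values) / len(values))
        all_psi.extend(values)

    mu_hat = sum(fold_means) / K
    n = len(all_psi)
    center = sum(all_psi) / n
    var_hat = sum((v - center) ** 2 for v in all_psi) / n
    return mu_hat, math.sqrt(var_hat / n)

# Toy check: psi = theta_hat - Lambda^{-1} * (theta_hat - y) with Lambda = 1.
data = [float(v) for v in range(1, 13)]
fit_mean = lambda train: sum(train) / len(train)
psi_mean = lambda y, theta_hat: theta_hat - 1.0 * (theta_hat - y)

mu_hat, se = cross_fit(data, K=3, fit_nuisance=fit_mean, psi=psi_mean)
print(mu_hat, se)
```

An approximate 95% interval would then be `mu_hat ± 1.96 * se`, relying on the normal limit stated above.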

> **Theorem 2 (FLM 2021):** "Under rate conditions $\|\hat{\theta} - \theta^\star\|_{L_2} = o_P(n^{-1/4})$, the cross-fitted estimator satisfies:
> $$\sqrt{n}(\hat{\mu} - \mu^\star) \xrightarrow{d} N(0, \Psi)$$
> where $\Psi = E[\psi_0(W)^2]$ and $\psi_0$ is the efficient influence function."

> **Neyman Orthogonality (FLM 2021):** The influence function $\psi$ is Neyman orthogonal: errors in the estimated nuisance functions have no first-order effect on the target estimator. As a result, a nuisance error of size $\delta$ contributes bias of order $O(\delta^2)$ rather than $O(\delta)$.

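**Critical Rate Condition:** The $n^{-1/4}$ threshold comes from the product-rate requirement. If each nuisance estimator converges faster than $n^{-1/4}$, the product of their errors converges faster than $n^{-1/2}$ and is therefore negligible at the $\sqrt{n}$ scale: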
> "The product of nuisance estimation errors must satisfy $\|\hat{\theta} - \theta^\star\| \cdot \|\hat{\Lambda} - \Lambda^\star\| = o_P(n^{-1/2})$"
> — FLM (2021), Theorem 2 conditions
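
The second-order behaviour of the bias can be seen in a small numerical check. The sketch assumes the simplest possible setting, chosen for illustration only: no covariates, $Y \sim \text{Bernoulli}(G(\theta^\star))$ with logistic $G$, target $H(\theta) = \theta$, and both $\theta$ and $\Lambda$ evaluated at a perturbed value $\theta^\star + \delta$. Expectations are computed exactly, so no simulation noise is involved.

```python
import math

def G(u):
    return 1.0 / (1.0 + math.exp(-u))

THETA_STAR = 0.5  # hypothetical true index; Y ~ Bernoulli(G(THETA_STAR))

def plug_in_bias(delta):
    """Bias of the plug-in target H(theta) = theta when theta is off by delta."""
    return (THETA_STAR + delta) - THETA_STAR

def debiased_bias(delta):
    """Bias of E[psi] at the perturbed theta, with Lambda evaluated there too."""
    theta = THETA_STAR + delta
    expected_grad = G(theta) - G(THETA_STAR)    # E[l_theta] under the true model
    lam = G(theta) * (1.0 - G(theta))           # Hessian of the logit loss
    expected_psi = theta - expected_grad / lam  # H - H_theta * Lambda^{-1} * E[l_theta]
    return expected_psi - THETA_STAR

for delta in (0.2, 0.1, 0.05):
    print(delta, plug_in_bias(delta), debiased_bias(delta))
```

Halving $\delta$ halves the plug-in bias but cuts the debiased bias by roughly a factor of four, which is the $O(\delta^2)$ pattern.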

---

## Application: Binary Choice with Heterogeneity

**Model:**

$$P[Y=1 \mid X=x, R=r] = G(\theta_1^\star(x_d, x_a) + \theta_2^\star(x_d) r)$$

where $G(u) = 1/(1 + e^{-u})$ is the logistic function (the inverse of the logit).

**Average marginal effect:**

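$$\text{AME}(\tilde{r}) = \mathbb{E}[G(\theta^\star(X)'\tilde{r}_1)(1 - G(\theta^\star(X)'\tilde{r}_1))\theta_2^\star(X)]$$

where $\theta^\star(X) = (\theta_1^\star(X), \theta_2^\star(X))'$ and $\tilde{r}_1 = (1, \tilde{r})'$, so that $\theta^\star(X)'\tilde{r}_1 = \theta_1^\star(X) + \theta_2^\star(X)\tilde{r}$.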
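
A plug-in version of this average is straightforward to compute once $\theta(\cdot)$ is available. The sketch below assumes the index is $\theta_1(x) + \theta_2(x)\tilde{r}$ and takes a hypothetical `theta_fn` returning the pair $(\theta_1(x), \theta_2(x))$. It gives only the point estimate; valid inference would go through the influence function correction described earlier.

```python
import math

def G(u):
    return 1.0 / (1.0 + math.exp(-u))

def ame(xs, theta_fn, r_tilde):
    """Sample analogue of E[G(u) * (1 - G(u)) * theta2(X)], u = theta1(X) + theta2(X) * r_tilde."""
    total = 0.0
    for x in xs:
        theta1, theta2 = theta_fn(x)
        g = G(theta1 + theta2 * r_tilde)
        total += g * (1.0 - g) * theta2  # derivative of G(u) with respect to r
    return total / len(xs)

# Illustrative parameter functions only: the slope becomes more negative as x grows.
xs = [0.0, 0.5, 1.0, 1.5]
theta_fn = lambda x: (1.0, -1.0 - x)
print(ame(xs, theta_fn, r_tilde=0.5))
```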

**Optimal personalized pricing:** Solve for $r_{opt}$ via:

$$\frac{d\Pi(r)}{dr} = 0$$

where expected profits are:

$$\Pi(r) = L\left[P(r)(M(1-D(r))r - D(r)) + (1-P(r))Mr_0\right]$$
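
The first-order condition can be solved numerically once $P(\cdot)$ and $D(\cdot)$ are known. The sketch below is an illustration under strong assumptions: $P$ and $D$ are hypothetical functions picked only so that profits have a single interior peak, $L$, $M$ and $r_0$ are treated as plain constants with made-up values, and a grid search stands in for solving $d\Pi/dr = 0$.

```python
import math

def profit(r, P, D, L=1.0, M=1.0, r0=0.0):
    """Pi(r) = L * [P(r) * (M * (1 - D(r)) * r - D(r)) + (1 - P(r)) * M * r0]."""
    return L * (P(r) * (M * (1.0 - D(r)) * r - D(r)) + (1.0 - P(r)) * M * r0)

def argmax_on_grid(f, lo, hi, steps=10_000):
    """Grid search over [lo, hi]; a stand-in for a root-finder on the derivative."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(grid, key=f)

# Hypothetical primitives, not taken from the paper.
P = lambda r: 1.0 / (1.0 + math.exp(-(2.0 - 20.0 * r)))  # decreasing in r
D = lambda r: 0.05                                        # constant in r

r_opt = argmax_on_grid(lambda r: profit(r, P, D), 0.0, 1.0)
print(r_opt, profit(r_opt, P, D))
```

With personalized pricing, $P$ and $D$ would depend on $x$ through the estimated parameter functions, and the search would be repeated for each individual.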

---

## Key Insight

**Machine learning and economic structure are complements, not substitutes.**

- **ML alone** fits data well but extrapolates nonsensically and can't answer causal questions
- **Structure alone** provides interpretability but misses heterogeneity
- **Combined**: ML learns heterogeneity patterns $\theta(X)$ while structure ensures valid economics

---

## Applications and Related Work

### Personalized Pricing (Dube & Misra, 2022)

Dube & Misra (2022, *JPE*) apply the FLM framework to personalized pricing with heterogeneous demand. By estimating $\beta(X)$ (price sensitivity) as a function of consumer characteristics, they compute:

- **Price elasticities**: $\eta(X) = (1-p) \cdot \beta(X) \cdot P$, where $P$ is the price and $p$ is the purchase probability; this measures how responsive each consumer is to price changes
- **Optimal personalized prices**: via the Lerner markup rule $\frac{P-MC}{P} = -1/\eta$
- **Consumer welfare**: using the Small & Rosen (1981) logsum formula $CS = \log(1 + e^V) / |\beta_{\text{price}}|$
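
The three formulas in the list above translate directly into code. The functions below are a standalone sketch with hypothetical names; they are not the `deep-inference` API, and they assume a binary logit with a single outside option and a negative price coefficient.

```python
import math

def elasticity(beta, price, prob):
    """Logit own-price elasticity (1 - prob) * beta * price, with beta < 0."""
    return (1.0 - prob) * beta * price

def lerner_price(mc, eta):
    """Price implied by (P - MC) / P = -1 / eta; needs elastic demand (eta < -1)."""
    if eta >= -1.0:
        raise ValueError("Lerner rule requires eta < -1")
    return mc * eta / (1.0 + eta)

def consumer_surplus(v, beta_price):
    """Logsum welfare log(1 + e^V) / |beta_price|."""
    return math.log1p(math.exp(v)) / abs(beta_price)

eta = elasticity(beta=-2.0, price=3.0, prob=0.5)
print(eta, lerner_price(mc=10.0, eta=-2.0), consumer_surplus(v=0.0, beta_price=-2.0))
```

Note that `lerner_price` treats $\eta$ as fixed. Under logit demand the elasticity itself depends on the price, so the optimal price is the solution of a fixed-point problem in which this rule is only one step.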

`deep-inference` implements all three as built-in targets: `Elasticity`, `WTP`, `ConsumerWelfare`. See the [Pricing Tutorial](../tutorials/pricing.md).

### Continuous Treatment Inference (Colangelo & Lee, 2026)

Colangelo & Lee (2026) develop double debiased ML for nonparametric inference with continuous treatments. They cite FLM DNNs as valid nuisance estimators that achieve the required convergence rates. `deep-inference` complements their nonparametric approach with a structural alternative: all model families natively support continuous $T$, enabling dose-response analysis with economic structure. See the [Continuous Treatment Tutorial](../tutorials/continuous_treatment.md).