Linear Model Tutorial#
The linear model is the baseline case for understanding influence function inference.
When to Use#
Use the linear model when:
Outcome is continuous and unbounded
Errors are approximately normal
Examples: wages, test scores, consumption, prices
Mathematical Setup#
Data Generating Process#
Where:
\(\alpha(X)\) is the baseline function (intercept)
\(\beta(X)\) is the treatment effect (slope)
\(T\) is the treatment variable
\(\varepsilon\) is idiosyncratic noise
Estimand#
The average treatment effect across the covariate distribution.
Loss Function#
Standard mean squared error loss.
Influence Score Components#
Component |
Formula |
|---|---|
Residual \(r\) |
\(Y - (\alpha + \beta T)\) |
Hessian weight \(W\) |
\(1\) (constant) |
Score \(\nabla\ell\) |
\(-r \cdot [1, \tilde{T}]\) |
Complete Example#
import numpy as np
from deep_inference import structural_dml
# Generate synthetic data
np.random.seed(42)
n = 2000
X = np.random.randn(n, 10)
T = np.random.randn(n)
beta_true = 0.5 + 0.3 * X[:, 0] # Heterogeneous effect
Y = X[:, 0] + beta_true * T + np.random.randn(n)
mu_true = beta_true.mean()
print(f"True mu* = {mu_true:.6f}")
# Run inference
result = structural_dml(
Y=Y, T=T, X=X,
family='linear',
hidden_dims=[64, 32],
epochs=100,
n_folds=50,
lr=0.01
)
print(result.summary())
# Compare to naive
print(f"\nNaive estimate: {result.mu_naive:.4f}")
print(f"Bias correction: {result.mu_hat - result.mu_naive:.4f}")
Expected Results#
From Eval 01: Parameter Recovery:
Family |
Corr(α) |
Corr(β) |
Status |
|---|---|---|---|
linear |
0.994 |
0.998 |
PASS |
The influence function correction produces valid confidence intervals. See Validation for full results.
Real-World Applications#
Wage Regression#
Estimate the effect of education on wages where the effect varies by worker characteristics:
# Y = log(wage)
# T = years of education
# X = (experience, age, industry, ...)
# Target: E[beta(X)] = average return to education
result = structural_dml(Y, T, X, family='linear')
Price Elasticity#
Estimate heterogeneous price elasticity in demand:
# Y = log(quantity)
# T = log(price)
# X = (demographics, season, ...)
# Target: E[beta(X)] = average price elasticity
result = structural_dml(Y, T, X, family='linear')
Key Takeaways#
Linear is the baseline: Unit Hessian weight makes the math simplest
K=50 folds is important: Each model sees 98% of data for stable estimation
SE = std(psi)/sqrt(n): This IS the correct formula when psi is computed properly
Check SE ratio: Target is 0.9-1.2 (close to 1.0)