Algorithm#

Implementation of Farrell-Liang-Misra influence function-based inference for neural networks.

Overview#

The package provides valid confidence intervals for neural network estimates by:

Training a structural network θ̂(x) via cross-fitting
Computing influence functions ψᵢ to correct for regularization bias
Using √n-consistent variance estimation

Core formula: $$\psi_i = H(\hat{\theta}_i) - H_\theta \cdot \Lambda(x_i)^{-1} \cdot \ell_\theta(z_i, \hat{\theta}_i)$$

Algorithm Flow#

For each fold k = 1, ..., K:
    1. Train θ̂_k(x) on I_k^c (training set)
    2. Compute scores: ℓ_θ(zᵢ, θ̂_k(xᵢ)) for i ∈ I_k
    3. Compute Hessians: ℓ_θθ(zᵢ, θ̂_k(xᵢ)) for i ∈ I_k
    4. Estimate Λ̂_k(x) from Hessians
    5. Assemble ψᵢ = h - H_θ · Λ̂⁻¹ · ℓ_θ

Aggregate:
    μ̂ = (1/n) Σᵢ ψᵢ
    SE = √(Var(ψ)/n)

“Cross-fitting prevents overfitting: the model is trained on data independent of where it is evaluated. This ensures the influence function correction is unbiased.” — FLM (2021), Algorithm 1

Lambda Estimation (Critical)#

The conditional Hessian Λ(x) = E[ℓ_θθ|X=x] has three estimation regimes:

Remark 4 (FLM 2021): “The conditional Hessian Λ(x) = E[ℓ_θθ|X=x] varies with x through θ(x). When treatment is randomized, Λ can be computed analytically; otherwise it must be estimated, requiring three-way sample splitting.”

Regime A: Randomized Experiments (`ComputeLambda`)#

When treatment T is randomized and independent of X:

# Monte Carlo integration over treatment distribution
Λ(x) = E_T[ℓ_θθ(Y, T, θ(x)) | X=x]
     ≈ (1/M) Σₘ ℓ_θθ(y, tₘ, θ(x))  where tₘ ~ P(T)

Use: inference(..., is_randomized=True, treatment_dist=Normal(0, 1))

Regime B: Linear Models (`AnalyticLambda`)#

For linear models, Λ(x) = E[TT’|X=x] has closed form:

# No θ-dependence, so 2-way splitting suffices
Λ(x) = E[(1,T)(1,T)' | X=x]

Use: inference(..., model='linear') (auto-detected)

Regime C: Observational Nonlinear (`EstimateLambda`)#

For logit, Poisson, etc. in observational settings:

Method	Correlation	Coverage	Speed	Use Case
`ridge`	0.508	96%	0.08s	Recommended (default)
`aggregate`	0.000	95%	0.02s	Ignores heterogeneity
`lgbm`	0.978	96%	1.2s	High accuracy
`rf`	0.904	~90%	0.3s	Moderate accuracy
`mlp`	0.997	67%	12.5s	AVOID - invalid SEs

Key finding: MLP achieves highest correlation (0.997) but produces invalid standard errors (67% coverage instead of 95%). Ridge is the safe default with validated 96% coverage.

Use: structural_dml(..., lambda_method='ridge') (default) or inference(..., lambda_method='lgbm')

Three-Way Splitting#

When It’s Needed#

Three-way splitting is required when Λ(x) depends on θ(x).

Two-way suffices:

Linear models (Λ = E[TT’|X], no θ-dependence)
Randomized experiments (can compute Λ analytically)

Three-way required:

Logit, Poisson, Gamma, Weibull, etc.
Hessian weights depend on θ: e.g., p(1-p) for logit where p = σ(θ’x)

“When the Hessian weight depends on θ (as in logit where w = p(1-p) and p = σ(θ’x)), separate samples are needed to estimate θ̂ and Λ̂ to avoid overfitting bias.” — FLM (2021), Section 3.2

Auto-Detection#

The package auto-detects θ-dependence:

# From core/algorithm.py
def detect_theta_dependence(family, X, T, theta):
    """Sample 100 observations, perturb θ, check if Hessian changes."""
    sample_idx = np.random.choice(len(X), min(100, len(X)))
    H1 = family.hessian(Y[sample_idx], T[sample_idx], theta[sample_idx])
    H2 = family.hessian(Y[sample_idx], T[sample_idx], theta[sample_idx] + 0.1)
    return not np.allclose(H1, H2, rtol=1e-5)

How It Works#

When three_way=True (or auto-detected):

For each fold k:
    Training set I_k^c is split:
    - 60% for training θ̂_k(x)
    - 40% for fitting Λ̂_k(x)

This prevents overfitting: θ̂ and Λ̂ are estimated on independent data.

Regularization#

Lambda Inversion#

Inverting Λ(x) can be unstable when Hessians are near-singular. The package uses Tikhonov regularization:

# From utils/linalg.py
def batch_inverse(Lambda, ridge=1e-4):
    """Invert with ridge regularization for stability."""
    Lambda_reg = Lambda + ridge * torch.eye(Lambda.shape[-1])
    return torch.linalg.inv(Lambda_reg)

Default: ridge=1e-4

Diagnostics#

The package tracks min_lambda_eigenvalue:

Warning: min_lambda_eigenvalue=1.23e-06 < 1e-4
Consider increasing ridge regularization or checking model fit.

When to increase regularization:

Near-singular Hessians (min eigenvalue < 1e-4)
Logit with p ≈ 0 or p ≈ 1 (separation issues)
Overdispersed count models

Variance Estimation#

Within-Fold Formula#

# From core/algorithm.py
Ψ̂ = (1/K) Σ_k Var_k(ψ)
  = (1/K) Σ_k (1/|I_k|) Σ_{i ∈ I_k} (ψᵢ - μ̂_k)²

SE = √(Ψ̂/n)

Where μ̂_k is the fold-specific mean.

Theorem 3 (FLM 2021): “The variance estimator $\hat{V} = n^{-1}\sum_i \hat{\psi}_i^2$ is consistent: $\hat{V} \xrightarrow{p} V$. The resulting confidence intervals achieve nominal coverage asymptotically.”

High Correction Variance Warning#

Warning: High correction variance ratio (43.96).
This suggests the influence function correction dominates the estimate variance.
Consider using more cross-fitting folds (K >= 50).

This warns when:

correction_variance / total_variance > 10

Indicates the bias correction term H_θ · Λ⁻¹ · ℓ_θ has high variance, often from:

Insufficient cross-fitting folds
Poor θ̂(x) estimation in some regions
Near-singular Lambda

Two APIs#

Legacy: `structural_dml()`#

For the 8 built-in GLM families:

from deep_inference import structural_dml

result = structural_dml(
    Y=Y, T=T, X=X,
    family='logit',           # linear, logit, poisson, gamma, ...
    # lambda_method='ridge' is default (validated coverage)
    n_folds=50,
    epochs=100,
)

New: `inference()`#

For flexible targets and custom models:

from deep_inference import inference

result = inference(
    Y=Y, T=T, X=X,
    model='logit',
    target='ame',              # ame, beta, or custom
    # lambda_method='ridge' is default (validated coverage)
    is_randomized=False,
)

Both use the same core algorithm; inference() adds:

Regime detection (A/B/C)
Custom target functions
Treatment distribution specification

Implementation Files#

File	Purpose
`core/algorithm.py`	Main `structural_dml_core()` with fold logic
`engine/crossfit.py`	Modular `CrossFitter` class
`engine/assembler.py`	Influence function assembly
`engine/variance.py`	SE computation
`lambda_/estimate.py`	MLP/LGBM Lambda estimation
`lambda_/selector.py`	Regime detection
`autodiff/`	Score, Hessian, Jacobian computation

Pseudocode (Actual Implementation)#

def structural_dml_core(Y, T, X, family, lambda_method, n_folds, ...):
    # Phase 1: Create folds
    kf = KFold(n_splits=n_folds, shuffle=True)

    # Detect if 3-way splitting needed
    three_way = detect_theta_dependence(family, X, T, initial_theta)

    psi_all = []
    for fold_idx, (train_idx, eval_idx) in enumerate(kf.split(X)):
        # Phase 2a: Train θ̂(x)
        if three_way:
            theta_train = train_idx[:int(0.6 * len(train_idx))]
            lambda_train = train_idx[int(0.6 * len(train_idx)):]
        else:
            theta_train = train_idx
            lambda_train = train_idx

        model = train_structural_net(X[theta_train], T[theta_train], Y[theta_train])
        theta_hat = model(X[eval_idx])

        # Phase 2b: Compute Hessians on lambda_train
        hessians = family.hessian(Y[lambda_train], T[lambda_train], theta_hat_train)

        # Phase 2c: Fit Lambda
        if lambda_method == 'aggregate':
            Lambda = hessians.mean(axis=0)  # Single matrix for all x
        else:
            Lambda_model = train_lambda_estimator(X[lambda_train], hessians)
            Lambda = Lambda_model(X[eval_idx])

        # Phase 3: Assemble influence function
        for i in eval_idx:
            score = family.gradient(Y[i], T[i], theta_hat[i])
            h = target.h(theta_hat[i])
            h_jacobian = target.jacobian(theta_hat[i])

            Lambda_inv = batch_inverse(Lambda[i], ridge=1e-4)
            psi_i = h - h_jacobian @ Lambda_inv @ score
            psi_all.append(psi_i)

    # Phase 4: Aggregate
    mu_hat = np.mean(psi_all)
    se = np.sqrt(np.var(psi_all) / n)

    return DMLResult(mu_hat=mu_hat, se=se, ...)

References#

Farrell, Liang, Misra (2021): “Deep Neural Networks for Estimation and Inference” Econometrica
Farrell, Liang, Misra (2025): “Deep Learning for Individual Heterogeneity” Working Paper