Models Module#

Neural network architectures for structural estimation.

Network Classes#

StructuralNet#

The main neural network architecture for structural parameter estimation.

import torch
from deep_inference.models import StructuralNet

# Create network
net = StructuralNet(
    input_dim=10,           # Number of covariates
    hidden_dims=[64, 32],   # Hidden layer sizes
    theta_dim=2,            # Number of parameters (alpha, beta)
    dropout=0.1,            # Dropout rate
)

# Forward pass
X = torch.randn(100, 10)    # 100 observations, 10 covariates
theta = net(X)              # (100, 2)

Network Architecture#

Input (d features)
    |
Linear(d, hidden_dims[0])
    |
ReLU + Dropout
    |
Linear(hidden_dims[0], hidden_dims[1])
    |
ReLU + Dropout
    |
...
    |
Linear(hidden_dims[-1], theta_dim)
    |
Output (theta_dim parameters per observation)
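The diagram describes a plain multilayer perceptron. A minimal PyTorch sketch of that layer stack is given below. `build_mlp` is a hypothetical helper written for illustration; it is not the library's `StructuralNet` implementation, which may differ in initialization or other details.

```python
import torch
import torch.nn as nn


def build_mlp(input_dim, hidden_dims, theta_dim, dropout=0.1):
    """Stack Linear -> ReLU -> Dropout per hidden layer, then a linear head.

    Hypothetical helper mirroring the diagram; not the library's code.
    """
    layers = []
    prev = input_dim
    for width in hidden_dims:
        layers += [nn.Linear(prev, width), nn.ReLU(), nn.Dropout(dropout)]
        prev = width
    # Final projection to theta_dim, with no activation on the output.
    layers.append(nn.Linear(prev, theta_dim))
    return nn.Sequential(*layers)


sketch_net = build_mlp(input_dim=10, hidden_dims=[64, 32], theta_dim=2)
sketch_net.eval()  # disable dropout for a deterministic forward pass
with torch.no_grad():
    sketch_theta = sketch_net(torch.randn(100, 10))

# Weights plus biases: (10*64 + 64) + (64*32 + 32) + (32*2 + 2)
n_params = sum(p.numel() for p in sketch_net.parameters())
print(tuple(sketch_theta.shape), n_params)
```

With these sizes the stack has 2,850 trainable parameters, which is a useful yardstick when comparing the architecture against the sample size.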

Usage with structural_dml#

The structural_dml function creates and trains the network internally:

from deep_inference import structural_dml

# Y, T, X hold the outcome, treatment, and covariates for the same n observations
result = structural_dml(
    Y=Y, T=T, X=X,
    family='linear',
    hidden_dims=[64, 32],  # Network architecture
    epochs=100,            # Training epochs
    lr=0.01,               # Learning rate
)

# Access estimated parameters
theta_hat = result.theta_hat  # (n, theta_dim) numpy array
alpha_hat = theta_hat[:, 0]
beta_hat = theta_hat[:, 1]

Training Configuration#

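| Parameter    | Default  | Description        |
|--------------|----------|--------------------|
| hidden_dims  | [64, 32] | Hidden layer sizes |
| epochs       | 100      | Training epochs    |
| lr           | 0.01     | Learning rate      |
| batch_size   | 64       | Mini-batch size    |
| weight_decay | 1e-4     | L2 regularization  |
| dropout      | 0.1      | Dropout rate       |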
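To show where each setting typically enters a PyTorch training setup, here is a sketch under stated assumptions: the optimizer is Adam (the custom-network example on this page uses Adam), `weight_decay` is passed to the optimizer, and `batch_size` is passed to a `DataLoader`. How `structural_dml` wires these internally is not shown on this page. The `nn.Linear` model and the random data are placeholders.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Defaults copied from the Training Configuration table.
config = {
    "hidden_dims": [64, 32],
    "epochs": 100,
    "lr": 0.01,
    "batch_size": 64,
    "weight_decay": 1e-4,
    "dropout": 0.1,
}

# Placeholder data: 1,000 observations with 10 covariates.
n = 1000
dataset = TensorDataset(torch.randn(n, 10), torch.randn(n))
loader = DataLoader(dataset, batch_size=config["batch_size"], shuffle=True)

# Placeholder model standing in for the structural network.
placeholder_model = torch.nn.Linear(10, 2)

# In torch.optim.Adam, weight_decay adds an L2 penalty to the gradient.
optimizer = torch.optim.Adam(
    placeholder_model.parameters(),
    lr=config["lr"],
    weight_decay=config["weight_decay"],
)

steps_per_epoch = len(loader)                    # ceil(1000 / 64)
total_steps = steps_per_epoch * config["epochs"]
print(steps_per_epoch, total_steps)
```

Under these assumptions, 1,000 observations give 16 mini-batches per epoch and 1,600 gradient updates over the default 100 epochs.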

Architecture Guidelines#

| Sample Size          | Recommended Architecture |
|----------------------|--------------------------|
| n < 1,000            | [32, 16]                 |
| 1,000 ≤ n < 10,000   | [64, 32]                 |
| 10,000 ≤ n < 100,000 | [128, 64, 32]            |
| n ≥ 100,000          | [256, 128, 64]           |
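The guideline can be encoded as a small lookup. `suggest_hidden_dims` is a hypothetical helper, not part of the library. Assigning the boundary values (exactly 1,000, 10,000, or 100,000) to the larger architecture is an assumption.

```python
def suggest_hidden_dims(n):
    """Return the hidden-layer sizes suggested for a sample of size n.

    Hypothetical helper; boundary values go to the larger architecture.
    """
    if n < 1_000:
        return [32, 16]
    if n < 10_000:
        return [64, 32]
    if n < 100_000:
        return [128, 64, 32]
    return [256, 128, 64]


for n in (500, 5_000, 50_000, 500_000):
    print(n, suggest_hidden_dims(n))
```

The result can be passed as `hidden_dims` to `structural_dml` or `StructuralNet`.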

MultinomialLogitModel#

For multinomial logit (conditional logit / McFadden) choice models:

from deep_inference.models.multinomial import MultinomialLogitModel

model = MultinomialLogitModel(n_alternatives=3, n_attributes=2)
# theta_dim = (J-1) + K = 4
# theta = [alpha_1, ..., alpha_{J-1}, beta_1, ..., beta_K]
# Hessian: Fisher information (no Y dependence, depends on theta)
# Requires 3-way cross-fitting
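To make the parameter layout concrete, the sketch below computes conditional-logit choice probabilities from a single `theta` vector. It is a plain-Python illustration, not the library's code. It assumes utility `alpha_j + beta · x_j` for alternative `j`, and that the last alternative is the baseline with intercept fixed at 0. `choice_probabilities` and the example numbers are hypothetical.

```python
import math


def choice_probabilities(theta, attributes):
    """Softmax choice probabilities for one decision maker.

    theta      : [alpha_1, ..., alpha_{J-1}, beta_1, ..., beta_K]
    attributes : J rows, each with K attribute values
    Assumes the J-th alternative is the baseline (alpha_J = 0).
    """
    n_alt = len(attributes)
    n_attr = len(attributes[0])
    if len(theta) != (n_alt - 1) + n_attr:
        raise ValueError("theta must have (J - 1) + K entries")

    alphas = list(theta[: n_alt - 1]) + [0.0]
    betas = list(theta[n_alt - 1:])
    utilities = [
        a + sum(b * x for b, x in zip(betas, row))
        for a, row in zip(alphas, attributes)
    ]
    # Subtract the maximum utility before exponentiating for stability.
    top = max(utilities)
    weights = [math.exp(u - top) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


# J = 3 alternatives, K = 2 attributes, so theta has (3 - 1) + 2 = 4 entries.
attributes = [[1.0, 0.5], [0.0, 1.0], [2.0, 0.0]]
theta = [0.2, -0.1, 0.5, 1.0]
probs = choice_probabilities(theta, attributes)
uniform = choice_probabilities([0.0, 0.0, 0.0, 0.0], attributes)
print(probs, uniform)
```

With all parameters at zero every alternative has the same utility, so each is chosen with probability 1/3.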

CombinatorialModel#

For multi-treatment combinatorial experiments with binary treatment vectors T ∈ {0,1}^m.

from deep_inference.models.combinatorial import CombinatorialModel

model = CombinatorialModel(n_treatments=3, link='gen_sigmoid_ii')
# theta_dim = m + 2 = 5 for gen_sigmoid_ii
# theta = [θ₀, θ₁, θ₂, θ₃, θ₄]

Four link functions:

| Link           | Formula                          | θ_dim | Description                       |
|----------------|----------------------------------|-------|-----------------------------------|
| multiplicative | θ₀ · ∏(1 + θ_k · t_k)            | m+1   | Product interaction               |
| sigmoid        | a/(1+exp(-(θ₀ + Σ θ_k·t_k))) + b | m+1   | Bounded response with fixed scale |
| gen_sigmoid_i  | θ_{m+1} · σ(Σ θ_k·t_k)           | m+1   | Flexible scale, no intercept      |
| gen_sigmoid_ii | θ_{m+1} · σ(θ₀ + Σ θ_k·t_k)      | m+2   | Most flexible (recommended)       |
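The four formulas can be written out in plain Python as follows. This is an illustrative sketch, not the library's implementation. The ordering of entries inside `theta` for each link, and the default constants `a=1.0` and `b=0.0` for the fixed-scale sigmoid, are assumptions. `link_value` is a hypothetical name.

```python
import math


def sigma(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def link_value(link, theta, t, a=1.0, b=0.0):
    """Evaluate a link at binary treatment vector t (length m).

    Assumed theta layouts:
      multiplicative, sigmoid : [theta_0, theta_1, ..., theta_m]
      gen_sigmoid_i           : [theta_1, ..., theta_m, theta_{m+1}]
      gen_sigmoid_ii          : [theta_0, theta_1, ..., theta_m, theta_{m+1}]
    """
    m = len(t)
    if link == "multiplicative":
        out = theta[0]
        for k in range(m):
            out *= 1.0 + theta[k + 1] * t[k]
        return out
    if link == "sigmoid":
        z = theta[0] + sum(theta[k + 1] * t[k] for k in range(m))
        return a * sigma(z) + b
    if link == "gen_sigmoid_i":
        z = sum(theta[k] * t[k] for k in range(m))
        return theta[m] * sigma(z)
    if link == "gen_sigmoid_ii":
        z = theta[0] + sum(theta[k + 1] * t[k] for k in range(m))
        return theta[m + 1] * sigma(z)
    raise ValueError(f"unknown link: {link}")


t = [1.0, 0.0, 1.0]  # m = 3 treatments, first and third switched on
v_mult = link_value("multiplicative", [2.0, 0.5, 0.5, 0.5], t)
v_sig = link_value("sigmoid", [0.0, 0.0, 0.0, 0.0], t)
v_gen1 = link_value("gen_sigmoid_i", [0.0, 0.0, 0.0, 3.0], t)
v_gen2 = link_value("gen_sigmoid_ii", [0.0, 0.0, 0.0, 0.0, 2.0], t)
print(v_mult, v_sig, v_gen1, v_gen2)
```

The example vectors have lengths 4, 4, 4, and 5, matching the θ_dim column for m = 3.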

Hessian properties:

  • Uses the Fisher information 2·G_θ·G_θᵀ, where G_θ is the gradient of the link with respect to θ (does NOT depend on y)

  • hessian_depends_on_theta = True (G_θ depends on θ)

  • hessian_depends_on_y = False (Fisher information)

  • Requires 3-way cross-fitting (Regime C)
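One way to see why the y-dependence drops out is the short derivation below. It assumes a squared-error loss and a correctly specified conditional mean, E[y | t, x] = G(t, θ). The page does not state the loss, so treat this as a sketch that is consistent with the factor of 2 rather than as the library's definition.

```latex
\begin{aligned}
\ell(y, t, \theta) &= \bigl(y - G(t, \theta)\bigr)^2 \\
\nabla_\theta \ell &= -2\,\bigl(y - G\bigr)\, G_\theta \\
\nabla^2_\theta \ell &= 2\, G_\theta G_\theta^\top - 2\,\bigl(y - G\bigr)\, G_{\theta\theta} \\
\mathbb{E}\bigl[\nabla^2_\theta \ell \mid t, x\bigr] &= 2\, G_\theta G_\theta^\top
\quad \text{since } \mathbb{E}\bigl[\,y - G \mid t, x\,\bigr] = 0 .
\end{aligned}
```

The remaining term still depends on θ through G_θ, which is consistent with the two flags listed above.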

Lambda computation for randomized experiments:

# Compute Λ(x) via Monte Carlo for Regime A
import torch
t_samples = torch.randint(0, 2, (1000, 3)).float()
Lambda = model.compute_lambda_integral(theta, t_samples)
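As a rough picture of what a Monte Carlo estimate of Λ(x) involves, the sketch below averages 2·G_θ·G_θᵀ over sampled treatment vectors for the `gen_sigmoid_ii` link. It is written in plain Python with hypothetical helper names and an assumed `theta` layout of `[θ₀, θ₁, ..., θ_m, θ_{m+1}]`. It is not the body of `compute_lambda_integral`, whose internals are not shown on this page.

```python
import math
import random


def sigma(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def grad_gen_sigmoid_ii(theta, t):
    """Gradient of theta_{m+1} * sigma(theta_0 + sum_k theta_k t_k) w.r.t. theta."""
    m = len(t)
    z = theta[0] + sum(theta[k + 1] * t[k] for k in range(m))
    s = sigma(z)
    slope = theta[m + 1] * s * (1.0 - s)  # derivative with respect to z
    return [slope] + [slope * tk for tk in t] + [s]


def lambda_monte_carlo(theta, t_samples):
    """Average 2 * g g^T over the sampled treatment vectors."""
    d = len(theta)
    lam = [[0.0] * d for _ in range(d)]
    for t in t_samples:
        g = grad_gen_sigmoid_ii(theta, t)
        for i in range(d):
            for j in range(d):
                lam[i][j] += 2.0 * g[i] * g[j]
    n = len(t_samples)
    return [[value / n for value in row] for row in lam]


rng = random.Random(0)
# 1,000 draws of m = 3 independent fair-coin treatments.
mc_samples = [[float(rng.randint(0, 1)) for _ in range(3)] for _ in range(1000)]
mc_theta = [0.0, 0.0, 0.0, 0.0, 2.0]
mc_lambda = lambda_monte_carlo(mc_theta, mc_samples)
print(len(mc_lambda), len(mc_lambda[0]))
```

The result is a symmetric (m+2) × (m+2) matrix for one value of θ. In practice θ varies with x, so the average would be repeated for each observation's estimated parameters.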

Usage with MultiTreatmentATE:

from deep_inference.targets import MultiTreatmentATE

model = CombinatorialModel(n_treatments=3, link='gen_sigmoid_ii')
target = MultiTreatmentATE(model=model, treatment=[1, 0, 1])

Reference: Ye et al. (2025, Management Science) — DeDL: Debiased Deep Learning for Combinatorial Experiments


Custom Network Usage#

For advanced users who want to use the network directly:

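import torch
from deep_inference.models import StructuralNet
from deep_inference import LinearFamily

# Placeholder data for illustration; replace with your own tensors.
# Shapes are assumed here: X (n, 10), T (n,), Y (n,).
X_tensor = torch.randn(100, 10)
T_tensor = torch.randn(100)
Y_tensor = torch.randn(100)

# Create network and family
net = StructuralNet(input_dim=10, hidden_dims=[64, 32], theta_dim=2)
family = LinearFamily()
optimizer = torch.optim.Adam(net.parameters(), lr=0.01)

# Training loop (full batch: one gradient step per epoch)
for epoch in range(100):
    theta = net(X_tensor)  # (n, 2)
    loss = family.loss(Y_tensor, T_tensor, theta).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()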
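After a manual training loop, a network with dropout should be switched to evaluation mode before extracting parameter estimates; otherwise each forward pass randomly zeroes units. The sketch below shows the pattern with a stand-in `nn.Sequential` model, so it runs without the library. The same `eval()` and `no_grad()` calls apply to any `nn.Module`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in network with dropout; not the library's StructuralNet.
stand_in_net = nn.Sequential(
    nn.Linear(10, 64),
    nn.ReLU(),
    nn.Dropout(0.1),
    nn.Linear(64, 2),
)
X_eval = torch.randn(100, 10)

stand_in_net.eval()            # dropout becomes a no-op
with torch.no_grad():          # no autograd graph needed for inference
    theta_a = stand_in_net(X_eval)
    theta_b = stand_in_net(X_eval)

theta_hat_np = theta_a.numpy()  # (100, 2) array of per-observation parameters
print(torch.equal(theta_a, theta_b), theta_hat_np.shape)
```

In evaluation mode two passes over the same inputs return identical outputs, which is what downstream inference on the estimated parameters needs.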