# Inference API

This module provides two main functions for structural estimation with valid inference.

## API Overview

| Function | Use Case | Target | Lambda |
|----------|----------|--------|--------|
| `structural_dml()` | Production, 13 families | E[β(X)] fixed | Estimated |
| `inference()` | Flexible targets, regimes | Custom h(θ) | Auto-selected |

---

## `structural_dml()` - Legacy API

The production-ready API supporting 13 GLM families.

### Signature

```python
from deep_inference import structural_dml

result = structural_dml(
    Y,                      # (n,) outcomes
    T,                      # (n,) treatments
    X,                      # (n, d) covariates
    family='linear',        # Family name: 'linear', 'logit', 'poisson', etc.
    target=None,            # Target variant (e.g., 'ame' for logit)
    hidden_dims=[64, 32],   # Network architecture
    epochs=100,             # Training epochs
    n_folds=50,             # Cross-fitting folds
    lr=0.01,                # Learning rate
    lambda_method='ridge',  # Lambda estimation: ridge (default), aggregate, lgbm
    verbose=False           # Print progress
)
```

### Supported Families

| Family | Model | θ_dim | Notes |
|--------|-------|-------|-------|
| `linear` | Y = α + βT + ε | 2 | OLS-equivalent |
| `logit` | P(Y=1) = σ(α + βT) | 2 | Binary outcomes |
| `poisson` | Y ~ Pois(exp(α + βT)) | 2 | Count data |
| `gamma` | Y ~ Gamma(k, exp(α + βT)) | 2 | Positive continuous |
| `gumbel` | Y ~ Gumbel(α + βT, σ) | 2 | Extreme values |
| `tobit` | Y = max(0, α + βT + σε) | 3 | Censored |
| `negbin` | Y ~ NegBin(exp(α + βT), r) | 2 | Overdispersed counts |
| `weibull` | Y ~ Weibull(k, exp(α + βT)) | 2 | Survival/duration |
| `multinomial_logit` | P(Y=j) = softmax(α_j + X'_j β) | (J-1)+K | Discrete choice (J>=3) |

### Example

```python
import numpy as np
from deep_inference import structural_dml

# Generate data
np.random.seed(42)
n = 2000
X = np.random.randn(n, 10)
T = np.random.randn(n)
Y = X[:, 0] + 0.5 * T + np.random.randn(n)

# Run inference
result = structural_dml(
    Y=Y, T=T, X=X,
    family='linear',
    n_folds=50,
    epochs=100
)

print(result.summary())
```

Output:
```
==============================================================================
                            Structural DML Results
==============================================================================
Family:           Linear               Target:           E[beta]
No. Observations: 2000                 No. Folds:        50
==============================================================================
                  coef     std err         z     P>|z|      [0.025    0.975]
------------------------------------------------------------------------------
     E[beta]    0.5012      0.0234    21.410    0.000     0.4553    0.5471
==============================================================================
Diagnostics:
  Min Lambda eigenvalue:    0.998765
  Mean condition number:    1.00
  Correction ratio:         0.0523
------------------------------------------------------------------------------
```

### Viewing Results with `summary()`

Both `DMLResult` and `InferenceResult` provide a `summary()` method that produces a statsmodels-style table with:

- **Header**: Family, target, sample size, number of folds, date/time
- **Coefficient table**: Estimate, standard error, z-statistic, p-value, 95% CI
- **Diagnostics**: Lambda eigenvalue, condition number, correction ratio

```python
# Get formatted summary string
print(result.summary())

# Individual components are still accessible
print(result.mu_hat)      # Point estimate
print(result.se)          # Standard error
print(result.ci_lower)    # Lower 95% CI
print(result.ci_upper)    # Upper 95% CI
```

### DMLResult Object

| Attribute | Type | Description |
|-----------|------|-------------|
| `mu_hat` | float | Debiased estimate of E[β(X)] |
| `mu_naive` | float | Naive (biased) estimate |
| `se` | float | Standard error |
| `ci_lower` | float | Lower 95% CI bound |
| `ci_upper` | float | Upper 95% CI bound |
| `theta_hat` | ndarray | Estimated θ(x) for all observations |
| `psi` | ndarray | Influence function values |
| `diagnostics` | dict | Training diagnostics |

---

## `inference()` - New Flexible API

The new API with flexible targets and automatic regime detection.

### Signature

```python
from deep_inference import inference

result = inference(
    Y,                      # (n,) outcomes
    T,                      # (n,) treatments
    X,                      # (n, d) covariates
    # Model specification (choose one):
    model='logit',          # Built-in: 'linear', 'logit', 'multinomial_logit'
    loss=None,              # OR custom loss function
    theta_dim=None,         # Required if custom loss
    # Target specification (choose one):
    target='beta',          # Built-in: 'beta', 'ame'
    target_fn=None,         # OR custom target function
    t_tilde=None,           # Evaluation point (default: mean(T))
    # Regime settings:
    is_randomized=False,    # True for RCTs
    treatment_dist=None,    # Known F_T (e.g., Normal(0, 1))
    lambda_method=None,     # Override auto-detection
    # Cross-fitting:
    n_folds=50,
    # Network:
    hidden_dims=[64, 32],
    epochs=100,
    lr=0.01,
    ridge=1e-4,
    verbose=False
)
```

### Built-in Targets

| Target | Formula | Use Case |
|--------|---------|----------|
| `'beta'` | E[β(X)] | Average treatment effect (log-odds for logit) |
| `'ame'` | E[p(1-p)β] | Average marginal effect (probability scale) |

### Custom Target Functions

Define any target h(x, θ, t̃) and the Jacobian is computed via autodiff:

```python
import torch

def my_target(x, theta, t_tilde):
    """Average prediction at treatment level t_tilde."""
    alpha, beta = theta[0], theta[1]
    return torch.sigmoid(alpha + beta * t_tilde)

result = inference(
    Y, T, X,
    model='logit',
    target_fn=my_target,
    t_tilde=0.0
)
```

### Three Regimes

| Regime | Condition | Lambda Method | Cross-Fitting |
|--------|-----------|---------------|---------------|
| **A** | RCT + known F_T | Compute (MC integration) | 2-way |
| **B** | Linear model | Analytic (closed-form) | 2-way |
| **C** | Observational + nonlinear | Estimate (neural net) | 3-way |

```python
from deep_inference.lambda_.compute import Normal

# Regime A: Randomized experiment
result = inference(
    Y, T, X,
    model='logit',
    target='beta',
    is_randomized=True,
    treatment_dist=Normal(mean=0.0, std=1.0)
)
print(f"Regime: {result.diagnostics['regime']}")  # 'A'
```

### InferenceResult Object

| Attribute | Type | Description |
|-----------|------|-------------|
| `mu_hat` | float | Point estimate |
| `se` | float | Standard error |
| `ci_lower` | float | Lower 95% CI |
| `ci_upper` | float | Upper 95% CI |
| `psi_values` | Tensor | Influence function values |
| `theta_hat` | Tensor | Estimated θ(x) |
| `diagnostics` | dict | Regime, lambda method, etc. |

---

## Configuration Guidelines

### Network Architecture

| Sample Size | Recommended |
|-------------|-------------|
| n < 1,000 | `[32, 16]` |
| 1,000 - 10,000 | `[64, 32]` |
| 10,000 - 100,000 | `[128, 64, 32]` |
| n > 100,000 | `[256, 128, 64]` |

### Cross-Fitting Folds

| Use Case | K |
|----------|---|
| Quick exploration | 10-20 |
| Production | 50 |
| Very large data | 20-50 |

### Lambda Method (for `structural_dml`)

| Method | Coverage | Default | Notes |
|--------|----------|---------|-------|
| `'ridge'` | 96% | **Yes** | Safe default, validated coverage |
| `'aggregate'` | 95% | No | Ignores X-dependence |
| `'lgbm'` | 96% | No | High accuracy alternative |
| `'mlp'` | 67% | No | **AVOID** - invalid SEs |

---

## Algorithm Overview

### Cross-Fitting (K-Fold)

```
For k = 1 to K:
    Train: Fit θ̂(x) on folds ≠ k
    [If 3-way: Fit Λ̂(x) on separate fold]
    Eval: Compute ψ on fold k

Aggregate: μ̂ = mean(ψ), SE = std(ψ)/√n
```

### Influence Function

The influence function corrects for regularization bias:

```
ψ(z) = H(θ̂) - H_θ · Λ(x)⁻¹ · ℓ_θ(z, θ̂)
```

Where:
- H(θ) = target functional (e.g., E[β])
- H_θ = Jacobian of target w.r.t. θ
- Λ(x) = E[ℓ_θθ | X=x] = conditional Hessian
- ℓ_θ = score (gradient of loss)

---

## Expected Performance

| Method | Coverage | SE Ratio |
|--------|----------|----------|
| Naive | ~10-30% | << 1 |
| **Influence** | **~95%** | **~1.0** |