Weibull Model Tutorial#

The Weibull model is for survival analysis and time-to-event data.

When to Use#

Use the Weibull model when:

  • Outcome is a positive duration or time-to-event

  • Hazard rate changes over time (increasing or decreasing)

  • Examples: equipment failure times, customer churn, patient survival, subscription duration

Mathematical Setup#

Data Generating Process#

\[Y \sim \text{Weibull}(k, \lambda)\]

where: $\(\lambda = \exp(\alpha(X) + \beta(X) \cdot T)\)$

And:

  • \(k\) is the shape parameter (controls hazard shape)

  • \(\lambda\) is the scale parameter

  • \(E[Y] = \lambda \Gamma(1 + 1/k)\)

  • \(k < 1\): decreasing hazard (early failures)

  • \(k = 1\): exponential (constant hazard)

  • \(k > 1\): increasing hazard (wear-out)

Estimand#

\[\mu^* = E[\beta(X)]\]

The average effect on log-scale parameter across the covariate distribution.

Loss Function#

\[L(Y, T, \theta) = k \log(\lambda) - (k-1)\log(Y) + (Y/\lambda)^k\]

Weibull negative log-likelihood (up to constants).

Influence Score Components#

Component

Formula

Scale \(\lambda\)

\(\exp(\alpha + \beta T)\)

\(z\)

\((Y/\lambda)^k\)

Hessian weight \(W\)

\(k^2 \cdot z\)

Score \(\nabla\ell\)

\(k(1 - z) \cdot [1, T]\)

Note: The Hessian depends on \(\theta\) through \(\lambda = \exp(\alpha + \beta T)\).

Complete Example#

import numpy as np
from deep_inference import structural_dml

# Generate synthetic data
np.random.seed(42)
n = 2000
X = np.random.randn(n, 10)
T = np.random.randn(n)

# True parameters
alpha_true = 3.0 + 0.3 * X[:, 0]
beta_true = 0.5 + 0.2 * X[:, 0]  # Heterogeneous effect
mu_true = beta_true.mean()
shape = 2.0  # Increasing hazard

# Generate Weibull outcomes
scale = np.exp(alpha_true + beta_true * T)
Y = np.random.weibull(shape, size=n) * scale

print(f"True mu* = {mu_true:.6f}")

# Run inference
result = structural_dml(
    Y=Y, T=T, X=X,
    family='weibull',
    hidden_dims=[64, 32],
    epochs=100,
    n_folds=50,
    lr=0.01
)

print(result.summary())

Expected Results#

From Eval 01: Parameter Recovery:

Family

Corr(α)

Corr(β)

Status

weibull

0.993

0.986

PASS

The influence function correction produces valid confidence intervals. See Validation for full results.

Real-World Applications#

Equipment Reliability#

Estimate how maintenance affects failure times:

# Y = time to failure (hours)
# T = maintenance frequency
# X = (equipment age, usage intensity, ...)
# Target: E[beta(X)] = average effect on log-lifetime

result = structural_dml(Y, T, X, family='weibull')

Customer Churn#

Estimate effect of engagement on subscription duration:

# Y = subscription duration (months)
# T = engagement score
# X = (demographics, plan type, ...)
# Target: E[beta(X)] = average effect on log-duration

result = structural_dml(Y, T, X, family='weibull')

Clinical Trials#

Estimate treatment effect on survival time:

# Y = survival time (days)
# T = treatment indicator
# X = (age, disease stage, biomarkers, ...)
# Target: E[beta(X)] = average effect on log-survival

result = structural_dml(Y, T, X, family='weibull')

Key Takeaways#

  1. Time-to-event data: Weibull is the standard for survival analysis with parametric hazard

  2. Shape parameter matters:

    • \(k < 1\): infant mortality (decreasing hazard)

    • \(k = 1\): constant hazard (exponential)

    • \(k > 1\): wear-out (increasing hazard)

  3. Log-link interpretation: \(\beta\) affects log-scale, so \(\exp(\beta)\) is the hazard ratio

  4. Hessian depends on theta: Requires three-way splitting