Weibull Model Tutorial#
The Weibull model is for survival analysis and time-to-event data.
When to Use#
Use the Weibull model when:
Outcome is a positive duration or time-to-event
Hazard rate changes monotonically over time (increasing or decreasing)
Examples: equipment failure times, customer churn, patient survival, subscription duration
Mathematical Setup#
Data Generating Process#
\(Y \mid X, T \sim \text{Weibull}(k, \lambda)\)
where: \(\lambda = \exp(\alpha(X) + \beta(X) \cdot T)\)
And:
\(k\) is the shape parameter (controls hazard shape)
\(\lambda\) is the scale parameter
\(E[Y] = \lambda \Gamma(1 + 1/k)\)
\(k < 1\): decreasing hazard (early failures)
\(k = 1\): exponential (constant hazard)
\(k > 1\): increasing hazard (wear-out)
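The three shape regimes above can be checked numerically. The sketch below evaluates the Weibull hazard \(h(t) = (k/\lambda)(t/\lambda)^{k-1}\) and the mean \(\lambda \Gamma(1 + 1/k)\). The function names `weibull_hazard` and `weibull_mean` are illustrative and not part of any library.

```python
import math
import numpy as np

def weibull_hazard(t, k, lam):
    """Hazard h(t) = (k / lam) * (t / lam)**(k - 1) for t > 0."""
    t = np.asarray(t, dtype=float)
    return (k / lam) * (t / lam) ** (k - 1)

def weibull_mean(k, lam):
    """Mean E[Y] = lam * Gamma(1 + 1/k)."""
    return lam * math.gamma(1.0 + 1.0 / k)

t = np.linspace(0.5, 5.0, 10)
for k in (0.5, 1.0, 2.0):
    h = weibull_hazard(t, k, lam=2.0)
    # Sign of the change over the grid: negative = decreasing, zero = constant
    trend = np.sign(h[-1] - h[0])
    print(f"k={k}: hazard trend={trend:+.0f}, mean={weibull_mean(k, 2.0):.3f}")
```

The hazard is evaluated on a grid away from \(t = 0\) because for \(k < 1\) it diverges there.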
Estimand#
\(\mu^* = E[\beta(X)]\): the average effect of \(T\) on the log-scale parameter \(\log\lambda\), taken over the covariate distribution.
Loss Function#
Weibull negative log-likelihood (up to constants).
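With the shape \(k\) held fixed and \(\log\lambda = \alpha + \beta T\), dropping the terms of the Weibull negative log-likelihood that do not involve \((\alpha, \beta)\) leaves \(\ell = k(\alpha + \beta T) + (Y/\lambda)^k\). The sketch below is one way to write that down, assuming \(k\) is treated as known. `weibull_full_nll` and `weibull_loss` are illustrative names, not the library's API.

```python
import numpy as np

def weibull_full_nll(y, t, alpha, beta, k):
    """Full per-observation negative log-likelihood of Weibull(k, lam)."""
    log_lam = alpha + beta * t
    z = (y / np.exp(log_lam)) ** k
    # -log f(y) = -log k - (k - 1) log y + k log lam + (y / lam)^k
    return -np.log(k) - (k - 1) * np.log(y) + k * log_lam + z

def weibull_loss(y, t, alpha, beta, k):
    """Same quantity with the terms free of (alpha, beta) dropped."""
    log_lam = alpha + beta * t
    z = (y / np.exp(log_lam)) ** k
    return k * log_lam + z

y, t, k = np.array([1.5, 4.0]), np.array([0.0, 1.0]), 2.0
for alpha, beta in [(0.0, 0.5), (1.0, -0.3)]:
    gap = weibull_full_nll(y, t, alpha, beta, k) - weibull_loss(y, t, alpha, beta, k)
    print(gap)  # the gap does not depend on (alpha, beta)
```

Because the gap between the two functions is free of \((\alpha, \beta)\), minimising either one gives the same parameter estimates.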
Influence Score Components#
| Component | Formula |
|---|---|
| Scale \(\lambda\) | \(\exp(\alpha + \beta T)\) |
| \(z\) | \((Y/\lambda)^k\) |
| Hessian weight \(W\) | \(k^2 \cdot z\) |
| Score \(\nabla\ell\) | \(k(1 - z) \cdot [1, T]\) |
Note: The Hessian depends on \(\theta\) through \(\lambda = \exp(\alpha + \beta T)\).
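The score and Hessian weight follow from differentiating the loss \(k(\alpha + \beta T) + z\) with respect to \(\theta = (\alpha, \beta)\): the gradient is \(k(1 - z)[1, T]\) and the Hessian is \(W \cdot [1, T][1, T]^\top\) with \(W = k^2 z\). The sketch below checks both against central finite differences for a single observation. The helper names are illustrative, and the loss form is the one implied by the components listed here rather than code taken from the library.

```python
import numpy as np

def loss(theta, y, t, k):
    """Weibull NLL up to constants, with log(lambda) = alpha + beta * t."""
    log_lam = theta[0] + theta[1] * t
    return k * log_lam + (y / np.exp(log_lam)) ** k

def score(theta, y, t, k):
    """Analytic gradient: k * (1 - z) * [1, t]."""
    z = (y / np.exp(theta[0] + theta[1] * t)) ** k
    return k * (1.0 - z) * np.array([1.0, t])

def hessian(theta, y, t, k):
    """Analytic Hessian: W * [1, t][1, t]^T with W = k^2 * z."""
    z = (y / np.exp(theta[0] + theta[1] * t)) ** k
    v = np.array([1.0, t])
    return (k ** 2) * z * np.outer(v, v)

def fd_gradient(f, theta, eps=1e-5):
    """Central finite-difference gradient of a scalar function f."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        grad[i] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return grad

theta, y, t, k = np.array([0.3, 0.5]), 2.0, 0.7, 2.0
num_score = fd_gradient(lambda th: loss(th, y, t, k), theta)
# Differentiate each score component to build the numerical Hessian
num_hess = np.column_stack([
    fd_gradient(lambda th, i=i: score(th, y, t, k)[i], theta) for i in range(2)
])
print(np.allclose(num_score, score(theta, y, t, k), atol=1e-6))
print(np.allclose(num_hess, hessian(theta, y, t, k), atol=1e-6))
```

Since \(z\) contains \(\lambda\), the Hessian weight changes with \(\theta\), which is the dependence the note above refers to.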
Complete Example#
```python
import numpy as np
from deep_inference import structural_dml

# Generate synthetic data
np.random.seed(42)
n = 2000
X = np.random.randn(n, 10)
T = np.random.randn(n)

# True parameters
alpha_true = 3.0 + 0.3 * X[:, 0]
beta_true = 0.5 + 0.2 * X[:, 0]  # Heterogeneous effect
mu_true = beta_true.mean()
shape = 2.0  # Increasing hazard

# Generate Weibull outcomes
scale = np.exp(alpha_true + beta_true * T)
Y = np.random.weibull(shape, size=n) * scale
print(f"True mu* = {mu_true:.6f}")

# Run inference
result = structural_dml(
    Y=Y, T=T, X=X,
    family='weibull',
    hidden_dims=[64, 32],
    epochs=100,
    n_folds=50,
    lr=0.01
)
print(result.summary())
```
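A quick sanity check on the simulated data, independent of the estimator: if \(Y \sim \text{Weibull}(k, \lambda)\), then \(z = (Y/\lambda)^k\) follows a unit exponential distribution, so at the true parameters \(z\) should average close to 1 and the score factor \(k(1 - z)\) close to 0. The sketch regenerates data with the same recipe as the example using only NumPy. It uses `np.random.default_rng` rather than the global seed, so the draws differ from those in the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
X = rng.standard_normal((n, 10))
T = rng.standard_normal(n)

# Same true parameter functions as in the example
alpha_true = 3.0 + 0.3 * X[:, 0]
beta_true = 0.5 + 0.2 * X[:, 0]
shape = 2.0

scale = np.exp(alpha_true + beta_true * T)
Y = rng.weibull(shape, size=n) * scale

# At the true parameters, z = (Y / scale)^shape is Exp(1)-distributed
z = (Y / scale) ** shape
score_factor = shape * (1.0 - z)

print(f"mean(z)           = {z.mean():.3f}")
print(f"mean(k * (1 - z)) = {score_factor.mean():.3f}")
```

A mean of \(z\) far from 1 would point to a mismatch between the simulated scale and the one used in the check, for example a shape/scale mix-up.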
Expected Results#
From Eval 01: Parameter Recovery:
| Family | Corr(α) | Corr(β) | Status |
|---|---|---|---|
| weibull | 0.993 | 0.986 | PASS |
The influence function correction produces valid confidence intervals. See Validation for full results.
Real-World Applications#
Equipment Reliability#
Estimate how maintenance affects failure times:
```python
# Y = time to failure (hours)
# T = maintenance frequency
# X = (equipment age, usage intensity, ...)
# Target: E[beta(X)] = average effect on log-lifetime
result = structural_dml(Y, T, X, family='weibull')
```
Customer Churn#
Estimate effect of engagement on subscription duration:
```python
# Y = subscription duration (months)
# T = engagement score
# X = (demographics, plan type, ...)
# Target: E[beta(X)] = average effect on log-duration
result = structural_dml(Y, T, X, family='weibull')
```
Clinical Trials#
Estimate treatment effect on survival time:
```python
# Y = survival time (days)
# T = treatment indicator
# X = (age, disease stage, biomarkers, ...)
# Target: E[beta(X)] = average effect on log-survival
result = structural_dml(Y, T, X, family='weibull')
```
Key Takeaways#
Time-to-event data: Weibull is a standard parametric choice for survival analysis when the hazard is monotone
Shape parameter matters:
\(k < 1\): infant mortality (decreasing hazard)
\(k = 1\): constant hazard (exponential)
\(k > 1\): wear-out (increasing hazard)
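Log-link interpretation: \(\beta\) acts on \(\log\lambda\), so \(\exp(\beta)\) is the multiplicative change in the scale (a time ratio) per unit of \(T\); the corresponding hazard ratio is \(\exp(-k\beta)\)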
Hessian depends on theta: Requires three-way splitting
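To make the log-link interpretation concrete: with \(\log\lambda = \alpha + \beta T\), a one-unit increase in \(T\) multiplies the scale, and hence every quantile and the mean duration, by \(\exp(\beta)\). Because the Weibull hazard is \(h(t) = k\,t^{k-1}\lambda^{-k}\), the same change multiplies the hazard by \(\exp(-k\beta)\). A minimal numeric sketch, using illustrative values \(\alpha = 3\), \(\beta = 0.5\), and \(k = 2\) borrowed from the synthetic example:

```python
import math

def weibull_hazard(t, k, lam):
    """Weibull hazard h(t) = k * t**(k - 1) * lam**(-k)."""
    return k * t ** (k - 1) * lam ** (-k)

alpha, beta, k = 3.0, 0.5, 2.0
lam_0 = math.exp(alpha)          # scale at T = 0
lam_1 = math.exp(alpha + beta)   # scale at T = 1

# Ratio of scales: equals exp(beta)
time_ratio = lam_1 / lam_0
# Ratio of hazards at a fixed time: equals exp(-k * beta), whatever t is
hazard_ratio = weibull_hazard(10.0, k, lam_1) / weibull_hazard(10.0, k, lam_0)

print(f"time ratio   = {time_ratio:.4f}")
print(f"hazard ratio = {hazard_ratio:.4f}")
```

With a positive \(\beta\), durations lengthen and the hazard falls, so the two ratios sit on opposite sides of 1.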