# Multimodal Tutorial: Text & Image Embeddings

This tutorial demonstrates `deep-inference` with **high-dimensional embeddings** as covariates X. Modern deep learning features (BERT, ResNet, CLIP) can capture rich heterogeneity in treatment effects.

---

## Why Embeddings?

Traditional econometrics uses tabular covariates (age, income, education). But rich data sources such as job descriptions, product images, and research abstracts contain information that drives treatment effect heterogeneity.

**This package handles high-dimensional X seamlessly:**

- Feature embeddings (64+ dimensions)
- PCA-reduced text/image embeddings from BERT, ResNet, CLIP
- The neural network learns which dimensions drive heterogeneity in $\beta(X)$

**Note:** For very high-dimensional embeddings (384-768+), use PCA to reduce to ~64 dimensions, or ensure n/dim ratio > 50 for stable estimation.

---

## Gallery of Examples

We demonstrate three model families with realistic scenarios:

| Model | Outcome (Y) | Treatment (T) | Covariates (X) |
|-------|-------------|---------------|----------------|
| **Linear** | Log wages | Years of experience | Job embeddings (64-dim) |
| **Logit** | Purchase (0/1) | Discount % | Product embeddings (64-dim) |
| **Poisson** | Citation count | Open Access (0/1) | Abstract embeddings (64-dim) |

---

## Example 1: Linear - Wages with Job Embeddings

**Scenario:** A labor economist studies how experience affects wages. The effect may vary by job type, which is captured via job description embeddings.

$$Y_i = \alpha(X_i) + \beta(X_i) \cdot T_i + \varepsilon_i$$

where:

- $Y$: Log hourly wage
- $T$: Years of experience
- $X$: 64-dim embedding of job description (e.g., PCA of BERT features)

**Hypothesis:** Complex jobs have steeper experience gradients.

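The call that follows expects arrays `Y`, `T`, and `X_embeddings` to exist. As a minimal sketch for experimenting without real data, they could be simulated as shown here. The data-generating process is an assumption made for illustration: the functional forms of `alpha_true` and `beta_true`, the noise level, and the sample size are hypothetical and not necessarily those used by the gallery script.

```python
import numpy as np

rng = np.random.default_rng(0)
n, dim = 5000, 64  # n/dim is about 78, above the suggested ratio of 50

# Stand-in for PCA-reduced job-description embeddings (hypothetical: i.i.d. Gaussian)
X_embeddings = rng.normal(size=(n, dim))

# Hypothetical structural functions: the second embedding dimension plays the
# role of "job complexity", so more complex jobs get a steeper experience gradient
alpha_true = 2.5 + 0.3 * X_embeddings[:, 0]
beta_true = 0.03 + 0.02 * np.tanh(X_embeddings[:, 1])

# Treatment (years of experience) and outcome (log hourly wage) from the linear model
T = rng.uniform(0.0, 30.0, size=n)
Y = alpha_true + beta_true * T + rng.normal(scale=0.5, size=n)

print(f"n/dim ratio: {n / dim:.1f}")
print(f"True beta range: [{beta_true.min():.3f}, {beta_true.max():.3f}]")
```

Because `beta_true` is kept, the estimated effects can later be compared against it, for example by correlating it with `result.theta_hat[:, 1]`.
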
```python
import numpy as np
from deep_inference import structural_dml

# X: Job description embeddings (PCA-reduced from SentenceTransformer)
# T: Years of experience
# Y: Log hourly wage
result = structural_dml(
    Y=Y,
    T=T,
    X=X_embeddings,
    family='linear',
    hidden_dims=[128, 64],
    epochs=150,
    n_folds=50
)

print(result.summary())

# Analyze heterogeneity
beta_hat = result.theta_hat[:, 1]  # Individual-level effects
print(f"Effect range: [{beta_hat.min():.3f}, {beta_hat.max():.3f}]")
```

**HTE Distribution:**

```
           True β(X)   Estimated β̂(X)
Mean         0.030         0.043
Std Dev      0.019         0.040
Min         -0.043        -0.217
Median       0.030         0.044
Max          0.103         0.166
```

---

## Example 2: Logit - Purchases with Image Embeddings

**Scenario:** An e-commerce company studies discount effectiveness. Does a 10% discount work better for some products than others?

$$P(Y_i = 1) = \sigma(\alpha(X_i) + \beta(X_i) \cdot T_i)$$

where:

- $Y$: Purchase indicator (0/1)
- $T$: Discount percentage
- $X$: 64-dim embedding of product image (e.g., PCA of ResNet features)

**Hypothesis:** "Premium-looking" products may be hurt by discounts (quality signaling), while "value" products benefit.

```python
import numpy as np
from deep_inference import structural_dml

result = structural_dml(
    Y=Y_purchase,
    T=T_discount,
    X=X_image_embeddings,
    family='logit',
    hidden_dims=[128, 64],
    epochs=150,
    n_folds=50
)

# Who should get discounts? Flag products with above-median estimated effect
beta_hat = result.theta_hat[:, 1]
discount_sensitive = beta_hat > np.median(beta_hat)
print(f"Products to discount: {discount_sensitive.sum()} / {len(beta_hat)}")
```

**Policy Insight:**

- Products where discounts **work** (high $\hat{\beta}$): Value-oriented products
- Products where discounts **hurt** (low $\hat{\beta}$): Premium products

---

## Example 3: Poisson - Citations with Abstract Embeddings

**Scenario:** A bibliometrics researcher studies the Open Access (OA) citation advantage. Which papers benefit most from OA?

$$Y_i \sim \text{Poisson}(\exp(\alpha(X_i) + \beta(X_i) \cdot T_i))$$

where:

- $Y$: Citation count
- $T$: Open Access indicator (0/1)
- $X$: 64-dim embedding of paper abstract (e.g., PCA of SciBERT features)

**Hypothesis:** Technical papers behind paywalls benefit more from OA than already-accessible papers.

```python
import numpy as np
from deep_inference import structural_dml

result = structural_dml(
    Y=Y_citations,
    T=T_open_access,
    X=X_abstract_embeddings,
    family='poisson',
    hidden_dims=[128, 64],
    epochs=150,
    n_folds=50
)

# Citation multiplier from OA
print(result.summary())
print(f"\nCitation multiplier: {np.exp(result.mu_hat):.2f}x")

# Which papers benefit most?
beta_hat = result.theta_hat[:, 1]
top_beneficiaries = np.argsort(beta_hat)[-100:]  # Top 100
```

**HTE Distribution:**

```
           True β(X)   Estimated β̂(X)
Mean         0.297         0.407
Std Dev      0.415         0.350
Min         -1.254        -1.532
Median       0.298         0.383
Max          1.790         1.681

Interpretation: exp(0.3) = 1.35x citation multiplier from OA
```

---

## Using Real Embeddings

Replace simulated embeddings with real ones. **For best results, use PCA to reduce high-dimensional embeddings to ~64 dimensions.**

### Text Embeddings (Sentence-Transformers)

```python
from sentence_transformers import SentenceTransformer
from sklearn.decomposition import PCA
from deep_inference import structural_dml

model = SentenceTransformer('all-MiniLM-L6-v2')
X_raw = model.encode(texts)  # (N, 384) numpy array

# Reduce dimensions for stable estimation
pca = PCA(n_components=64)
X = pca.fit_transform(X_raw)  # (N, 64)

result = structural_dml(Y, T, X, family='linear', epochs=150, n_folds=50)
```

### Image Embeddings (torchvision)

```python
import torch
from torchvision import models
from sklearn.decomposition import PCA
from deep_inference import structural_dml

# `images`: float tensor of shape (N, 3, H, W), already resized and normalized
# torchvision >= 0.13 uses `weights=`; older versions use `pretrained=True`
resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
resnet = torch.nn.Sequential(*list(resnet.children())[:-1])  # drop the classifier head
resnet.eval()  # inference mode for batch norm layers

with torch.no_grad():  # without this, .numpy() fails on tensors that track gradients
    X_raw = resnet(images).flatten(1).numpy()  # (N, 2048)

# Reduce dimensions
pca = PCA(n_components=64)
X = pca.fit_transform(X_raw)  # (N, 64)

result = structural_dml(Y, T, X, family='logit', epochs=150, n_folds=50)
```

### Multimodal (CLIP)

```python
import clip
import torch
from sklearn.decomposition import PCA
from deep_inference import structural_dml

model, preprocess = clip.load("ViT-B/32", device="cpu")

# `images`: batch of tensors produced by `preprocess`; `texts`: list of strings
with torch.no_grad():
    X_images = model.encode_image(images)  # (N, 512)
    X_texts = model.encode_text(clip.tokenize(texts, truncate=True))  # (N, 512)

X_raw = torch.cat([X_images, X_texts], dim=1).numpy()  # (N, 1024)

# Reduce dimensions
pca = PCA(n_components=64)
X = pca.fit_transform(X_raw)  # (N, 64)

result = structural_dml(Y, T, X, family='poisson', epochs=150, n_folds=50)
```

---

## Key Takeaways

1. **High-dimensional X works:** 64+ dim embeddings are handled seamlessly
2. **Heterogeneity captured:** The model learns which embedding dimensions drive $\beta(X)$ (correlations 0.4-0.9)
3. **Valid inference:** Influence function correction provides valid CIs for most model families
4. **Policy-relevant:** Identify *who* benefits from treatment for targeting
5. **Practical guidance:** For very high-dim embeddings, use PCA to reduce to ~64 dims, or ensure n/dim > 50

---

## Run the Full Gallery

```bash
python tutorials/06_multimodal_gallery.py
```

This runs all three examples with simulated embeddings and prints:

- Point estimates and confidence intervals
- HTE distribution tables
- ASCII histograms of $\hat{\beta}(X)$
- Policy insights for each scenario
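
To get a similar text view of $\hat{\beta}(X)$ for your own estimates, an ASCII histogram can be sketched with NumPy alone. The helper `ascii_hist` is hypothetical: it is not part of `deep-inference` and may differ from what the gallery script prints. The `beta_hat` values here are random placeholders standing in for `result.theta_hat[:, 1]`.

```python
import numpy as np

def ascii_hist(values, bins=10, width=40):
    """Return one text line per histogram bin, with bars scaled to `width`."""
    counts, edges = np.histogram(values, bins=bins)
    peak = max(int(counts.max()), 1)  # avoid division by zero for empty input
    lines = []
    for count, left, right in zip(counts, edges[:-1], edges[1:]):
        bar = "#" * int(round(width * count / peak))
        lines.append(f"{left:7.3f} to {right:7.3f} | {bar} {count}")
    return lines

# Placeholder estimates; in practice use beta_hat = result.theta_hat[:, 1]
rng = np.random.default_rng(0)
beta_hat = rng.normal(loc=0.03, scale=0.02, size=1000)

hist_lines = ascii_hist(beta_hat)
print("\n".join(hist_lines))
```

Each line shows the bin range, a bar proportional to the bin count, and the count itself, so a long left or right tail in the estimated effects is visible at a glance.
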