Exercise & Mental Health: Addressing Selection Bias
Using inverse probability weighting (IPW) to isolate the causal effect of regular exercise on mental health, adjusting for the self-selection of healthier individuals into exercise.
Key Finding: Regular exercise reduces poor mental health days by 0.46 days/month after correcting for selection bias
(Naive estimate: 1.66 days — 258% overestimation due to healthier people self-selecting into exercise)
The Selection Bias Challenge
Naive Analysis Problem
Simply comparing exercisers vs non-exercisers is biased because:
- • People with better baseline mental health are more likely to exercise
- • Higher income/education correlates with both exercise and mental health
- • Chronic conditions prevent exercise and worsen mental health
IPW Solution
Inverse Probability Weighting creates a pseudo-population where:
- ✓ Exercise is "as-if randomized" given observed characteristics
- ✓ Confounders are balanced between treatment groups
- ✓ We can estimate the true causal effect
How IPW Works
1. Estimate Propensity: Model the probability of exercising from demographics, health status, and income
2. Calculate Weights: Weight = 1/propensity for the treated, 1/(1-propensity) for controls
3. Weighted Outcome: Compare weighted poor mental health days between groups to estimate the causal effect
The IPW Magic:
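IPW essentially creates a "pseudo-randomized" experiment from observational data. Each person is weighted by the inverse of the probability of the exercise status they actually have. Exercisers who were unlikely to exercise, and non-exercisers who were likely to exercise, get higher weights, while people whose behavior matches their profile get lower weights. This rebalancing removes selection bias on the measured characteristics - it's like retroactively randomizing who exercises, provided no important confounder is left unmeasured, which lets us estimate what would happen if we could randomly assign exercise to people.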
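To make the reweighting concrete, here is a minimal sketch on made-up data, not the BRFSS analysis. It assumes two equally sized strata with known propensities (0.8 for a healthier group, 0.2 for a less healthy group) and a true effect of -1 day in each. The function name `ipw_ate` and all numbers are illustrative assumptions.

```python
import numpy as np

def ipw_ate(outcome, treatment, propensity):
    """Normalized (Hajek) IPW estimate of the average treatment effect."""
    outcome = np.asarray(outcome, dtype=float)
    treatment = np.asarray(treatment, dtype=int)
    propensity = np.asarray(propensity, dtype=float)

    # 1/e for the treated, 1/(1 - e) for controls
    weights = np.where(treatment == 1, 1.0 / propensity, 1.0 / (1.0 - propensity))

    treated = treatment == 1
    mean_treated = np.average(outcome[treated], weights=weights[treated])
    mean_control = np.average(outcome[~treated], weights=weights[~treated])
    return mean_treated - mean_control

# Hypothetical toy data: 10 healthier people (8 exercise), 10 less healthy (2 exercise).
treatment = np.array([1] * 8 + [0] * 2 + [1] * 2 + [0] * 8)
propensity = np.array([0.8] * 10 + [0.2] * 10)
# Poor mental health days: exercise lowers the count by exactly 1 in both strata.
outcome = np.array([2.0] * 8 + [3.0] * 2 + [9.0] * 2 + [10.0] * 8)

naive = outcome[treatment == 1].mean() - outcome[treatment == 0].mean()
adjusted = ipw_ate(outcome, treatment, propensity)

print(f"Naive difference: {naive:.2f}")    # about -5.20, inflated by selection
print(f"IPW estimate:     {adjusted:.2f}")  # about -1.00, the effect built into the toy data
```

The naive comparison mixes the exercise effect with the health gap between strata. The weights give each stratum equal influence in both arms, which recovers the built-in effect.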
Analysis Results
Covariate Balance Achievement
[Figure: covariate balance, panels "Before Weighting" and "After IPW"]
✓ All standardized mean differences < 0.1 after weighting
Why Covariate Balance Matters:
Before IPW, exercisers were younger, wealthier, more educated, and had fewer chronic conditions - making any comparison unfair. After IPW, both groups have similar characteristics (like a randomized trial), allowing us to isolate exercise's true effect. The checkmarks (✓) indicate successful balancing - the foundation of credible causal inference.
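For reference, one common definition of the weighted standardized mean difference (SMD) for covariate j is written out here. It matches the pooled-variance form used in the balance check in the Technical Implementation section. The symbols are notation introduced for illustration: w_i are the IPW weights, T_i the exercise indicator, x_ij the covariate value, and s² the weighted variance within each arm.

```latex
\mathrm{SMD}_j
  = \frac{\left| \bar{x}_{j,1} - \bar{x}_{j,0} \right|}
         {\sqrt{\left( s_{j,1}^{2} + s_{j,0}^{2} \right) / 2}},
\qquad
\bar{x}_{j,a} = \frac{\sum_{i:\,T_i = a} w_i \, x_{ij}}{\sum_{i:\,T_i = a} w_i},
\qquad
s_{j,a}^{2} = \frac{\sum_{i:\,T_i = a} w_i \left( x_{ij} - \bar{x}_{j,a} \right)^{2}}{\sum_{i:\,T_i = a} w_i}
```

A covariate counts as balanced here when its SMD falls below 0.1.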

What These Charts Show:
Left: Propensity Score Distribution
This histogram shows the probability of exercising for each person based on their characteristics. The overlap between teal (exercisers) and coral (non-exercisers) is crucial - it means we can find comparable people in both groups.
✓ Good overlap = Valid causal comparison possible
Right: Treatment Effect Comparison
The naive estimate (1.66 days) simply compares averages between groups, but it's biased because healthier people are more likely to exercise. The IPW estimate (0.46 days) corrects for this selection bias by reweighting the data to create comparable groups.
✓ 72% reduction in effect size after bias correction
Naive Estimate: -1.66 days (simple difference in means)
IPW Estimate: -0.46 days (causal effect; 95% CI: -0.51, -0.42)
Selection Bias: 258% (overestimation without IPW)
Real-World Impact:
The true benefit of exercise (0.46 fewer poor mental health days/month) is meaningful but modest - equivalent to about 5.5 better days per year. The naive estimate would have led us to promise 20 better days per year, setting unrealistic expectations. This highlights why rigorous causal analysis matters for evidence-based policy.
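The annualized figures and the 72% shrinkage quoted above can be reproduced from the two monthly estimates. This is a small arithmetic sketch; the variable names are illustrative.

```python
naive_monthly = 1.66  # naive reduction in poor mental health days per month
ipw_monthly = 0.46    # IPW-adjusted reduction per month

ipw_annual = ipw_monthly * 12      # better days per year after bias correction
naive_annual = naive_monthly * 12  # better days per year implied by the naive estimate
shrinkage = (naive_monthly - ipw_monthly) / naive_monthly  # share of the naive effect removed

print(f"IPW:   {ipw_annual:.1f} better days/year")    # 5.5
print(f"Naive: {naive_annual:.1f} better days/year")  # 19.9, roughly 20
print(f"Reduction in effect size: {shrinkage:.0%}")   # 72%
```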
Heterogeneous Treatment Effects
[Figure: heterogeneous treatment effects, panels "By Age Group" and "By Baseline Mental Health"]
Key insight: Exercise has larger benefits for those with worse baseline mental health
What This Tells Us:
The heterogeneous effects reveal that exercise isn't a one-size-fits-all intervention:
- • Age effect: Older adults (55+) see the greatest benefit of any age group (-3.8 days), possibly due to the social interaction and structure that exercise provides
- • Baseline effect: Those with severe mental health challenges (16+ poor days) benefit most (-5.7 days), suggesting exercise could be particularly valuable as part of treatment
- • Policy implication: Targeted exercise interventions for high-risk groups could yield the greatest public health impact
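The write-up does not say how the subgroup effects were estimated. One simple approach, sketched here as an assumption rather than the author's method, is to reuse the overall IPW weights and compare weighted outcome means within each subgroup. The function name `ipw_effect_by_group` and the toy data are hypothetical.

```python
import numpy as np

def ipw_effect_by_group(outcome, treatment, propensity, group):
    """Weighted treated-minus-control difference in means within each subgroup."""
    outcome = np.asarray(outcome, dtype=float)
    treatment = np.asarray(treatment, dtype=int)
    propensity = np.asarray(propensity, dtype=float)
    group = np.asarray(group)

    weights = np.where(treatment == 1, 1.0 / propensity, 1.0 / (1.0 - propensity))

    effects = {}
    for label in np.unique(group).tolist():
        treated = (group == label) & (treatment == 1)
        control = (group == label) & (treatment == 0)
        if not treated.any() or not control.any():
            continue  # no within-group comparison is possible
        effects[label] = (
            np.average(outcome[treated], weights=weights[treated])
            - np.average(outcome[control], weights=weights[control])
        )
    return effects

# Hypothetical toy data: exercise helps the "severe" group more than the "mild" group.
group = np.array(["mild"] * 4 + ["severe"] * 4)
treatment = np.array([1, 1, 0, 0, 1, 1, 0, 0])
propensity = np.full(8, 0.5)
outcome = np.array([2.0, 2.0, 3.0, 3.0, 10.0, 10.0, 16.0, 16.0])

effects = ipw_effect_by_group(outcome, treatment, propensity, group)
for label, effect in effects.items():
    print(f"{label}: {effect:+.1f} days")
```

Subgroup estimates rest on smaller samples than the overall effect, so each subgroup needs its own check of covariate balance and overlap.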
Technical Implementation
Propensity Score Model
    # Propensity score estimation
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Features for propensity model (assumed to be numerically encoded in X)
    features = ['age', 'income_cat', 'education',
                'bmi', 'chronic_conditions', 'smoker',
                'employment', 'marital_status',
                'health_insurance', 'urban_rural']

    # Fit propensity model: P(exercise = 1 | covariates)
    ps_model = LogisticRegression(
        penalty='l2',
        C=1.0,
        max_iter=1000
    )
    ps_model.fit(X[features], treatment)

    # Get propensity scores from the same columns the model was fitted on
    propensity = ps_model.predict_proba(X[features])[:, 1]

    # Calculate IPW weights
    weights = np.where(
        treatment == 1,
        1 / propensity,        # Treated
        1 / (1 - propensity)   # Control
    )

    # Stabilize weights by the marginal probability of each treatment arm
    weights = weights * (treatment.mean() * treatment +
                         (1 - treatment.mean()) * (1 - treatment))
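A short simulated sketch of what stabilization does. The data, the helper names `ipw_weights` and `weighted_difference`, and the use of the true simulated propensity are all assumptions for illustration. Stabilizing rescales every weight within an arm by the same constant. It therefore shrinks the largest weights but leaves a normalized weighted difference in means unchanged.

```python
import numpy as np

def ipw_weights(treatment, propensity, stabilized=False):
    """Inverse probability weights, optionally stabilized by arm prevalence."""
    treatment = np.asarray(treatment, dtype=int)
    propensity = np.asarray(propensity, dtype=float)
    weights = np.where(treatment == 1, 1.0 / propensity, 1.0 / (1.0 - propensity))
    if stabilized:
        p_treated = treatment.mean()
        weights = weights * np.where(treatment == 1, p_treated, 1.0 - p_treated)
    return weights

def weighted_difference(outcome, treatment, weights):
    """Weighted mean outcome of the treated minus that of the controls."""
    treated = np.asarray(treatment) == 1
    return (np.average(outcome[treated], weights=weights[treated])
            - np.average(outcome[~treated], weights=weights[~treated]))

# Simulated data: one confounder x drives both exercise and the outcome.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
propensity = 1.0 / (1.0 + np.exp(-x))  # true propensity, known only because we simulate
treatment = rng.binomial(1, propensity)
outcome = 5.0 - x - 0.5 * treatment + rng.normal(size=n)

raw = ipw_weights(treatment, propensity)
stable = ipw_weights(treatment, propensity, stabilized=True)
diff_raw = weighted_difference(outcome, treatment, raw)
diff_stable = weighted_difference(outcome, treatment, stable)

print(f"Largest raw weight: {raw.max():.1f}, largest stabilized weight: {stable.max():.1f}")
print(f"Same point estimate: {np.isclose(diff_raw, diff_stable)}")
```

Stabilization mainly matters for variance and for models with additional covariates. For a treatment-only weighted comparison, the point estimate stays the same.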
Weighted Outcome Analysis
    # Weighted outcome regression
    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.weightstats import DescrStatsW

    # Check covariate balance: every weighted SMD must fall below 0.1
    def check_balance(df, weights, treatment):
        balanced = []
        for col in features:
            treated = DescrStatsW(
                df[treatment == 1][col],
                weights=weights[treatment == 1]
            )
            control = DescrStatsW(
                df[treatment == 0][col],
                weights=weights[treatment == 0]
            )
            # Parentheses keep the expression valid across the line break
            smd = (abs(treated.mean - control.mean) /
                   np.sqrt((treated.var + control.var) / 2))
            balanced.append(smd < 0.1)
        return all(balanced)

    # Estimate causal effect
    model = sm.WLS(
        mental_health_days,
        sm.add_constant(treatment),
        weights=weights
    ).fit()

    # Get robust standard errors
    results = model.get_robustcov_results(cov_type='HC3')
    # np.asarray keeps positional indexing valid for array or pandas output
    ate = np.asarray(results.params)[1]
    ci = np.asarray(results.conf_int())[1]
    print(f"ATE: {ate:.2f} days")
    print(f"95% CI: [{ci[0]:.2f}, {ci[1]:.2f}]")
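A NumPy-only sketch, on made-up numbers, of why the treatment coefficient from a weighted regression on a constant plus the treatment indicator can be read as the IPW effect: it equals the weighted mean outcome of the treated minus that of the controls. The helper `wls_coefficients` is a hypothetical stand-in for the statsmodels fit and does not compute standard errors.

```python
import numpy as np

def wls_coefficients(y, X, weights):
    """Weighted least squares via ordinary least squares on sqrt(weight)-scaled rows."""
    root_w = np.sqrt(np.asarray(weights, dtype=float))
    coef, *_ = np.linalg.lstsq(X * root_w[:, None], y * root_w, rcond=None)
    return coef

# Hypothetical toy data
treatment = np.array([1, 1, 1, 0, 0, 0])
outcome = np.array([2.0, 4.0, 9.0, 3.0, 5.0, 10.0])
weights = np.array([1.25, 1.25, 5.0, 5.0, 5.0, 1.25])

# Design matrix: intercept column plus the treatment indicator
X = np.column_stack([np.ones_like(outcome), treatment])
intercept, slope = wls_coefficients(outcome, X, weights)

treated = treatment == 1
mean_treated = np.average(outcome[treated], weights=weights[treated])
mean_control = np.average(outcome[~treated], weights=weights[~treated])

print(f"WLS treatment coefficient:    {slope:.4f}")
print(f"Weighted difference in means: {mean_treated - mean_control:.4f}")
```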
⚠️ Critical Assumptions
2. Positivity: 0 < P(Exercise|X) < 1 for all X
4. Correct model: Propensity score well-specified
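Positivity can be probed empirically by checking how close the estimated propensity scores come to 0 or 1. The sketch below is an illustrative diagnostic, not part of the original analysis. The function name `overlap_summary` and the 0.05/0.95 bounds are assumptions and should be chosen to suit the data.

```python
import numpy as np

def overlap_summary(propensity, treatment, lower=0.05, upper=0.95):
    """Summarize propensity-score overlap and flag observations near 0 or 1."""
    propensity = np.asarray(propensity, dtype=float)
    treatment = np.asarray(treatment, dtype=int)
    inside = (propensity >= lower) & (propensity <= upper)
    return {
        "min_treated": propensity[treatment == 1].min(),
        "max_treated": propensity[treatment == 1].max(),
        "min_control": propensity[treatment == 0].min(),
        "max_control": propensity[treatment == 0].max(),
        "share_outside": 1.0 - inside.mean(),  # fraction with extreme scores
        "keep_mask": inside,                   # True where the score lies within the bounds
    }

# Hypothetical scores: two of the five observations sit near the boundaries.
propensity = np.array([0.02, 0.30, 0.50, 0.70, 0.99])
treatment = np.array([0, 0, 1, 1, 1])

summary = overlap_summary(propensity, treatment)
print(f"Share outside [0.05, 0.95]: {summary['share_outside']:.0%}")
print(f"Observations kept: {int(summary['keep_mask'].sum())} of {len(propensity)}")
```

Scores near 0 or 1 produce very large weights. Trimming those observations narrows the population the estimate describes, so report any trimming alongside the result.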
Data & Methodology
BRFSS Dataset Details
Sample Characteristics
- • Years: 2019-2020 (pre-COVID baseline)
- • Sample size: 837,421 adults
- • States: All 50 states + DC
- • Response rate: 49.4% (2019)
- • Weights: Raking methodology for representativeness
Key Variables
- • Treatment: EXERANY2 (Any exercise past month)
- • Outcome: MENTHLTH (Poor mental health days)
- • Confounders: Demographics, SES, health status
- • Exclusions: Pregnant women, missing outcomes
Survey Questions Used
"During the past month, other than your regular job, did you participate in any physical activities or exercises such as running, calisthenics, golf, gardening, or walking for exercise?"
"Now thinking about your mental health, which includes stress, depression, and problems with emotions, for how many days during the past 30 days was your mental health not good?"
Policy Implications
Why These Numbers Matter
Our IPW analysis reveals that exercise reduces poor mental health days by 0.46 days per month - a modest but meaningful effect. While this may seem small, the population-level impact is substantial:
160M: US adults who don't exercise regularly
73.6M: Fewer poor mental health days per month if all exercised (160M × 0.46)
$8.8B: Potential healthcare savings per year
Workplace Programs
0.46 fewer poor mental health days per month translates to ~5.5 more productive days/year
ROI Calculation:
Cost: $600/employee/year (gym subsidy)
Benefit: 5.5 productive days × $200/day = $1,100
Healthcare savings: $828/year
Total return: $1,928 / $600 = 3.2x ROI
Conservative 3:1 ROI even with full gym subsidy
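The ROI arithmetic above can be reproduced in a few lines. The dollar figures are this write-up's assumptions ($200 per productive day, $150 in healthcare costs per poor mental health day, $600 gym subsidy), not measured quantities, and the variable names are illustrative.

```python
effect_days_per_month = 0.46      # IPW estimate from the analysis
subsidy_per_year = 600.0          # gym subsidy per employee
value_per_productive_day = 200.0  # assumed value of one productive day
cost_per_poor_day = 150.0         # assumed healthcare cost per poor mental health day

# Rounded to one decimal, as in the text (0.46 * 12 = 5.52 -> 5.5)
productive_days = round(effect_days_per_month * 12, 1)
productivity_benefit = productive_days * value_per_productive_day
healthcare_savings = effect_days_per_month * 12 * cost_per_poor_day
roi = (productivity_benefit + healthcare_savings) / subsidy_per_year

print(f"Productivity benefit: ${productivity_benefit:,.0f}")  # $1,100
print(f"Healthcare savings:   ${healthcare_savings:,.0f}")    # $828
print(f"ROI: {roi:.1f}x")                                     # 3.2x
```

The result scales linearly with each assumed dollar value, so halving the assumed value of a productive day lowers the ROI to about 2.3x.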
Healthcare Savings
Each poor mental health day correlates with $150 in healthcare costs
0.46 days × 12 months × $150 = $828/person/year saved
Plus reduced medication costs
Targeted Interventions
Greatest impact on those who need it most:
Poor baseline mental health: 5.7 day reduction (12x larger effect)
Focus resources on high-risk groups
Evidence-Based Policy Recommendations
Based on our causal analysis showing a true effect of 0.46 days (not the inflated 1.66 days), policies should be designed with realistic expectations:
1. Insurance & Healthcare Integration
- • Cover gym memberships as preventive care (saves $828/person/year in healthcare costs)
- • Integrate exercise prescriptions into mental health treatment plans
- • Require mental health providers to assess physical activity levels
Evidence: Our analysis shows exercise is most effective for those with poor baseline mental health
2. Workplace Wellness Programs
- • Mandatory 15-minute exercise breaks (like smoke breaks) - Cost: $0
- • Walking meetings and standing desks - Cost: Minimal
- • Subsidized group fitness classes - Cost: $200/employee/year (10x ROI)
- • Full gym partnerships - Cost: $600/employee/year (3x ROI)
Evidence: Even free interventions (walking breaks) can capture much of the benefit
3. Targeted Community Programs
- • Free exercise programs for adults 55+ (who benefit most: -3.8 days)
- • Mobile fitness units in low-income neighborhoods
- • Exercise buddy programs for those with depression/anxiety
Evidence: Our heterogeneous effects show 12x larger benefits for high-risk groups
4. Setting Realistic Expectations
- • Market programs honestly: "5-6 better days per year" not "transform your mental health"
- • Combine with other interventions (therapy, medication) for comprehensive care
- • Track and report actual outcomes, not just participation
Evidence: Selection bias led to 258% overestimation - honesty builds trust and adherence
Study Limitations
Methodological Constraints
- • Unmeasured confounding: Genetics, personality traits, social support not captured
- • Exercise measurement: Binary yes/no lacks intensity, frequency, duration details
- • Self-reported data: Both exercise and mental health subject to reporting bias
Future Research Needs
- → Dose-response: Optimal exercise frequency and intensity for mental health
- → Longitudinal analysis: Track individuals over time to strengthen causal claims
- → Mechanism study: Biological pathways linking exercise to mental health
Despite limitations, IPW provides more credible causal estimates than naive comparisons, revealing that ~72% of the observed association is due to selection bias rather than true causal effect.
Explore the Full Analysis
Dive into the complete code, sensitivity analyses, and detailed methodology on GitHub.