📈 I taught neural networks this week, so BEHOLD, an ode to
@daniela_witten
in the form of a shiny application:
"it's just a linear model" neural nets edition
#rstats
📣Our 🆕 paper Causal Inference is Not Just a Statistics Problem is out!
@malco_barrett
,
@travisgerke
, and I show that you can have 4 data sets with identical summary stats & visuals but very different data generating mechanisms; statistics alone can't tell you what to adjust for!
🗣 Interested in conducting a sensitivity analysis for unmeasured confounding? It's easy!
Here's a quick paper with several methods depending on your goals & what information you have available with real-data examples and {tipr}
#rstats
code
🎙️ On this week's episode we talk about a “Causal Quartet”: a set of four datasets generated under different mechanisms, all with the same statistical summaries (including visualizations!) but different true causal effects
(Plus a chat about M-bias!)
📣
@malco_barrett
,
@travisgerke
, and I have been working on some causal inference in
#rstats
projects (packages, workshops, and a new blog!) and have recently collected them all in a new website 👇
New post on imputation 👀 but first, a {mice} question: I've generated a very simple missing data problem (c ➡️ x + missingness)
when I use the defaults the model post imputation is super biased! Only if I specify to fit a simple regression model does the imputation work...why?
Curious why statisticians recommend including the outcome in your imputation models? Check out our new paper in Statistical Methods in Medical Research!
@SarahLotspeich
,
@StatStaci5
, and I show with some simple mathematical derivations why this is really a requirement!
🗣 Y'ALL I just learned that you can iterate through code highlighting in Quarto slides using a |. For example, if I want to first show lines 1-5, then 6, then the whole thing, I would add this option:
#| code-line-numbers: "1-5|6|"
so much copy-paste time saved!
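For context, here's what a full (hypothetical, my own example) chunk using this could look like — the slide would first reveal lines 1-5, then line 6, then everything:

```r
#| code-line-numbers: "1-5|6|"
x <- rnorm(100)
y <- 2 * x + rnorm(100)
fit <- lm(y ~ x)
coef(fit)
summary(fit)$r.squared
plot(x, y)
```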
My dear grandfather peacefully passed away last week — we already miss him so much, but I am so grateful to have this conversation with him from a few years ago recorded on
@casualinfer
@EpiEllie
& I re-released it in his memory this week ♥️
📦 We simulated a "Causal Quartet" (in the spirit of Anscombe's Quartet & others!) to demonstrate this phenomenon that you (or your students!) can play with in the {quartets}
#rstats
package
🥳So there you have it! Statistics alone can't solve your causal inference problems: you need to know the data generating mechanism, or timing can (mostly) save you!
Check out our paper:
And R package:
And let us know what you think!
in a twist, the code below is correct!
"The ^ operator indicates crossing to the specified degree. For example (a+b+c)^2 is identical to (a+b+c)*(a+b+c) which in turn expands to a formula containing the main effects for a, b and c together with their second-order interactions"
A new study in
@PNASNews
claims to provide evidence that covid vaccine mandates reduced covid booster and flu vaccine uptake. They provide the replication code for the DiD models, but... these are not DiD models and that is not an interaction term.
The results here don't surprise me (9% correct! 🙊) BUT assuming you don't have any missing outcome data, if you have all the right predictors your data can be missing *not* at random, and complete case analysis will still get an unbiased result. 🫠
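A tiny simulation of the claim (my own toy setup, not from the poll): missingness in the predictor depends on the predictor itself (so MNAR), the outcome is fully observed, and the complete case slope still lands on the truth, because given x the distribution of y is unchanged by dropping rows:

```r
set.seed(1)
n <- 1e5
x <- rnorm(n)
y <- 2 * x + rnorm(n)                           # true slope = 2
x_obs <- ifelse(x > 0 & runif(n) < 0.8, NA, x)  # MNAR: drop ~80% of positive x
fit <- lm(y ~ x_obs)                            # lm() drops incomplete cases
coef(fit)[["x_obs"]]                            # still ~2
```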
📊 Poll! You are trying to predict some outcome -- you have all of the right predictors but some have missing data. Will doing a complete case analysis give you unbiased results?
MCAR: missing completely at random
MAR: missing at random
MNAR: missing not at random
For this month's
@AmJEpi
tweetorial, I am going to walk through
@jerudolph13
,
@eschisterman1
, and
@ashley_naimi
's excellent simulation study comparing inverse probability weighting (IPW) and G-computation in survival analysis
Inspired by an exchange with
@CausalKathy
, I look at 3 scenarios:
1️⃣ we know the imputation model
2️⃣ we know the outcome model
3️⃣ we don't know anything
If we have the model exactly right, imputation helps with precision, but can also be quite biased!
✍️ For our first R Causal blog post,
@malco_barrett
writes about the (newly on CRAN!) halfmoon package — a toolkit to assess balance in propensity-based models 🧘♀️
[Assume I have lots of data i.e. I am more concerned with bias than precision]
Can someone remind me why we do imputation instead of just complete case + conditioning in our outcome model? 👀
📣 We're hiring two tenure-track Assistant Professors! Some things I love about
@WakeForest
's Department of Statistical Sciences:
⚖️ teacher-scholar model, rewarding balance of research & teaching
🥂 🆕department with opportunities to lead & shape the direction
🤗 the people!
Join us for a chat about missing data on Friday!
I’ll start with a bit on when to include the outcome in your imputation model, and then
@f2harrell
will open a discussion about current best practices — I’m so excited!
Our
#wfustatisticalsciences
Math Business student Lauren Walsh hit the winning shot tonight, naming
@WakeForest
the women’s golf NCAA national champs for the first time! Go Lauren, it was a delight having her in the classroom and on the course! ⛳️
👏 Congratulations to
@WakeForest
’s
@StatStaci5
on the Early Career Award from the Section on Statistics in the Environment!! So well deserved!
#JSM2023
🍾 🥂
🔑 An analyst with just a dataset with three columns -- the exposure, the outcome, and the measured factor -- cannot distinguish between these 4 mechanisms, illustrating the crucial role of understanding data generation processes in statistical analysis
👏 Congratulations to
@ashley___mullan
on the
@AmstatNews
Gertrude M. Cox Award! Awarded for her dedication to statistical inquiry, her passion for mentoring, and her extraordinary commitment to scholarly excellence in pursuit of her MS in statistics
@WakeForest
!
What study is the “>80% on a ventilator died” coming from? Hopefully not the JAMA one that excluded the 72% of patients who were still in the hospital from the denominator…
Correction was issued but not many news outlets picked it up. It’s an example in my stat comm class!
This is fascinating to watch because it seems they are talking past each other on the purpose of the analysis; from my read, Nate says the confounder doesn’t change the direction of the average effect (true) — Martin says it changes the estimation for specific states (also true)
I'm sorry but you are not doing a good job of articulating your point. I literally don't understand what your objection is. It's not hard to track me down if you want to chat privately. But your criticisms of me have been public, and so my criticisms of you have been public, too.
@jfeldman_epi
@PNASNews
I agree this looks very weird but R actually *does* create the proper interaction with (baseline_att + before_after)^2 (much to my initial surprise)
👏 Our
@WakeForest
’s
@SarahLotspeich
giving her award winning presentation — helping us optimally pick who to audit!
#JSM2023
Sarah won the Biometrics Section Early Career Award 🥇 for this paper! 🍾 🥂
📄 Check out the paper here:
Transparency in public health messaging matters.
@hanmmendoza
& I looked at how providing transparent information about why a public health recommendation is being made can increase uptake in a randomized trial published today in
@PLOSONE
🦸♂️🕒 This may sound a bit bleak, but wait! there's a lifeline! Even if you're clueless about the data-generating mechanism, incorporating *when* data is measured can be a game changer. Why? 👇
🎙️ Excited to kick off season 5!
In this episode
@EpiEllie
& I chat about RCTs and observational studies. My thoughts?
👉 we’ve done a good thing getting folks to trust evidence from randomized trials
👉 we’ve done a silly thing by lacking nuance
What do you think?
🎙️ And we’re back! Check out Season 5 Episode 1 on your favorite podcast app!
@EpiEllie
&
@LucyStats
chat about the pros and cons of randomized trials!
🥳 Kicking off
@WakeForest
’s first Florence Nightingale Day — a day for local middle and high schoolers to experience all the cool things they can do with stats and data science!
Was just convinced that IV is actually a really good idea by
@MariaGlymour
(for those who have followed along,
@EpiEllie
and I have been vocal IV skeptics 😅), the next episode of
@casualinfer
is going to be a good one!
Feeling incredibly fortunate to have these two brilliant women as colleagues & friends! Somehow we can have an equally fun time deriving a variance (we WILL get that last bit 🤣) or grabbing a drink 🥂
#JSM2023
(and life) are more fun with you both
@StatStaci5
&
@SarahLotspeich
!
In most* cases, in order for a variable to be a collider or mediator (so you would *not* adjust for it), it needs to occur post-exposure. Adjusting for a variable that is measured prior to exposure is fine even if the exposure would influence a future measurement of that variable
📸 Some statistics activities at
@WakeForest
Florence Nightingale Day!
📈 Data Viz Art
⚽️ “stat”-apult
✂️ Rock paper scissors tournaments
🐐 Monty Hall Simulator
Regularly scheduled reminder: You shouldn't argue that one effect is different from another by showing that one is statistically significant and the other is not.
@malco_barrett
,
@travisgerke
, & I have a preprint with details:
🔗
Also the {quartets} package includes the datasets if you’d like to play with it yourselves!
🎙️In our latest episode,
@EpiEllie
and I chat about confounding! Including:
✅ Our preferred definition for a confounder
↕️ A chat about thinking about the *direction* of effects
🔗 A new review on sensitivity analyses for unmeasured confounders:
🎙️ I learned a ton from
@mark_vdlaan
in this discussion! Some things it has me thinking more about:
🕳️ The "causal gap" -- distinguishing between the causal estimand vs the statistical estimand
🤸♀️ The benefits of considering both simple parametric and flexible candidate models
🎙️ Season 5 Episode 2 (our 50th episode!) is live!
@EpiEllie
&
@LucyStats
chat with Mark van der Laan (
@mark_vdlaan
) about Targeted Learning — let us know what you think!
Ok missing data friends — Robins & Wang variance for coefficients after performing imputation ought to work for single imputation, but all of my sims are showing low coverage when the missing % is high (even with very large N) — what might I be doing wrong 🤔
Speaking of! I will be giving a webinar on our Design Principles for Data Analysis next Monday via the ASA Section on Teaching Statistics in Health Sciences! Join us!
📅 Monday Oct 30
⏰ 3:30p eastern
✅ Register:
@EmilyRiederer
@PrzeBiec
@StatModeling
This inspired me to push one I've been working on for a bit to GitHub ()
4 data generating mechanisms, all with the same correlation between X and Z and same linear relationship (unadjusted) between X and Y, but totally different "correct" causal effects
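Here's a minimal two-mechanism sketch of the idea (my own toy numbers, tuned so that cor(x, z) ≈ 0.7 and the regression output matches across mechanisms):

```r
set.seed(1)
n <- 1e5
# Mechanism 1: z is a confounder; true causal effect of x on y is 0.5
z1 <- rnorm(n); x1 <- z1 + rnorm(n); y1 <- 0.5 * x1 + z1 + rnorm(n)
# Mechanism 2: z is a collider; true causal effect of x on y is 1
x2 <- rnorm(n); y2 <- x2 + rnorm(n); z2 <- x2 + y2 + rnorm(n, sd = sqrt(3))

# Identical-looking data: cor(x, z) ~ 0.7 in both, unadjusted slope ~ 1 in
# both, and even the z-adjusted slope ~ 0.5 in both...
round(c(cor(x1, z1), cor(x2, z2)), 2)
round(c(coef(lm(y1 ~ x1))[[2]], coef(lm(y2 ~ x2))[[2]]), 2)
round(c(coef(lm(y1 ~ x1 + z1))[[2]], coef(lm(y2 ~ x2 + z2))[[2]]), 2)
# ...but the correct analysis differs: adjust for z in mechanism 1 (truth
# is 0.5), do NOT adjust in mechanism 2 (truth is 1)
```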
Another FN Day in the books! This is one of my favorite events we do
@WakeForestStats
— a day full of statistics and data science with our local middle and high schoolers!
🥇🥈🥉 Check out our data visualization prize winners!
📣 Ok we learned about SO MANY excellent resources. Here are a few:
🎥
@EmilyHGriffith1
and co. made 10 Statistical Communication training videos:
👩💻 The Statistical Consulting Training Repository Initiative: (cc:
@ryan_peterson1
)
“It is hard to get the right answer, it’s just really hard! I think we should be using all the tools we can” — great quote from
@MariaGlymour
on this week's episode!
This is a really good one that genuinely changed my views on the utility of instrumental variable analyses
Congratulations
@BandeenKaren
!!
I love this quote from Dr. Bandeen-Roche that
@nataliexdean
pointed out earlier this week:
“Leadership takes diverse forms - many authentic selves can succeed.”
#JSM2023
*The only case where this is *not* the case is M-bias; however, it has been argued that strict M-bias is very rare in most practical settings (Liu et al. 2012; Rubin 2009; Gelman 2011; Ding & Miratrix 2015).
@MartinKulldorff
@NateSilver538
I am curious about the 2nd claim, 🙋♀️ I am a biostatistician who would present mortality data that isn’t age-adjusted to describe populations. I think age-adjusted tables are tricky because they give the perception of causality (we’re adjusting!) but often age alone isn’t sufficient
We are hosting our first Florence Nightingale Day on April 22, 2023 from 1:00 pm - 4:00 pm at
@WakeForest
!
This is a 🆓 STEM experience for middle and high school students in celebration of women in statistics and data science!
🎙 In which I mess up the ENAR acronym AND discuss the awesome targeted learning workshop I attended by
@mark_vdlaan
,
@nshejazi
,
@rachaelvp
,
@podTockom
!
(Aside: if any Targeted Learning experts want to join us in the future we'd love to have you on to correct our mistakes!)
Thrilled to have Lance Waller from
@EmoryBIOS
delivering our inaugural Wake Forest Distinguished Lecture in Statistics and Biostatistics!
🗺 MAPS! A statistical view
There’s a bit of a twist, though! It turns out if you’re doing *deterministic* imputation you should NOT include the outcome in the imputation model; with stochastic imputation methods, you must!
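A toy sketch of that twist (my own simulation, not from the post — here "deterministic" means regression-mean imputation and "stochastic" means adding a residual draw): with the outcome in the imputation model, the deterministic version overstates the slope, while the stochastic version recovers it.

```r
set.seed(1)
n <- 1e5
c_ <- rnorm(n); x <- c_ + rnorm(n); y <- x + rnorm(n)  # true y~x slope = 1
miss <- runif(n) < plogis(c_)         # missingness in x depends on c (MAR)
x_m <- ifelse(miss, NA, x)

imp <- lm(x_m ~ c_ + y)               # imputation model includes the outcome
mu <- predict(imp, data.frame(c_ = c_, y = y))
x_det <- ifelse(miss, mu, x)                              # deterministic
x_sto <- ifelse(miss, mu + rnorm(n, sd = sigma(imp)), x)  # stochastic

coef(lm(y ~ x_det))[["x_det"]]  # biased upward (~1.14 here)
coef(lm(y ~ x_sto))[["x_sto"]]  # ~1
```

The intuition: mean-imputed values have too little spread around the regression line through y, which artificially tightens the x–y relationship; the residual draw restores the right conditional variance.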
#ENAR2023
@StatStaci5
talking about an integrated abundance model for estimating county-level prevalence of opioid misuse in Ohio — check out her new paper in JRSSA!
Given a single dataset with 3 variables -- exposure, outcome, and covariate (z) -- how can statistics help you decide whether to adjust for z? It can’t! The correlation between z and the exposure in all 4 datasets is 0.7!
So if stats can’t help, what can we do? Well, the best thing is just to know the data generating mechanism, but that is hard! An easier solution is to make sure to have time-varying measurements and only adjust for pre-exposure covariates! This solves the problem in 3/4 of the sets!
@EpiEllie
@tcarpenter216
But if (1) is not true, what method (other than sensitivity analysis or collecting more or different data) could save you 🤔 I feel like most propensity score hate could apply to almost any statistical method
#ENAR2023
@SarahLotspeich
teaching us how to properly impute censored covariates (cc
@drob
I feel like I saw something related to this float by my timeline from you recently!)
@rbganatra
My view is that across all of medicine, only one field weaves into every paper: statistics
Yet it is glossed over in medical schools and almost made a mockery of
MDs generally don’t get to understand the full value of statistics in making inferences
We need actual stats in MD school
Of the two, the direction of the average effect seems like what the target of interest is (which I guess makes sense since Nate wrote the original post) — disagreement between analysis producers and consumers is an area I find fascinating (and one that I think needs more focus!)
@cmyeaton
Far from solved, but I have whittled the curation down to a single Google Sheet that I enter things into, which auto-populates my website + CV etc. as needed via a bit of R e.g.:
@EpiEllie
@tcarpenter216
Ah yes that’s true, but as I love to harp on, IV still needs no unmeasured confounders; you just move the target from the exposure to the instrument!
Wow yes in case anyone needed a bit more convincing that jargon muddies the water and we should all just be writing down our models when trying to communicate what we're doing 😅
We kick off our most recent episode with a re-hashing of the "fixed vs random effects" conversation, sparked by an
@andrewheiss
blog post. When someone says they fit a "fixed effects model" what do you assume?
This reminds me of a convo
@EpiEllie
& I recently had on
@casualinfer
with
@travisgerke
. If you have your DAG right, the choice of statistical method may be less important! So one big takeaway from this simulation? Spend time perfecting that DAG!
/fin
🚀 Unlocking ultimate productivity mode thanks to
@seanjtaylor
's wisdom! 💡 Embracing
#OLS
for all tasks, and the results are mind-blowing! 🌟 Don't miss out on this game-changing strategy! 🔥
#ProductivityHacks
#GameChanger
#JSM2023
is right around the corner and
@WakeForest
Department of Statistical Sciences is well represented! Check out our sessions!
Eva Murphy • Sun 3:05p
@dm_kline
• Sun 4:35p
@LucyStats
• Mon 11:40a
Mayson Zhang • Mon 2:50p
@SarahLotspeich
• Tues 9:55a
👇🧵for details!
Join us today! We’re going to chat about our Causal Quartet, data you can use to help students understand that causal inference is not just a statistics problem 👇
We are hosting a “Florence Nightingale Day” at
@WakeForest
on April 22nd from 1:00 to 4:30p — open to middle & high school students 13 and up!
Learn more here:
At
#JSM2023
? Take a stroll at 6:30p to the Fairmont Royal York Hotel (in Confederation 5&6) to join us for the Section on Statistical Graphics & Statistical Computing mixer! Free food! 🍔🥦🍸
@jfeldman_epi
@EpiEllie
@PNASNews
It is in the docs!
The ^ operator indicates crossing to the specified degree. For example (a+b+c)^2 is identical to (a+b+c)*(a+b+c) which in turn expands to a formula containing the main effects for a, b and c together with their second-order interactions.
@casualinfer
@EpiEllie
And of course the winning cookie — Epi textbooks! Ingrid shared these were vanilla cookies with piped icing…YUM 🤤 Thank you
@SarahBAndrea
for organizing this each year!
That professor that inspired you, the roommate you miss, the advisor that had your back – let them know with a special
#Deacsgiving
message. Just quote this tweet! 🦃🦃
To see an applied example of a causal analysis using halfmoon, check out the second chapter of our (⚠️ very much a work in progress!) Causal Inference in R.
@MaartenvSmeden
Maybe something like: in the context of model A, increases in X are associated with increases in Y? Could be causal if model A has all necessary confounders / no colliders etc., could just be “correlation” if not, but knowing what else was in the model is key
🧘♀️ Continuous variables could be balanced in the mean but unbalanced in the tails — we can look at the empirical cumulative distribution function (eCDF) with geom_ecdf() to compare the differences between groups across the range of a potential confounder
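A quick base-R sketch of why that matters (simulated data, hypothetical variable names — halfmoon's geom_ecdf() draws this comparison for you): two groups can have nearly identical means while their distributions differ wildly.

```r
set.seed(1)
age_treated <- rnorm(500, mean = 50, sd = 8)
age_control <- c(rnorm(250, 40, 3), rnorm(250, 60, 3))  # bimodal, same mean

mean(age_treated) - mean(age_control)  # small: means look "balanced"

# The eCDFs tell a different story across the range of the variable
F_t <- ecdf(age_treated); F_c <- ecdf(age_control)
max(abs(F_t(30:70) - F_c(30:70)))      # large gap despite mean balance
```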
@CausalKathy
@mariokeko1995
@DrJWolfson
It is generally recommended to include the outcome in the imputation model -- when that is done, the bias is definitely reduced in all cases!
The best case scenario is definitely when the true outcome model is known, so I suggest everyone should invest in some crystal balls 🔮😅