🚨 New article in @World_Pol 🚨
Taxes In the Time of Revolution: An Experimental Test of the Rentier State during Algeria's Hirak
tl; dr: we learned how Algerian protesters responded to info about regime subsidies--
good for the rich, bad for the poor.
Let me share with you something that will change your life--TikZiT, a point-and-click interface for making professional-looking TikZ diagrams for use in LaTeX/Rmarkdown (causal diagrams, whatever you need). Link:
#SocSciResearch
#openscience
#EconTwitter
ChatGPT could be useful for research on gender. I used the same prompt--write an email from a professor for the first day of class--and varied the gender by changing "his" to "her." The "her" email is twice as long and invites students to office hours (the "his" email does not).
Wow. The beta version of @rstudio has a "visual editing" mode for Rmarkdown files. Basically converts Rstudio to a Word-like editor that makes it easy to do Markdown formatting/insert images/equations/bunch of other stuff. Download:
#rstats
Econometrics is this really confusing mix of "your standard errors could be off asymptotically, use this non-parametric clustering adjustment algorithm" and "it's fine if the data are binary, just use OLS, no one cares about the errors anyway."
#polisci
#econtwitter
Did you know R & Stata obscure a critical flaw in one of the most popular panel data models? Read my blog post about my new article @PLOSONE to discover why panel data models are often misused & misinterpreted. Blog: Article
#rstats
Pls don't retaliate for this. Reviewer comments are very noisy; unless the journal gives an R&R, it often doesn't make sense to take all (or even any) comments into consideration--especially for junior scholars/grad students who need to publish before the end of the century.
Received an invitation to review a manuscript I rejected for another journal where I gave tons of in-depth comments. The authors have taken exactly 0 of my remarks into account. Should I reject and send the same remarks? Decline to review (because of higher-order reasons)? Other?
I have worked in polling in the Middle East for ten years now. There is no way you can know how many Palestinians do or do not support Hamas, and even defining what support means is very difficult.
People without choices face tough situations that defy easy categorization.
Serious question: why are economists still using Stata? My sense is that political scientists have largely moved on to R or Python. When I was a grad student, much if not most political science research was Stata.
This is pretty cool - @overleaf has a free, open source LaTeX template for responding to reviewers' comments using fancy boxes 😎🕺.
Link:
#econtwitter
#polisci
After 3 years, so glad to be back in Tunis! But I am reminded that the amazing Tunisian soda, Boga, can't be sold abroad because Boga produces Coke products in Tunisia & they had to sign a non-compete. @CocaCola, we need to #FreeBoga! Everyone should get the chance to drink it.
I find it really offensive that people are suggesting I am/should be p-hacking this result to get just under p<0.05.
When I p-hack, I don't screw around. I always get to p<0.001. Don't insult my intelligence.
Harvard Dataverse was a great idea in its time, but it no longer is a good option for true replication of code/data. It has no file structure, difficulty cloning, & no commit history. Journals need to switch to @github or equivalent. (Also we shouldn't let one uni host all data).
Bittersweet announcement: We are returning to the US this coming May as I will start as an assistant prof. of poli sci at the University of South Carolina.
Sad to leave behind my brilliant & diverse colleagues & students here yet also excited for new adventures in SC! (1/2)
🚨 Ordered beta regression #rstats 🚨
My paper on bounded variables (survey sliders, indexes, dose/response) has been conditionally accepted @polanalysis and the R package ordbetareg is on CRAN (). Thread on why we need this model: (1/7)
#polisci
#econtwitter
🚨 Happy that my ordered beta regression article is now online @polanalysis! 🚨
Ordbetareg is a drop-in replacement for OLS when modeling bounded continuous outcomes--such as proportions, indexes, sliders.
#rstats
📦:
Paper link:
Here's a 🌶️ take:
A big reason for RDD's popularity is that it tends to understate uncertainty by using local linear regression. Note in the article 👇that power analysis has only been proposed in the past few years--few if any existing studies had estimated power. A 🧵 1/
Hi #EconTwitter!
Interested in 𝐜𝐚𝐮𝐬𝐚𝐥 𝐢𝐧𝐟𝐞𝐫𝐞𝐧𝐜𝐞 and program evaluation using the #econometrics of 𝐫𝐞𝐠𝐫𝐞𝐬𝐬𝐢𝐨𝐧 𝐝𝐢𝐬𝐜𝐨𝐧𝐭𝐢𝐧𝐮𝐢𝐭𝐲 𝐝𝐞𝐬𝐢𝐠𝐧?
Check out this cool survey by the top experts Matias Cattaneo and Rocío Titiunik (@Princeton)!
Just an FYI if you're considering applying, the last person who held the job supported the 2020 insurrection. 👇 So if you haven't taken up arms against the U.S. govt., you may be at a disadvantage. Still may be worth a shot, of course.
They’re students. Not everything they say is fully thought out. We need to stop the micro parsing of everything they say and using them as a prop for national political debates.
The real decisions about Israel-Palestine are being made in Tel Aviv and Washington, not Columbia.
Reporter grills Columbia student after she demands the university help feed protestors occupying Hamilton Hall:
"It seems like you're saying, 'we want to be revolutionaries, we want to take over this building, now would you please bring us some food'."
Unsolicited advice for grad students:
Don't just read the scholars in your field about methods. Read across the social (or even physical) sciences. It's like learning to speak a new language--you can see the same problem in a vastly different way (and maybe a solution!).
Why should you avoid dichotomizing variables? *Because you lose power.* The plot below is from a pre-reg for an upcoming study showing varying treatment effect sizes with a four-category ordinal outcome vs. a binary outcome. Power lower by ~10pp on average.
#rstats
#MedTwitter
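A rough way to see the power loss from dichotomizing is a small simulation. The sketch below is not the pre-registered analysis from the post: it assumes a latent normal outcome, hypothetical cut points, a hypothetical effect size of 0.3 SD, and a plain z-test on category scores instead of a proper ordinal model.

```python
import math
import random

# Hypothetical cut points that split a standard normal into four equal-sized categories
CUTS = (-0.6745, 0.0, 0.6745)

def categorize(latent):
    """Map a latent value to an ordinal category 0..3."""
    return sum(latent > c for c in CUTS)

def z_test_rejects(a, b, crit=1.96):
    """Two-sample z-test on means; True if |z| exceeds the critical value."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((v - ma) ** 2 for v in a) / (na - 1)
    vb = sum((v - mb) ** 2 for v in b) / (nb - 1)
    se = math.sqrt(va / na + vb / nb)
    return se > 0 and abs(ma - mb) / se > crit

def simulate_power(effect=0.3, n_per_arm=100, n_sims=2000, seed=42):
    """Share of simulated studies that reject the null, ordinal vs. binary coding."""
    rng = random.Random(seed)
    hits_ord = hits_bin = 0
    for _ in range(n_sims):
        control = [categorize(rng.gauss(0.0, 1.0)) for _ in range(n_per_arm)]
        treated = [categorize(rng.gauss(effect, 1.0)) for _ in range(n_per_arm)]
        hits_ord += z_test_rejects(treated, control)
        # Dichotomize the same data: top two categories vs. bottom two
        hits_bin += z_test_rejects([int(v >= 2) for v in treated],
                                   [int(v >= 2) for v in control])
    return hits_ord / n_sims, hits_bin / n_sims

power_ordinal, power_binary = simulate_power()
print("power, four categories:", power_ordinal)
print("power, dichotomized:   ", power_binary)
```

The same simulated responses feed both tests, so any gap in power comes only from collapsing four categories into two.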
👇Perhaps the best piece *ever* written about scientific fraud & a fascinating character study of Gino and Ariely. Compared to the recent lightweight NYT article, this longform goes far into the dark underbelly of academic "research."
academicsky polisky
I sent a research note (NB: research note) for review to @apsrjournal and this is one of the comments I got. I'm sharing because research notes are clearly still not a format widely understood by political science reviewers.
It's amazing to me that thanks to @VincentAB R now has marginal effects and model summaries that are more powerful than Stata - a corporation with a serious revenue base. This has taken away one of Stata's big advantages--R is frankly looking better each year.
I'm developing a platform that can recruit respondents via Meta ads, take a Qualtrics survey, screen out duplicates, reimburse, and follow-up with WhatsApp.
If you'd like to be a beta customer of the platform, please let me know--will be looking at making it more open next year
Dear grad students -- I know it may not feel this way, but people really do want to hear about your research. You're bringing energy + fresh ideas + hard work. Please tell us about it, and apologies for the lack of professionalism in academia which at times suggest otherwise.
🚨 New blog post:
Lost in Transformation: The Horror and Wonder of Logit
I use the analogy of 🗺️ ↔️ 🌎 to clarify what logit is & why it's the best model for binary outcomes (0 or 1).
tl; dr: OLS is to flat earth as logit is to globes.
#rstats
A modest proposal:
Don't teach undergrads regression analysis until after they are comfortable producing and interpreting a wide variety of plots of their data (including confounding relationships!).
#rstats
#econtwitter
#polisci
#datascience
Re-analysis shows that the choice of command and fixed effects are crucial for the paper’s results. The results disappear when we drop the year dummies, the month dummies, both these dummies, or even when we change the *order* of the fixed effects in the regression command. 7/11
Of all the "canonical" link functions, logit* is by far the worst if you actually want to understand your results (infinitely so if you have an interaction term in your model)...
#statstwitter
(* Yes, probit is just as bad)
I feel like dropping a few more followers, so:
Parallel trends is a *counterfactual.* You can't observe it. Whether there are pretrends or not is technically irrelevant. You need to make a substantive argument as to what pretrends signify *in the context of the research Q.*
Do you think the causal/descriptive distinction is arbitrary? Then you'll ♥️ my paper 👇. I explain why *any* research design (👏qual too👏) can contribute causal knowledge--even w/out causal identification--by looking at entropy of causal graphs.
Link:
🚨 Idealstan v1.0 beta is now available as an R package on Github! 🚨
I've been working on this for years to make measurement models more flexible + powerful ➡️ latent predictors + time-varying + mixed responses + selection bias. Link:
#rstats
THREAD :
My favorite part of peer review is when the process takes so long that reviewers can cite works that were written and released while you were waiting to get the review back.
A lot of the angst about p-values has died down, but I have a new proposal for how to make stats reporting better:
*Ban point estimates*
Point estimates falsely communicate certainty. Instead, report CIs + whatever authors want (p-values, etc).
Intervals = uncertainty.
A key result of my ordbetareg paper that applies to *all* bounded response modeling:
*Don't transform the outcome to avoid 0s and 1s by subtracting/dividing by a small constant!*
The results change--dramatically--see "transformed beta" in my replication plot👇 (1/2)
#rstats
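To see why the nudging constant matters, here is a minimal sketch. It is not the paper's replication code: the responses are made up, and the "squeeze into (0, 1)" rule is one hypothetical variant of the small-constant transformations the post warns against. On the logit scale (the usual link for beta regression), the nudged 0s and 1s land wherever the arbitrary constant puts them.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def nudge(y, eps):
    """Squeeze y from [0, 1] into (0, 1) using an arbitrary small constant eps."""
    return min(max(y, eps), 1.0 - eps)

# Hypothetical slider responses with pile-ups at both bounds
responses = [0.0, 0.0, 0.25, 0.5, 0.6, 0.75, 1.0, 1.0, 1.0]

def mean_logit(eps):
    """Average of the logit-transformed outcome after nudging by eps."""
    values = [logit(nudge(y, eps)) for y in responses]
    return sum(values) / len(values)

for eps in (1e-2, 1e-4, 1e-6):
    print("eps =", eps, "-> mean logit(y) =", round(mean_logit(eps), 3))
```

The interior responses never change, yet the average on the modeling scale keeps moving as eps shrinks, so the choice of constant ends up driving the estimates.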
@davidshor
@owasow
As a political scientist, I find it horrific you were blamed for sharing research about protests. We need this kind of research to understand what protests will or won't accomplish.
🚨Blog post🚨
The Causal Representation of Panel Data: A Comment on @xuyiqing (2022)
How to think of fixed/varying intercepts in a causal framework/DAG? I put forward intuitive ideas to simplify new causal panel models.
#RStats
#EconTwitter
#PoliSciTwitter
No it doesn’t matter if you use R, Python or whatever.
At bottom this is a specification issue. There is no “right” way to estimate the model because the model is posing a nonsensical question. If you think there is a way to estimate it, you’re going to repeat the mistake.
I just used @VincentAB #rstats 📦 tinytable. It's great--a well-designed replacement for the bug-ridden kableExtra. Gives you the functions you need without a lot of bloat & works in both html and latex/PDF, which is super important for researchers.
😇 to share our new #rstats 📦 for accessing @CoronaNet_org 100k+ manually coded COVID-19 policy records *and* our 180+ country policy intensity scores (Jan 2020 - May 2021).
CRAN link:
Vignette:
What's in the package? (1/5)👇
If you want to support @KhoaVuUmn, cite👏his👏papers👏.
Obviously if it's relevant to your research--but clearly his work is relevant to a lot of people (political economy, development, E Asia, etc).
Clearer terms to use for these are "no pooling" and "partial pooling" following Gelman's terminology. No pooling means the intercepts have no relationship to each other (fixed effects); partial pooling means the intercepts are drawn from a common distribution (random effects).
Literal conversation I had at work:
Them: fixed effect
Me: yea, we call that a random effect
“Chat gpt calls a random effect a …”
“Yea I know, it’s borked. Econ and biostatistics are different breeds”
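The "no pooling" vs. "partial pooling" distinction can be made concrete with the textbook precision-weighted formula for a varying intercept. This is a sketch under strong assumptions: the within-group variance (`sigma2`), the between-group variance (`tau2`), and all the numbers are hypothetical and treated as known, whereas a real multilevel model estimates them from the data.

```python
def partial_pooling_mean(group_mean, n, grand_mean, sigma2, tau2):
    """Precision-weighted compromise between a group's own mean and the grand mean."""
    w_group = n / sigma2    # precision of the group's sample mean
    w_common = 1.0 / tau2   # precision of the common distribution of intercepts
    return (w_group * group_mean + w_common * grand_mean) / (w_group + w_common)

# Hypothetical values, chosen only for illustration
grand_mean = 50.0
sigma2 = 100.0   # within-group variance
tau2 = 25.0      # between-group variance

# Two groups with the same observed mean but very different sample sizes
groups = {"small group": (70.0, 2), "large group": (70.0, 200)}

for name, (ybar, n) in groups.items():
    pooled = partial_pooling_mean(ybar, n, grand_mean, sigma2, tau2)
    print(name, "| no pooling:", ybar, "| partial pooling:", round(pooled, 2))
```

Under no pooling each group keeps its own mean. Under partial pooling the small group is pulled well toward the grand mean while the large group barely moves.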
This month:
- book out with @CambridgeUP
- article on pol. connections & investment with @HaillieLee / A. Tomashevskiy cond. accepted @cps_journal ()
- paper w/ Helen Milner on the Algerian rentier state cond. accepted @World_Pol ()
Update on this - it's super slick. If you're using Rstudio projects, just one command `softbib()` crawls your project space and kicks out nice latex/rmd/doc files for your software bibliography. Strong #rstats recommend.
This is the bad news. I think we can have causal learning from observational work, but we need to abandon the paradigm of "make the regression just like an experiment" and instead focus on partial causal learning, i.e. sensitivity analysis, causal graph modeling, indirect FX, etc
My secret superpower is that I am a methodologist whenever someone asks me a difficult substantive question and an empirical researcher whenever someone asks me a difficult methods question.
My 🌶️ #rstats take for the day:
Y'all should spend a lot less time learning the latest DiD estimator and a lot more time on sensitivity analysis and causal graphs if the aim is to make the most credible causal statements with your observational data.
We spend years teaching people an archaic yet powerful typesetting system (Latex), then pay journals to redo that work. If we had a standard formatting system based on pandoc, we could eliminate a lot of $$$ on contractors and publish much faster.
#openscience
Life at @NYUAbuDhabi is drinking cortado (introduced to me by a Spanish colleague) while sitting beside the Persian Gulf during a 70-degree day in January and listening to bluegrass.
Holy crap. I further asked chatGPT to write an R simulation for this 4-parameter Beta family along with Stan code to test for parameter recovery. It did and the R/stan code works. chatGPT successfully defined its own PDF for Stan and tested it to show it works. 🤯
There are two big limits on the performance of LLMs from the basic laws of statistics:
1) marginally declining utility of sample size (1/sqrt(N))
2) bias-variance trade-off
There isn't (really) anything new under the sun.
How do these laws matter for the future of LLMs?
(1/)
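The first limit is easy to check numerically. The sketch below only illustrates the 1/sqrt(N) law for the standard error of a mean, with a hypothetical sigma of 1. It says nothing about any particular LLM.

```python
import math

def standard_error(sigma, n):
    """Standard error of a sample mean with known standard deviation sigma."""
    return sigma / math.sqrt(n)

sigma = 1.0  # hypothetical standard deviation
for n in (100, 10_000, 1_000_000, 100_000_000):
    se_now = standard_error(sigma, n)
    se_10x = standard_error(sigma, 10 * n)
    # Absolute precision gained by collecting ten times more data
    print("N =", n, "| SE =", se_now, "| gain from 10x data =", se_now - se_10x)
```

Each tenfold increase in N cuts the standard error by the same factor (about 3.16), so the absolute gain shrinks with every additional order of magnitude of data.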
*A priori*, there's no reason to think that difference-in-differences (however estimated) is more likely to be causally identified than any other panel data model.
Meet the new editorial team for the American Political Science Review [@apsrjournal], the oldest and most prestigious political science journal in the world.
#APSR
My other causal inference #protip: when using any causal design (RCT/DiD/RDD), explain the inference problem--why are we concerned about naive comparisons?
*All* of these techniques have trade-offs, and knowing why we are using them can help make that trade-off clear. (1/2)
I say it's largely a myth that academic jobs offer greater intellectual or time "freedom" compared to jobs outside the academy, all things considered.
A myth that keeps PhDs from truly exploring their options and sometimes stuck in shitty situations.
What say you all?
In honor of reaching 3k followers (y’all have some seriously bad taste), here is an ultra-niche meme guaranteed to get those follower numbers down to reasonable levels.
@ProfessaJay
In political science we also face the problem that students expect it to be easy. They expect bio and physics to be hard but poli sci is a cake walk where you talk about your dreams of becoming a Supreme Court justice.
While on the market, I hand delivered my applications by locating the chair’s address in the voter file. I made sure to arrive early AM when I knew they would be home.
Did it get me fly outs? No. But did anyone think I was unserious? No, and I have the arrest record to prove it.
#Econjobmarket
— I am really surprised by the significant percentage of cover letters that are completely canned (& even misname the job posting institution etc.) It takes very little effort to tailor a CL to a specific job posting/institution. Given the sheer volume of
I got the Qualtrics API set up so I can pull my survey directly into an Rmarkdown file. The result is I can refresh every hour when I get new data and see if the estimates change. It's like Twitter but with p values.
My simplest and best #rstats tip: *don't* use ln (natural log) or log base 10 in regression models. Use log base 1.01 or log base 1.1. If you do, then your coefficient can be interpreted as the effect of a 1% increase (base 1.01) or a 10% increase (base 1.1) in the predictor.
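Why the tip works: a 1% increase in x raises log base 1.01 of x by exactly one unit, so the slope on that predictor is the change in the outcome per 1% increase. A small check with made-up, noiseless data (the intercept 3.0 and slope 2.0 are hypothetical):

```python
import math

def log_base(x, base):
    """Logarithm of x in an arbitrary base."""
    return math.log(x) / math.log(base)

def ols_slope(xs, ys):
    """Slope from a one-predictor least-squares regression of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical data: y rises by 2.0 per unit of ln(x)
xs = [10.0, 20.0, 50.0, 100.0, 250.0, 1000.0]
ys = [3.0 + 2.0 * math.log(x) for x in xs]

slope_ln = ols_slope([math.log(x) for x in xs], ys)
slope_101 = ols_slope([log_base(x, 1.01) for x in xs], ys)

# True change in y when x increases by 1% (from x to 1.01 * x)
effect_1pct = 2.0 * math.log(1.01)

print("slope on ln(x):        ", slope_ln)
print("slope on log_1.01(x):  ", slope_101)
print("effect of +1% in x:    ", effect_1pct)
```

The fit is identical either way. Only the unit of the coefficient changes: the ln slope has to be multiplied by ln(1.01) to read it as a 1% effect, while the base-1.01 slope already is that number.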
Interesting factoid about confounders: everyone assumes that an "omitted confounder" would eliminate the observed association if included. But a priori, it is just as likely including a confounder would *increase* the observed effect size as it would decrease it.
#rstats
#causal
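A small simulation shows that adjusting for a confounder can move the estimate in either direction. It makes no claim about how likely each direction is a priori: the data-generating process and every coefficient below are hypothetical, and the adjusted estimate comes from residualizing on the confounder (Frisch-Waugh-Lovell), which equals the coefficient from the regression that includes it.

```python
import random

def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def residuals(target, regressor):
    """Residuals from regressing target on regressor (with intercept)."""
    b = slope(regressor, target)
    a = sum(target) / len(target) - b * sum(regressor) / len(regressor)
    return [t - a - b * r for t, r in zip(target, regressor)]

def naive_and_adjusted(gamma, a=0.8, beta=1.0, n=5000, seed=1):
    """Estimate the effect of x on y with and without the confounder z.

    z raises x (strength a) and shifts y (strength gamma); beta is the true effect.
    """
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x = [a * zi + rng.gauss(0.0, 1.0) for zi in z]
    y = [beta * xi + gamma * zi + rng.gauss(0.0, 1.0) for xi, zi in zip(x, z)]
    naive = slope(x, y)                                  # z omitted
    adjusted = slope(residuals(x, z), residuals(y, z))   # z controlled for
    return naive, adjusted

naive_pos, adjusted_pos = naive_and_adjusted(gamma=1.5)
naive_neg, adjusted_neg = naive_and_adjusted(gamma=-1.5)
print("z pushes y up:   naive", round(naive_pos, 2), "adjusted", round(adjusted_pos, 2))
print("z pushes y down: naive", round(naive_neg, 2), "adjusted", round(adjusted_neg, 2))
```

When z affects x and y in the same direction, the naive estimate is inflated and adjustment shrinks it. When the signs differ, the naive estimate is suppressed and adjustment makes the effect larger.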
The {marginaleffects} 📦 book is now online! 25 chapters on post-estimation analyses and interpretation with
#Rstats
. The 📖 is full of tutorials, case studies, tips, and technical notes. Please check it out and let us know how we can improve this resource
You need to switch from Latex to Rmarkdown/Quarto. Why? Because you can compile directly to Word. Why do you need to compile directly to Word? So that you can upload your file (for free!) to Grammarly and copy-edit the draft before you send it out. 🙌
I have considered doing this kind of replication for TWFE papers in general and haven’t yet, but please do be careful when you’re using a lot of dummies! Strange things may be going on under the hood…
Banning the purchase of sex 🚨DOES NOT🚨increase cases of reported rape.
A re-analysis of Ciacci (2024) shows that the paper's headline result comes from an erroneous use of Stata's regression command.
A thread from @Jopieboy, @OlleFolke, and me 1/11
Regression is a tool for making comparisons
If you don't know / can't easily explain what comparisons you're trying to make, then you don't understand the regression you're running
If you are doing fraud for a paper, be sure to make it reproducible—as this paper does with an elegant for loop in Stata👇.
Enough of this making up data in Excel with Calibri crap. This is the 21st century. DO BETTER
This paper is a helpful additional perspective after the Stanford team showed little change across panel estimators/new DiD. Here there are significant differences and uncertainty about the correct specification.
My concern with this literature is that (1/)
How does gun violence impact electoral change? @hjghassell & @johnholbein1 show that studies finding a positive effect of gun violence on Democratic vote shares are a product of failing to properly specify difference-in-differences models. #APSRFirstView.
One of the fun things about #apsa2022 is how, compared to 4 years ago, all the people I know are publishing well, feel accepted by their peers and have positive relationships with the powers that be.
🚨New results from online survey in Tunisia:
I tested whether people are afraid to report opposition to Kais Saied. Turns out this is true--at least 20% (!!) of the population does not support him but will not report it on surveys. Opposition to him is likely 50% or even higher.
I found a quick & effective solution to the PDF/Latex -> Word conversion problem w/ @adobe DC (free w/ uni subscription). You can do a near-perfect conversion from a *compiled* PDF to a Word doc... even ports over figures/math at very high quality.
#econtwitter
#PoliSciTwitter
Suppose I were to write a hands-on guide to using new tools (DeclareDesign/CausalQueries/Quarto) to write a robust preregistration in R with a power analysis & all the fixins. Where could I publish such a thing?
@tcarpenter216
R is a utopia for statistics nerds. It need not be more or less.
R is living its best life in its beautiful corner of the universe and Python (who hasn't learned to say "no" to requests for extra tasks) is secretly jealous.
It’s so comforting as a social scientist watching physicists examine #LK99 and not know whether the data are wrong, the sample is crap, the theory is misguided or the measurements are misleading.
The struggle is real.
In case you are wondering, "so what is the difference??"--there is no mathematical way to state which is correct. These pictures are different *philosophical* statements. Do we aim for a single "true" effect? Or are causal factors more like bundles of related (random) things?
This figure is under-powered. You can see the auto-correlation but the author is using separate regressions for each time point.
Statistical methods would be:
1) collapse everything to 2 time points and do a pre/post comparison OR
2) fit a time function (spline, AR(1), etc)
@andre_quentin
it's hard to tell if you are suggesting that this is p-hacked, or if you just mean that the results are uncertain
seems like careful, interesting work with uncertain results to me! so many plots of CIs! big fan! uncertainty intervals also include large effects!
Checking the prior predictive distribution in a Bayesian model is even more important than checking the posterior predictive distribution.
My spline model wasn't fitting so I sampled the priors, and lo and behold, my priors allow for ideal points between 1x10^5 and -1x10^5 🙃.
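A prior predictive check can be as simple as drawing parameters from the priors and looking at the range of quantities they imply. The sketch below is not the spline model from the post: it assumes a hypothetical linear trajectory `alpha + beta * t` over eleven time points, with normal priors of a chosen standard deviation.

```python
import random

def prior_predictive_range(prior_sd, n_draws=2000, seed=7):
    """Largest absolute value implied by the priors, before seeing any data."""
    rng = random.Random(seed)
    biggest = 0.0
    for _ in range(n_draws):
        alpha = rng.gauss(0.0, prior_sd)   # hypothetical intercept prior
        beta = rng.gauss(0.0, prior_sd)    # hypothetical slope prior
        for t in range(0, 11):
            biggest = max(biggest, abs(alpha + beta * t))
    return biggest

wide = prior_predictive_range(prior_sd=1000.0)
tight = prior_predictive_range(prior_sd=1.0)
print("vague priors allow values up to about  ", round(wide))
print("tighter priors allow values up to about", round(tight, 1))
```

If the quantity is supposed to live on a scale of a few units, the first line is the warning sign: the "vague" priors put most of their mass on values the model should never produce.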
Ok I’m done posting about APSA. Let’s just try to give people some respect and space to make a difficult decision that may in part depend on factors we can’t observe. Also I think retaliation for these choices (on either side) is grossly unprofessional and unethical.
New Paper! 🎉
"Power Rules: Practical Statistical Power Calculations"
Among other things, I write about how researchers might use pilot data to inform power calculations.
The paper is an early version, but I welcome feedback.
GitHub: