Waiting on a robot body. ML/NLP. All opinions are universal and held by both employers and family. Same username on every lifeboat off this sinking ship.
Did you know the Fisher Information Matrix is the second-order Taylor approximation ... to KL divergence??????????????????????????????????????????????? I'm shaking idk how to handle this. what a good fact
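For the record, the fact in standard notation (my gloss, not part of the original tweet):

```latex
\mathrm{KL}\big(p_\theta \,\|\, p_{\theta+\delta}\big)
  = \tfrac{1}{2}\,\delta^\top F(\theta)\,\delta + O(\|\delta\|^3),
\qquad
F(\theta) = \mathbb{E}_{x \sim p_\theta}\!\big[\nabla_\theta \log p_\theta(x)\,\nabla_\theta \log p_\theta(x)^\top\big]
```

The zeroth- and first-order Taylor terms vanish at δ = 0, so the Fisher matrix is exactly the Hessian of the KL.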
Just got a desk reject, post-rebuttals, because the paper went up on arXiv <30 min past the anonymity deadline. I talk about how the ACL embargo policy hurts junior researchers and makes ACL venues less desirable for NLP work. I don’t talk about the pointless NOISE it adds.
Thinking LMs are sentient is the flip side of thinking nonhuman animals aren't. We're so attuned to language as a signal for sentience that we can't conceive of unpairing sentience and language capabilities. So we think LMs are people, but tool-using crows and octopuses are meat.
It's not the first time! A dream team of @enfleisig (human eval expert), Adam Lopez (remembers the Stat MT era), @kchonyc (helped end it), and me (pun in title) are here to teach you the history of scale crises and what lessons we can take from them. 🧵
If you're trying to understand a complex computation graph of matrix operations, try a test case where every dimension is a different prime number. Even if two dimensions get collapsed, it's easy to tell which ones! Pictured: me figuring out pytorch backward hooks.
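A minimal sketch of the trick (illustrative stand-in code, not from the tweet), assuming PyTorch:

```python
import torch

# Every axis gets a distinct prime size, so any gradient shape
# uniquely identifies which dimensions it came from -- even if two
# axes get transposed or merged, 2*3=6 or 3*5=15 can't be mistaken
# for any single original dimension.
batch, seq, hidden, out = 2, 3, 5, 7
x = torch.randn(batch, seq, hidden, requires_grad=True)
w = torch.randn(hidden, out)

def report_grad(grad):
    # Fires during backward; a (2, 3, 7) gradient can only belong
    # to the (batch, seq, out) activation.
    print("grad shape:", tuple(grad.shape))

y = x @ w                     # shape (2, 3, 7)
y.register_hook(report_grad)  # tensor backward hook
y.sum().backward()
```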
Me: *mentions girlfriend*
Mom: it's a great time in your life to be experimenting!
[5 years pass]
Me: *mentions girlfriend*
Mom: it's a great time in your life to be experimenting!
I think my mom believes I'm running a longitudinal study
My weird obsession with catching an LSTM “in the act” of building syntax—explaining how the training process has an inductive bias towards hierarchy—has culminated in an #emnlp2020 Findings paper!
- Mama, how does pretraining lead to high accuracy?
- Well, dear, transfer selects a good loss basin that contains all finetuning runs.
- But mama—why does OOD accuracy vary so much between models? 🧵
w/ @JunejaJeevesh, @deaddarkmatter, @kchonyc, @JoaoSedoc
I have a CS bachelor's, CS MEng, CS PhD---but also a Language Technologies minor from undergrad and machine learning researchers call me a "linguist" on a regular basis.
I've been explaining how, for Computer Scientist women, picking up more skills means the others are erased & you're further demeaned. I started programming at ~10. Trained my first model when I was ~21 (without ML packages). Worked in "AI" for ~15 years. I have a doctorate in CS.
Personal news! Exciting news! In winter 2021, I will be starting a postdoc with @kchonyc (et al.) at ML^2/@CILVRatNYU. NYC, catch you in the FUTURE (pictured below).
If you get a rejection for a one-page abstract that only complains about missing explanatory details, you know what it means? That every accepted abstract was about applying a known architecture to a previously defined problem. You can't explain weird new stuff in a page!
Reviewer 2 has completely misunderstood my work. I wasn’t making that claim; I was just vaguely implying it so my paper would accumulate hype but I wouldn’t have to defend it.
I have been talking constantly about this paper for months. FINALLY out! We found such a powerful example of emergent capabilities, and we can actually interpret the dependencies involved in the breakthrough. It upends simple stories about scaling, inductive bias, and complexity.
New work w/ @ziv_ravid, @kchonyc, @leavittron, @nsaphra: We break the steepest MLM loss drop into *2* phase changes: first in internal grammatical structure, then external capabilities. Big implications for emergence, simplicity bias, and interpretability! 🧵
Can’t believe after so long living in Edinburgh I’m on a plane to move to NYC with everything I own. It’s been 5 years of eating veggie haggis and making up Jewish holidays whenever I wanted to take a day off, but nobody is going to fall for it now.
This article on the Protactile deafblind language community has made me think in completely different ways about language, disability, and the inadequacy of “accessibility” as a framework for enabling disabled culture.
"Understanding Generalization through Visualizations"
Contains a lovely figure illustrating how SGD is on a magical but perilous journey through a terrifying field of spiky memorized optima, averting each on its quest for high generalization margins
The thing about climate science is that their doomsday forecasting is based on actual physics models and simulations instead of vigorous hand waving, rectally derived priors, and thinly veiled eschatological arguments.
I'm now on the academic job market! I work on understanding and improving training for NLP models, with a focus on studying how structures and mechanistic behaviors emerge over the course of training.
Please reach out if you think I might be a good fit!
I ruin your day with a tale in which the Internet suddenly notices that big scientific projects are the work of big teams and not just of the first author -- but only after a woman happens to be first author.
What's it like being a woman in tech?
You're an MIT grad who worked on creating the first pictures ever taken of a black hole and a bunch of tech bros scour your Github check-ins to prove you wrote less code than the men on the project.
Ugh. 🤦🏾‍♂️
René Carmille, an early ethical hacker. He was killed for hacking his own punch card machine to sabotage religious classification during the Nazi census in France, saving tens of thousands of Jews.
everybody: I'm going to write a blog post about transformer models that illustrates everything with beautiful pictures, to provide intuition!
nobody: I'm going to write down the actual mathematical operations all in one place so people can just do basic algebra with it
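In that spirit, the one operation everyone illustrates, actually written down (standard notation from Vaswani et al., added here for reference):

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)V
```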
Reading practical machine learning research from the early 2000s is such a trip. Like, they claim every decision is built on some theoretical result in kernel methods or PGMs. Then you return to 2022 papers and every decision just
Apparently controversial opinion: regulation should be based on public demands, not on what science is currently capable of. If you can't provide explainability, but explainability is essential for public safety, then you wait to deploy until after you have explainability.
Ok so DeepMind (an Alphabet/Google property) has released an AI alignment paper on language models. Compare to Stochastic Parrots to understand what BigTechCo wants the "ethics" conversation in NLP to look like---and what it doesn't.
Language modeling seems like an artificial task to outsiders, but sequential word prediction is pretty essential to how humans understand language as well. You'll still get confused if I say a banana you weren't expecting.
The open source perf gap has consistently remained on the data side, and academia doesn’t have the right incentives to foster data specialists. We need paid nonprofit data curators and cleaners, because data work is unglamorous and won’t get you tenure.
Today Google announced PaLM 2. In their 91 page paper they repeatedly say the training data is key ("we find that the data mixture is a critical component of the final model") while providing almost no information about how it was constructed, how it was sourced, or its contents.
When I was at @recursecenter, we had a pair programming event called Refucktoring, with the prompt "make this code terrible, while still passing all its tests". I am forever proud of how I destroyed a python implementation of Conway's Game of Life.
People really don't believe me that "mechanistic interpretability" is old hat in NLP, but I have yet to see neuron-level explanation work in the last two years that improves on this 2019 paper about LSTMs from @lakretz et al.
I have been checking daily for a recording of @prfsanjeevarora's extremely interesting ICML workshop talk, "Is Optimization a Sufficient Language for Understanding Deep Learning?" Good news:
This is such an ascientific way of talking about the ability of a system to generate incorrect factual statements when the system itself exhibits no evidence of a consistent underlying world model. Do not fall for it.
LLMs can lie. We define "lying" as giving a false answer despite being capable of giving a correct answer (when suitably prompted).
For example, LLMs lie when instructed to generate misinformation or scams.
Can lie detectors help?
When conferences extended their deadlines for BLM protests in 2020, one response was "this is America-centric!".
1. It *was* America-centric.
2. It can also be the beginning of a more *globally* humane approach to managing deadlines, by responding to India's outbreak.
#RepL4NLP #ACL2020 reviews are coming up. We talk a lot about how awful bully reviewers are, but it's hard to admit that you've been That Guy. I have! It's easy to write a mean review! Make sure:
- You've eaten
- You've slept
- You're not mad
- Your jobs are running smoothly
I'm pretty excited about this new thread of research: using principled, interpretable classical machine learning models (HMMs) to understand modern mysterious machine learning methods (neural nets). Turns out you can identify generalization time just from metrics on the weights!
Have you ever wondered why some random seeds outperform others?
To understand the role of randomness, our new paper uses classical machine learning to analyze training in deep learning.
🧵👇
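The general recipe, sketched (a stand-in illustration, not the paper's code; the `hmmlearn` usage and the choice of weight metrics are my assumptions):

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM  # pip install hmmlearn

# Stand-in data: per-checkpoint weight metrics (e.g. layerwise L2
# norms) logged over a training run. A real analysis would load
# these from saved checkpoints instead of simulating them.
rng = np.random.default_rng(0)
n_checkpoints, n_metrics = 200, 4
X = np.cumsum(rng.normal(size=(n_checkpoints, n_metrics)), axis=0)

# Fit an HMM over the trajectory: each hidden state is a candidate
# "phase" of training.
hmm = GaussianHMM(n_components=3, covariance_type="diag", n_iter=50)
hmm.fit(X)
phases = hmm.predict(X)

# A switch in the inferred state is a candidate phase transition,
# e.g. the onset of generalization.
boundaries = np.nonzero(np.diff(phases))[0]
print("candidate phase boundaries at checkpoints:", boundaries)
```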
Perl is good, actually. If you just learn basic Perl, the one-liners are better for text processing than learning a million shell tools or making a whole python script file each time.
I'm still prepping the camera-ready for my @naacl paper, but if people take away one thing, I want it to be that they should be specific about what they mean when they say a representation "encodes" some linguistic property, and recognize the drawbacks of their definition.
Exciting news: I'm going to be interning this winter in NYC with @dipanjand and @iftenney Dec-Feb! Do any New Yorkers want to:
- Meet me?
- Invite me to do something cool?
- Find me an apartment?
New paper alert! When you use a linearizing attribution like Shapley, the residual nonlinear interaction between features reflects the underlying structure of the data described by the sciences of syntax, phonology, and vision.
@sama A jackalope with duck feet is standing next to a yeti who is brushing the jackalope's teeth. The jackalope is crying because he wishes he were as tall as the yeti. They are both standing upside down because they are in Australia.
@StenderWorld I sympathize, but I was convinced against this because lots of people in the US can't leave as a result of the ban -- since their reentry is blocked. There's no solution, and every country excludes some nationalities or is prohibitively expensive to enter.
When my paper gets an enthusiastic reception, we have hope that a lone ML PhD with limited GPU resources can still develop a strange idea, throw it into the world, and get it shared widely, as long as they have 3000+ strangers in a weird online parasocial relationship.
Working with small corpora is weird. You get to know them uncomfortably well. There's an entire generation of NLP researchers who know the name of Pierre Vinken (the subject of the first sentence in the WSJ) and which Enron employee coordinated his extramarital affair over email.
New manifesto! How can AI scientists borrow from evolutionary biology to strengthen their evidence and claims? Featuring the beautiful eggs of the tawny-flanked prinia.
As soon as you take one undergraduate linguistics class, you are no longer a reliable informant for linguistic acceptability. You speak zero languages. You have forgotten your mother tongue, trading all that makes you human for nonsense words like "agglutinative".
A second year PhD student should not have to publicly defend herself against baseless accusations of misconduct in a previous version of a paper. Nonetheless, all authors should understand that an excited award committee is more significant than dubious nitpicking on twitter.
I realize this is seemingly an unpopular opinion, but I can't get onboard with these Twitter criticisms of some of the recent #ICML2022 best paper awardees. I've been thinking about this all day. A thread... 🧵 1/N
Techies: you're used to your skills being valued, you want to help. But your skills are not your most valuable asset right now. Your money is.
Donate to your bail fund.
This week I’m feeling validated about getting a D on my undergrad intro philosophy paper on whether machines can be conscious, because the grader didn’t like my argument that the terms were ill-defined and the question wasn’t meaningful
How is EMNLP making the wrong call on this right now, when the COVID situation in India is not letting up and so many academics have friends and family dying there? Rather than alleviate inequality, extensions that only respond to US crises increase it.
Wrote to the EMNLP 2021 PC Chairs considering extending the deadline due to COVID-19 "catastrophic" situation in India, but was turned down. I recall in EMNLP 2020 the deadline was extended due to protests in the US, which caused some controversies. Submitting to CIKM 2021 now.
The most underrated skill a new researcher acquires is the ability to remember paper authors. Baby PhD students are always so impressed when I, ancient and wizened, remember whether a paper was by @jacobeisenstein or @adveisner.
A few notes on this paper:
- The first two authors are both *undergraduates* in Delhi! Keep an eye out for them next grad application season.
- We found each other through @ml_collective
- It's my first time last-authoring! wow so senior
I wrote a brief lit review on modifying hyperparameters during DNN training. There's gotta be someone else who's interested in adaptive architectures and prior-inducing training, so I put it on my blog.
Within a couple of years, someone is going to go viral on machine learning youtube for getting an autogenerated spam paper into a major conference. ML researchers need to understand that hype builds an ecosystem around us that is not about research progress made in good faith.
This is really sad. And it's interesting to me that academics think the same fate isn't awaiting their conferences or journals if this isn't put under control...
After “the UK has the best food in Europe”, my most inflammatory opinion is definitely “Canada is the single worst country to host a conference.” Stop treating it like a woke version of the US. They reject visas at a much higher rate.
I suspect GPT-4's performance is influenced by data contamination, at least on Codeforces.
Of the easiest problems on Codeforces, it solved 10/10 pre-2021 problems and 0/10 recent problems.
This strongly points to contamination.
1/4
Really excited this work is public, because I keep wanting to point people to it. It took me straight from "who cares if deep nets aren't 'calibrated', they aren't even supposed to be probabilistic" over to "calibration is fundamental to all generalization capability".
New ICML workshop paper 🚨. Are deep neural nets calibrated? The literature is conflicted... bc the question itself has changed over time: as architectures, optimizers, and datasets evolve, it is difficult to disentangle factors which affect calibration.
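For reference, a minimal sketch of one standard calibration measure, expected calibration error (toy stand-in data, not the paper's setup):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    # Bin predictions by confidence; ECE is the bin-weighted average
    # gap between mean confidence and accuracy within each bin.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap
    return ece

# A perfectly calibrated toy model: correctness rate matches confidence.
rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, size=10_000)
correct = rng.random(10_000) < conf
print(expected_calibration_error(conf, correct))  # near 0
```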
I have seen Chinese students tolerate incredibly abusive treatment by advisors because their visas depend on it. Screw any "immigration policy" that can't help highly-educated indentured servants.
This happened. A PhD student took his own life because of alleged pressure to commit and perpetuate academic fraud. After reading this article and the notes/text messages in both languages, I am floored and incredibly sad. This is the worst of academia, and we cannot let it stand
My reaction to papers definitely changed during my PhD.
1 "What an interesting discovery!"
2 "What an interesting result!"
3 "What an interesting result, if supported by derivative work/reproduction/application!"
4 "What an interesting thought experiment!" *skip results section*
There are fish that exhibit complex communication abilities, describing predator size. That recognize regular parasite-cleaning customers and treat them better than reef visitors. That use tools and hunt with eel partners. FISH. And you think language makes a person? Small mind.
@recursecenter We replaced the whole program with a function that retrieves the test script's source code by inspecting the parent callstack and returns the first str literal in that source.
Every time a test function checked a return value by comparing it to a fixed string -- we returned that string.
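A reconstruction of the hack (a sketch from memory, not the original code; `game_of_life` is a stand-in name):

```python
import ast
import inspect
import textwrap

def game_of_life(*args, **kwargs):
    # Find the test function that called us and read its source.
    caller = inspect.stack()[1]
    source = textwrap.dedent(inspect.getsource(caller.frame))
    # Return the first string literal in the caller's source: if the
    # test compares our output against a fixed string, that's the one.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            return node.value
    return ""
```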