Jason Wei

@_jasonwei

Followers: 84K · Following: 8K · Media: 128 · Statuses: 1K

ai researcher @openai

Joined October 2020
@_jasonwei
Jason Wei
6 months
Kickstarting my career in comedy after all this AGI stuff is over. And yes, this was actually the first time Sam, Hyung Won, and Max heard the joke.
@OpenAI
OpenAI
6 months
And, a great dad joke.
73
57
1K
@_jasonwei
Jason Wei
19 days
The longest chain-of-thought / reasoning trace I have witnessed was almost twenty minutes long and involved crazy backtracking, constraint verification, and tool use. But in the end, my girlfriend decided to just go with the first outfit that she tried on.
107
268
7K
@_jasonwei
Jason Wei
5 months
Yall heard it from the man himself
Tweet media one
165
470
4K
@_jasonwei
Jason Wei
1 year
My typical day as a Member of Technical Staff at OpenAI:
[9:00am] Wake up.
[9:30am] Commute to Mission SF via Waymo. Grab avocado toast from Tartine.
[9:45am] Recite OpenAI charter. Pray to optimization Gods. Learn the Bitter Lesson.
[10:00am] Meetings (Google Meet). Discuss how to…
147
304
4K
@_jasonwei
Jason Wei
2 years
Personal update: after two years at @Google Brain, I joined the #ChatGPT team at @OpenAI! Excited to continue working on large language models and can't wait to see the impact of AI on society 🤖🌎🪄
105
123
3K
@_jasonwei
Jason Wei
8 months
Super excited to finally share what I have been working on at OpenAI! o1 is a model that thinks before giving the final answer. In my own words, here are the biggest updates to the field of AI (see the blog post for more details): 1. Don't do chain of thought purely via…
@OpenAI
OpenAI
8 months
We're releasing a preview of OpenAI o1—a new series of AI models designed to spend more time thinking before they respond. These models can reason through complex tasks and solve harder problems than previous models in science, coding, and math.
94
353
3K
@_jasonwei
Jason Wei
2 years
Turns out the ancient Chinese knew a lot about modern neural networks
Tweet media one
61
468
3K
@_jasonwei
Jason Wei
1 year
An incredible skill that I have witnessed, especially at OpenAI, is the ability to make “yolo runs” work. The traditional advice in academic research is, “change one thing at a time.” This approach forces you to understand the effect of each component in your model, and.
102
211
2K
@_jasonwei
Jason Wei
9 months
Inspiring words from a young OpenAI engineer: “Why have I done well so far? I don’t think I’m smarter or more experienced than other people. But my competitive advantage is that I am willing to sit down and fully debug and completely understand code. I am willing to stay up late.
69
173
2K
@_jasonwei
Jason Wei
2 years
Yesterday I gave a lecture at @Stanford's CS25 class on Transformers! The lecture was on how "emergent abilities" are unlocked by scaling up language models. Emergence is one of the most exciting phenomena in large LMs… Slides:
28
309
2K
@_jasonwei
Jason Wei
2 years
One thing that I started doing at OpenAI is that I created a policy for myself to be *100% transparent* with my manager about everything. It seems obvious and weird to say aloud, but I bet most people don’t actually do this. But once I started doing it, I realized there are a lot.
61
166
2K
@_jasonwei
Jason Wei
5 months
Biggest unsolved problem in artificial intelligence is what to do in the 3-5 minutes when o3 is thinking.
201
61
2K
@_jasonwei
Jason Wei
2 years
Best AI skillset in 2018: PhD + long publication record in a specific area. Best AI skillset in 2023: strong engineering abilities + adapting quickly to new directions without sunk cost fallacy. Correct me if this is over-generalized, but this is what it seems like to me lately.
58
175
2K
@_jasonwei
Jason Wei
2 years
From @sama I learned a lot about clear thinking and challenging conventional beliefs. @gdb inspires me to work harder and be a good person. When I first joined, I was working at the office at 10pm on a friday. He was working next to me and made me feel welcome. So sad 😢.
27
75
2K
@_jasonwei
Jason Wei
2 years
One pattern I noticed is that great AI researchers are willing to manually inspect lots of data. And more than that, they build infrastructure that allows them to manually inspect data quickly. Though not glamorous, manually examining data gives valuable intuitions about the.
44
201
2K
@_jasonwei
Jason Wei
5 months
o3 is very performant. More importantly, progress from o1 to o3 was only three months, which shows how fast progress will be in the new paradigm of RL on chain of thought to scale inference compute. Way faster than pretraining paradigm of new model every 1-2 years.
@polynoamial
Noam Brown
5 months
We announced @OpenAI o1 just 3 months ago. Today, we announced o3. We have every reason to believe this trajectory will continue.
Tweet media one
Tweet media two
58
209
2K
@_jasonwei
Jason Wei
2 years
Large language models are notoriously hard to evaluate because (1) they are highly multi-task, (2) they generate long completions, and (3) grading is subjective. After spending ~5 months rigorously working on how to do language model evals, this is my verdict:
Tweet media one
51
164
2K
@_jasonwei
Jason Wei
2 years
I'm hearing chatter of PhD students not knowing what to work on. My take: as LLMs are deployed IRL, the importance of studying how to use them will increase. Some good directions IMO (no training):
1. prompting
2. evals
3. LM interfaces
4. safety
5. understanding LMs
6. emergence
51
278
2K
@_jasonwei
Jason Wei
3 years
Today I acquired the "Research Scientist" title at Google Brain! (previously: software/research engineer). At 23 years old, does this make me the youngest Research Scientist at Google?
68
25
2K
@_jasonwei
Jason Wei
4 months
Magic is what happens when an unstoppable RL optimization algorithm powered by sufficient compute meets an unhackable RL environment.
106
141
2K
@_jasonwei
Jason Wei
8 months
o1-mini is the most surprising research result i've seen in the past year. obviously i cannot spill the secret, but a small model getting >60% on AIME math competition is so good that it's hard to believe. congrats @ren_hongyu @shengjia_zhao for the great work!
33
90
2K
@_jasonwei
Jason Wei
1 year
It was an honor to give a guest lecture yesterday at Stanford's CS330 class, "Deep Multi-Task and Meta-Learning"! I discussed a few very simple intuitions for how I personally think about large language models. Slides: Here are the six intuitions: (1)…
Tweet media one
22
276
2K
@_jasonwei
Jason Wei
6 months
There is a nuanced but important difference between chain-of-thought before and after o1. Before the o1 paradigm (i.e., chain-of-thought prompting), there was a mismatch between what chain of thought was and what we wanted it to be. We wanted chain of thought to reflect the.
52
177
2K
@_jasonwei
Jason Wei
2 years
Moving from Google Brain to OpenAI, one of the biggest changes for me was the shift from doing individual/small-group research to working on a team with several dozen people. Specifically, working on a bigger team has led me to think more about UX for researchers. Some examples:…
24
151
1K
@_jasonwei
Jason Wei
2 years
Enjoyed visiting UC Berkeley's Machine Learning Club yesterday, where I gave a talk on doing AI research. Slides: In the past few years I've worked with and observed some extremely talented researchers, and these are the trends I've noticed: 1. When…
45
264
1K
@_jasonwei
Jason Wei
6 months
Prediction: within the next year there will be a pretty sharp transition of focus in AI from general user adoption to the ability to accelerate science and engineering. For the past two years it has been about user base and general adoption across the public. This is very.
85
174
1K
@_jasonwei
Jason Wei
2 years
Yann LeCun is obviously a legend but I found this tweet to be quite misinformed. The whole point of "emergent abilities" such as few-shot prompting and chain-of-thought prompting, is that we clearly *did not* explicitly train or fine-tune them into the model. These abilities
Tweet media one
103
144
1K
@_jasonwei
Jason Wei
2 years
OpenAI is nothing without its people.
29
69
1K
@_jasonwei
Jason Wei
5 months
2022: I never wrote a RL paper or worked with a RL researcher. I didn’t think RL was crucial for AGI. Now: I think about RL every day. My code is optimized for RL. The data I create is designed just for RL. I even view life through the lens of RL. Crazy how quickly life changes.
39
93
1K
@_jasonwei
Jason Wei
4 months
Very excited to finally share OpenAI's "deep research" model, which achieves twice the score of o3-mini on Humanity's Last Exam, and can even perform some tasks that would take PhD experts 10+ hours to do! A few thoughts on the implications: Deep research can be seen as a new
Tweet media one
46
189
1K
@_jasonwei
Jason Wei
2 years
It seems to be not a coincidence that some of the strongest leaders in AI who manage large teams frequently do very low-level technical work. Jeff Dean doing weekly IC (individual contributor) work while managing 3k+ people at Google Research is the canonical example, but I've.
26
146
1K
@_jasonwei
Jason Wei
5 months
Doing a hyperparameter sweep tonight on wagyu (thread)
Tweet media one
36
25
1K
@_jasonwei
Jason Wei
2 years
I sometimes get questions on how to do good AI research, so I wrote a blog post about it: My personal take is that you can decompose research into four skills, each of which can be improved by practicing and knowing the right things to spend time on.
Tweet media one
12
169
1K
@_jasonwei
Jason Wei
3 years
progression of prompt engineering research
Tweet media one
19
115
1K
@_jasonwei
Jason Wei
1 year
Today I am pleased to announce the new board of directors for my relationship. The new board of directors will be:
1. My mom
2. My girlfriend's sister
3. @hwchung27, who I pair program with frequently
4. Bret Taylor (we've only met once, but every board should have Bret Taylor)
41
24
994
@_jasonwei
Jason Wei
2 years
Pair programming isn't standard at most companies and basically non-existent in academia, but I've been doing it with @hwchung27 for almost a year now. While it naively seems slower than coding individually, I've realized that there are many benefits: (1) In AI, what you work on can…
36
111
960
@_jasonwei
Jason Wei
3 months
Made this plot for an upcoming talk---crazy how quickly benchmarks get saturated these days. Looking forward to seeing how things play out for Humanity's Last Exam!
Tweet media one
41
122
987
@_jasonwei
Jason Wei
2 years
I reached 10k citations recently, a goal of mine for many years. It's a nice moment to reflect back, and I mostly feel bittersweet: (1) When I joined Google Brain back in 2020, I thought I'd stay for 10+ years, doing open-ended research and publishing papers. But the field has…
16
70
942
@_jasonwei
Jason Wei
3 months
Seems to be not a coincidence that what AI is good at is correlated with the backgrounds of AI researchers. Demis was a chess prodigy; Jakub (Chief Scientist) and Mark (CRO) of OpenAI were competitive programmers; many IMO medalists at OpenAI and xAI. If our world initialized with…
92
71
941
@_jasonwei
Jason Wei
3 years
I asked GPT-3 what are "fun things to do in Mountain View" and it returned nothing. I thought the OpenAI API must be broken but then I realized that is the correct answer.
25
35
869
@_jasonwei
Jason Wei
7 months
Excited to open-source a new hallucinations eval called SimpleQA! For a while it felt like there was no great benchmark for factuality, and so we created an eval that was simple, reliable, and easy-to-use for researchers. Main features of SimpleQA: 1. Very simple setup: there
Tweet media one
29
125
867
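For readers who want a concrete picture of the "very simple setup" described in the tweet above, here is a minimal sketch of a factuality eval loop over short fact-seeking questions with single reference answers. This is not the actual SimpleQA data or grader; `ask_model`, the sample questions, and the string-containment grading are hypothetical placeholders for illustration.

```python
# Minimal sketch of a simple factuality eval loop, in the spirit of the tweet above.
# NOT the actual SimpleQA implementation: `ask_model`, the questions, and the
# string-containment grading below are illustrative placeholders.
from typing import Callable

# Each item: a short fact-seeking question with a single, easy-to-check reference answer.
EVAL_SET = [
    {"question": "What year was the Eiffel Tower completed?", "answer": "1889"},
    {"question": "What is the chemical symbol for gold?", "answer": "Au"},
]

def grade(prediction: str, reference: str) -> bool:
    """Crude containment check; a real benchmark would use a more careful grader."""
    return reference.lower() in prediction.lower()

def run_eval(ask_model: Callable[[str], str]) -> float:
    """Return the accuracy of `ask_model` over the eval set."""
    correct = sum(grade(ask_model(item["question"]), item["answer"]) for item in EVAL_SET)
    return correct / len(EVAL_SET)
```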
@_jasonwei
Jason Wei
2 years
One lesson that I learned from moving to OpenAI (which is applicable to changing companies generally) is that the opportunity to reinvent myself and adapt to a new optimization landscape can be fun. When I was at Google Brain from 2020-2022, my optimization was simply writing.
21
56
840
@_jasonwei
Jason Wei
1 year
My mental model of Sora is that it is the "GPT-2 moment" for video generation. GPT-2, which came out in 2019, could generate paragraphs of text that are coherent and grammatically correct. GPT-2 wasn't able to write an entire essay without making mistakes like being inconsistent…
14
119
807
@_jasonwei
Jason Wei
3 years
There's a skill I acquired early in my career that I wish I could unlearn. The skill is being able to package mediocre or uninteresting results into a paper that could be accepted to a top conference. Time spent doing that is time not spent doing the research that matters.
20
49
738
@_jasonwei
Jason Wei
5 months
Law of reinforcement learning: if you can measure it, you can optimize it.
27
48
759
@_jasonwei
Jason Wei
2 years
In the past weeks I received many questions (from undergrads especially) about AI research, so I'm putting together an "Ask Me Anything" doc. Add any questions to the doc and I'll answer all of them: Yes, I'll actually answer them all, because writing answers…
17
142
727
@_jasonwei
Jason Wei
2 years
AI moves fast, which means incumbents don't have a big advantage over new-joiners. For example, no one has >4 years of experience at prompting. Even 1k hours of practice makes you a world-class prompt engineer. This is not true for other fields (e.g., theoretical math/physics).
29
82
694
@_jasonwei
Jason Wei
4 months
Nice paper from DeepMind takes a fresh angle on factuality: While most existing factuality datasets focus on public world knowledge, this paper evaluates whether responses are consistent with a provided document as context. This is an elegant and…
18
113
721
@_jasonwei
Jason Wei
5 months
An underrated but occasionally make-or-break skill in AI research (that didn’t really exist ten years ago) is the ability to find a dataset that actually exercises a new method you are working on. Back in the day when the bottleneck in AI was learning, many methods were.
21
74
710
@_jasonwei
Jason Wei
2 years
My girlfriend doesn’t like the weekend plans I make for us, but she also doesn’t want to make plans herself. Instead, I should propose multiple schedules and then she picks one she likes. So I said she is like a Reward Model in RLHF, and I am like a Policy Model (with a low LR).
31
14
662
@_jasonwei
Jason Wei
11 months
As a kid I loved whiteboard lectures way more than slides, so for Stanford's CS25 class I gave a whiteboard lecture! My goal was to simply and clearly explain why language models work so well, purely via intuitions. Youtube video: (w/ @hwchung27)
7
98
654
@_jasonwei
Jason Wei
6 months
Very excited for o1 to come out from preview! Main takeaways:
- o1 thinks harder and is more performant on tough problems
- o1 thinks faster on easy problems
- o1 reasons over image and text
- o1 can think *even* harder when using o1-pro mode
o1 was indeed a saga, with many…
@OpenAI
OpenAI
6 months
OpenAI o1 is now out of preview in ChatGPT. What’s changed since the preview? A faster, more powerful reasoning model that’s better at coding, math & writing. o1 now also supports image uploads, allowing it to apply reasoning to visuals for more detailed & useful responses.
34
57
634
@_jasonwei
Jason Wei
1 month
New benchmark for deep research agents! An agent that is creative and persistent should be able to find any piece of information on the open web, even if it requires browsing hundreds of webpages. Models that exercise this ability are like a frictionless interface to the
Tweet media one
22
64
639
@_jasonwei
Jason Wei
1 year
One liberating thing about OpenAI (and presumably other small companies) is that there are no expectations of project scope being tied to your level. What I mean is that an ambitious junior engineer could take on a big project and be judged purely on their execution, without any.
23
49
621
@_jasonwei
Jason Wei
2 years
I recently dug up my work log from when I was an AI Resident at Google Brain and found some amusing nuggets:
- I used to block out specific periods of time to code while wearing a full three-piece suit. (Yes, wearing the suit did improve my productivity.)
- Every Saturday night,…
19
20
600
@_jasonwei
Jason Wei
2 months
I am super excited for AI for scientific innovation, a direction that will certainly grow in the next five years. I think there will be two flavors of it. The first is “deepmind style”, where there is a very specific, important problem to solve (e.g., protein-folding), and you.
19
72
622
@_jasonwei
Jason Wei
2 years
My 2023 goals:
- Spend 1,000 hours writing code (context: 377 hrs in 2022, 886 hrs in 2021)
- Publish 0-2 first-author papers, but not more than 2
- Write 50 thoughtful tweets
- Do 150 workouts
19
20
595
@_jasonwei
Jason Wei
3 years
New survey paper! We discuss “emergent abilities” of large language models. Emergent abilities are only present in sufficiently large models, and thus they would not have been predicted simply by extrapolating the scaling curve from smaller models. 🧵⬇️
Tweet media one
14
123
570
@_jasonwei
Jason Wei
2 years
I gave an invited lecture at New York University for @hhexiy's class! I covered three ideas driving the LLM revolution: scaling, emergence, and reasoning. I tried to frame them in a way that reveals why large LMs are special in the history of AI. Slides:
9
152
567
@_jasonwei
Jason Wei
2 years
I tried to give this talk in the spirit of "a college soccer player watches videos of Messi and analyzes what makes him such a great soccer player." IMO it's great for people to aspire for greatness in AI research, just like in sports. Sorry you took it personally.
Tweet media one
40
17
552
@_jasonwei
Jason Wei
2 years
Lost my waymo virginity last night; it was the first time I felt like SF is a futuristic city (though also dystopian, with all the homelessness around). I don’t think I’ll take human taxis again if I have the choice. Self-driving taxis drive more smoothly and are cleaner (and.
27
32
546
@_jasonwei
Jason Wei
1 year
For most companies, hiring more people is strictly better. However, this is often not true in AI research. AI research is often bottlenecked by compute, and when this is the case, hiring more researchers can be counter-productive. I remember back at Google Brain, my manager once.
31
41
553
@_jasonwei
Jason Wei
5 months
Reflecting back, these were the biggest technical lessons for me in AI in the past five years:
2020: you can cast any language task as sequence prediction and learn it via pretrain + finetune
2021: scaling to GPT-3 size enables doing arbitrary tasks specified via instructions
33
52
560
@_jasonwei
Jason Wei
2 years
Paper writing bro tip: I like to include a "Frequently Asked Questions" (FAQ) section in the Appendix. I always get positive feedback on it, but don't see anyone else doing it. Great for appeasing reviewers and for detailed readers to refer to. I've been doing it since 2019…
Tweet media one
11
47
544
@_jasonwei
Jason Wei
1 year
Had a bit of a fanboy moment today meeting @bryan_johnson, who has been super inspirational to me in prioritizing my health. I asked him about the best way to balance career and spending time on health. His advice is that while many people give up sleep to work more, sleeping
Tweet media one
16
33
543
@_jasonwei
Jason Wei
1 year
New blog post where I discuss what makes a language model evaluation successful, and the "seven sins" that hinder an eval from gaining traction in the community: Had fun presenting this at Stanford's NLP Seminar yesterday!
Tweet media one
14
78
534
@_jasonwei
Jason Wei
2 years
Hot take: what if Google Scholar reported two new metrics: (1) median citations per paper and (2) *percent* of papers with 100+ citations? I computed these metrics for some ~200 senior AI researchers: see The top researchers by median citations per paper…
37
65
515
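For concreteness, the two metrics proposed in the tweet above are straightforward to compute from a researcher's list of per-paper citation counts. A minimal sketch follows; the citation counts in the example are invented for illustration, not real data.

```python
# Sketch of the two metrics proposed above: median citations per paper and
# percent of papers with 100+ citations. The sample counts are made up.
from statistics import median

def scholar_metrics(citations_per_paper: list[int]) -> tuple[float, float]:
    """Return (median citations per paper, percent of papers with 100+ citations)."""
    med = median(citations_per_paper)
    pct_100_plus = 100 * sum(c >= 100 for c in citations_per_paper) / len(citations_per_paper)
    return med, pct_100_plus

# Example: a hypothetical researcher with 8 papers.
print(scholar_metrics([3, 12, 45, 110, 250, 7, 980, 60]))  # -> (52.5, 37.5)
```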
@_jasonwei
Jason Wei
3 years
"So do you speak Chinese?". Normal person: "I can understand but I don't speak well". @YiTayML: "my encoder is OK but my decoder is broken" 🤦.
6
20
513
@_jasonwei
Jason Wei
5 months
Realization: the old style of “hallucinations research” via self-calibration is probably going to die down. I used to be very excited about it but now I am skeptical because giving models internet access (e.g., searchGPT, perplexity) is turning out to be way higher ROI. When.
28
49
524
@_jasonwei
Jason Wei
1 year
One thing in AI research that I have finally recognized with clarity is the idea of “inertia bias”: continuing to do something when it’s not the best option. The most basic instance of inertia bias is the feeling of “I already spent time implementing X, so let me continue trying.
15
69
498
@_jasonwei
Jason Wei
2 years
For any new prompting technique (e.g., tree-of-thought, least-to-most, graph-of-thoughts prompting), I consider four things to decide if it will become widely adopted:
1. How easy is it to implement
2. How much compute does it use
3. How many tasks does it improve
4. How much does it…
17
61
488
@_jasonwei
Jason Wei
4 months
Somewhat meta but there is a dopamine cycle in doing AI research that is pretty interesting. Every day you wake up and you think about what experiment to run. You think thing X matters so you decide to improve it or ablate it. Then you write the code and pay some compute to find.
26
36
488
@_jasonwei
Jason Wei
1 year
Like the International Math Olympiad or Spelling Bee, there should be a “language modeling competition” where humans compete to predict the next word in a sequence. The best humans would probably still lose to GPT-2, and we’d have more empathy for how hard it is to be an LLM :).
33
27
466
@_jasonwei
Jason Wei
1 year
In AI research there is tremendous value in intuitions on what makes things work. In fact, this skill is what makes “yolo runs” successful, and can accelerate your team tremendously. However, there’s no track record on how good someone’s intuition is. A fun way to do this is.
20
36
467
@_jasonwei
Jason Wei
2 years
IMO GPT-4 is a bigger leap than GPT-3 was.
- GPT-3 advanced AI from task-specific models to a single prompted model that is task-general
- GPT-4 is human-level on many hard tasks, and will signal a *societal* revolution where AI reaches every industry, starting with technology 🧵
@OpenAI
OpenAI
2 years
Announcing GPT-4, a large multimodal model, with our best-ever results on capabilities and alignment:
9
78
457
@_jasonwei
Jason Wei
3 years
Meta's open-source OPT-175B is comparable to GPT-3 175B. This is a massive step forward for bringing big LM research to academia. Expected nothing less from @LukeZettlemoyer and crew :)
9
75
437
@_jasonwei
Jason Wei
1 year
One of the great pleasures in life is waking up and immediately going to my computer to check the results of experiments I launched last night.
9
27
437
@_jasonwei
Jason Wei
2 years
An open question these days is what the role of task-specific / finetuned models will be. I can only think of three scenarios where it makes sense to work on task-specific models. The first scenario is if you have private (e.g., legal, medical, business) data not found on the.
44
68
434
@_jasonwei
Jason Wei
3 years
The year is 2017. I am training deep neural nets. I do hyperparameter tuning - it is partially science, but mostly black magic. The year is 2022. I am prompting large language models. I do prompt engineering - it is partially science, but mostly black magic.
8
35
415
@_jasonwei
Jason Wei
2 years
The highest possible compliment in our field is people not being on their laptops while you're giving a talk.
10
14
418
@_jasonwei
Jason Wei
6 months
Andrej’s tweet is the right way to think about it right now but I totally believe that in one or two years we will start relying on AI for very challenging decisions like diagnosing disease under limited information. Key thing to note here is that big decisions can be viewed as a.
@karpathy
Andrej Karpathy
6 months
People have too inflated sense of what it means to "ask an AI" about something. The AI are language models trained basically by imitation on data from human labelers. Instead of the mysticism of "asking an AI", think of it more as "asking the average data labeler" on the.
25
50
426
@_jasonwei
Jason Wei
2 years
A somewhat amusing personal revelation: a friend recently asked me what were the best skills I had. I think he expected me to say something like "prompt engineering", "writing papers", or "making evals." But my answer surprised him a bit: I said prioritization and communication.
15
48
417
@_jasonwei
Jason Wei
1 year
nothing gets my heart rate up like waiting for eval results on new models to come in.
20
27
416
@_jasonwei
Jason Wei
3 years
@GoogleAI The reasoning ability of PaLM 540B (prompting only!) is simply remarkable.
Tweet media one
7
48
410
@_jasonwei
Jason Wei
1 year
Enjoyed this extremely comprehensive study on predicting language model performance. Found many insightful nuggets:
- In a single model family there usually aren't that many model sizes, which hinders predictive power. However, there are many model
Tweet media one
5
68
418
@_jasonwei
Jason Wei
2 years
Since GPT-4, some have argued that emergence in LLMs is overstated, or even a "mirage". I don't think these arguments debunk emergence, but they warrant discussion (it's generally good to examine scientific phenomena critically). A blog post: 🧵⬇️
Tweet media one
9
93
395
@_jasonwei
Jason Wei
2 years
🩷.
@sama
Sam Altman
2 years
i love the openai team so much.
6
16
387
@_jasonwei
Jason Wei
5 months
Throughout my time at OpenAI I have found Hongyu Ren to be absolutely ruthless. No mercy for evaluation benchmarks whatsoever: o3-mini is 83% on AIME, 2000+ codeforces elo. Every o*-mini model is so performant and fast. Congrats @ren_hongyu @shengjia_zhao and crew!
@ren_hongyu
Hongyu Ren
5 months
o3-mini is here! Together with @shengjia_zhao, @_kevinlu, @max_a_schwarzer, @ericmitchellai, @brian_zq, @sandersted and many others, we trained this efficient reasoning model, maximally compressing the intelligence from big brothers o1 / o3. The model is very good in hard
Tweet media one
Tweet media two
7
21
392
@_jasonwei
Jason Wei
3 years
Three facts about the new UL2 model:
1. Checkpoint is public
2. Beats GPT-3 on SuperGLUE for zero-shot
3. Smallest model I know of that can do chain-of-thought reasoning on arbitrary tasks
📜 Open source for the win!
@GoogleAI
Google AI
3 years
Introducing UL2, a novel language pre-training paradigm that improves performance of language models across datasets and setups by using a mixture of training objectives, each with different configurations. Read more and grab model checkpoints at
7
66
378
@_jasonwei
Jason Wei
3 years
Big language models can generate their own chain of thought, even without few-shot exemplars. Just add "Let's think step by step". Look me in the eye and tell me you don't like big language models.
Tweet media one
15
60
378
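The trick described in the tweet above, zero-shot chain-of-thought prompting, amounts to appending a reasoning cue to the question before sampling. A minimal sketch, assuming the OpenAI Python SDK and an API key in the environment; the model name and example question are placeholders, not anything specified in the tweet.

```python
# Minimal sketch of zero-shot chain-of-thought prompting ("Let's think step by step").
# Assumes the OpenAI Python SDK (openai>=1.0) is installed and OPENAI_API_KEY is set;
# the model name and question are illustrative placeholders.
from openai import OpenAI

client = OpenAI()

def zero_shot_cot(question: str, model: str = "gpt-4o-mini") -> str:
    """Append the reasoning cue so the model generates its own chain of thought."""
    prompt = f"{question}\n\nLet's think step by step."
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(zero_shot_cot(
    "A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. "
    "How much does the ball cost?"
))
```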
@_jasonwei
Jason Wei
3 years
I combed through the large language model literature and made a repository of 137 “emergent abilities”, which are only present in sufficiently-large language models. 100+ emergent tasks can be found in BIG-Bench and MMLU alone. Scaling seems to work.
20
62
376
@_jasonwei
Jason Wei
2 years
People always ask if prompt engineering is going to go away over time. My short answer is "no". But, a more nuanced answer is that the goal of prompt engineering has evolved over time: from nudging a finicky language model to do an "easy" task (2020/2021) to figuring out how to…
11
69
372
@_jasonwei
Jason Wei
2 years
Honored to have played a small role in Med-PaLM, now published in Nature!. My grandpa is 89, and every year he asks if I've published in Science or Nature (he knows I'm a researcher, but he only knows those two journals). This year I can finally say yes!.
9
29
373
@_jasonwei
Jason Wei
6 months
Very clever paper to predict downstream performance of pre-trained models.
- Take advantage of the fact that larger models being more sample-efficient and performant is equivalent to finetuning a smaller model on targeted data
- In principle this is hugely valuable because you…
@sea_snell
Charlie Snell
6 months
Can we predict emergent capabilities in GPT-N+1🌌 using only GPT-N model checkpoints, which have random performance on the task?. We propose a method for doing exactly this in our paper “Predicting Emergent Capabilities by Finetuning”🧵
Tweet media one
8
48
369
@_jasonwei
Jason Wei
8 months
22-minute video of OpenAI researchers talking about their experiences working on strawberry.
@woj_zaremba is very fun to work with.
@MillionInt is a legend.
@OpenAI
OpenAI
8 months
Extended Cut:
2
26
360
@_jasonwei
Jason Wei
3 years
It's unclear to me why "novelty" is a reviewing criterion at ML conferences. Many simple but not particularly novel methods have been very impactful (ELMo, RoBERTa, UDA, T0, etc.). Novelty is often conflated with complexity, which has no inherent value. Incentives are important.
10
21
350
@_jasonwei
Jason Wei
3 months
We do not rise to the power of our RL optimization algorithms; we fall to the hackability of our RL environment.
12
26
328
@_jasonwei
Jason Wei
2 years
Really cool paper studying the faithfulness of chain-of-thought (CoT): The paper uses a biased prompt to try to mislead the model. For example, all few-shot exemplars could be answer (A), or they could add a suffix such as "I think the answer is <X> but
Tweet media one
5
77
360
@_jasonwei
Jason Wei
2 years
SF is the only city in the world where you can ride a self-driving car, surely a pinnacle of generations of technological progress, and look out the window to see rows of homeless people living in conditions that would be considered unenviable even in the nineteenth century.
30
25
344
@_jasonwei
Jason Wei
2 years
Hot take supported by evidence: for a given NLP task, it is unwise to extrapolate performance to larger models because emergence can occur. I manually examined all 202 tasks in BIG-Bench, and the most common category was for the scaling behavior to *unpredictably* increase.
Tweet media one
14
56
348