
Jason Wei (@_jasonwei)
84K followers · 8K following · 128 media · 1K statuses
Super excited to finally share what I have been working on at OpenAI! o1 is a model that thinks before giving the final answer. In my own words, here are the biggest updates to the field of AI (see the blog post for more details):
1. Don’t do chain of thought purely via…
We're releasing a preview of OpenAI o1—a new series of AI models designed to spend more time thinking before they respond. These models can reason through complex tasks and solve harder problems than previous models in science, coding, and math.
94 replies · 353 reposts · 3K likes
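As context for “chain of thought purely via…” above: before o1, chain of thought was typically elicited just by prompting an ordinary chat model. A minimal sketch of that baseline, assuming the OpenAI Python SDK (the model name and prompt here are illustrative, not from the tweet):

```python
# Minimal sketch: chain of thought elicited purely via prompting,
# the pre-o1 baseline. Model name and prompt are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # assumed: any ordinary chat model works for the contrast
    messages=[{
        "role": "user",
        "content": (
            "A bat and a ball cost $1.10 in total. The bat costs $1.00 more "
            "than the ball. How much does the ball cost? "
            "Let's think step by step."  # the prompt-only CoT trigger
        ),
    }],
)
print(response.choices[0].message.content)
```

o1’s point is that the reasoning is trained in rather than coaxed out by a trigger phrase like this.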
Yesterday I gave a lecture at @Stanford's CS25 class on Transformers! The lecture was on how “emergent abilities” are unlocked by scaling up language models. Emergence is one of the most exciting phenomena in large LMs… Slides:
28 replies · 309 reposts · 2K likes
o3 is very performant. More importantly, progress from o1 to o3 was only three months, which shows how fast progress will be in the new paradigm of RL on chain of thought to scale inference compute. Way faster than the pretraining paradigm of a new model every 1-2 years.
We announced @OpenAI o1 just 3 months ago. Today, we announced o3. We have every reason to believe this trajectory will continue.
58 replies · 209 reposts · 2K likes
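One concrete way to picture “scaling inference compute”: spend more tokens at test time and aggregate. The sketch below uses a self-consistency-style majority vote, an assumption for illustration, not a claim about how o1/o3 work internally:

```python
# Sketch of trading inference compute for accuracy: sample N independent
# chains of thought and majority-vote the final answers (self-consistency).
# Illustrative only; NOT a description of o1/o3 internals.
from collections import Counter

def solve_with_more_compute(sample_chain, n_samples: int = 16) -> str:
    """sample_chain is an assumed helper that runs one chain of thought
    and returns the final answer as a string."""
    answers = [sample_chain() for _ in range(n_samples)]
    best_answer, _count = Counter(answers).most_common(1)[0]
    return best_answer

# Raising n_samples spends more inference compute, which is the knob this
# paradigm scales, instead of waiting 1-2 years for a bigger pretrained model.
```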
o1-mini is the most surprising research result i've seen in the past year. obviously i cannot spill the secret, but a small model getting >60% on the AIME math competition is so good that it's hard to believe. congrats @ren_hongyu @shengjia_zhao for the great work!
33 replies · 90 reposts · 2K likes
Today I am pleased to announce the new board of directors for my relationship. The new board of directors will be:
1. My mom
2. My girlfriend’s sister
3. @hwchung27, who I pair program with frequently
4. Bret Taylor (we’ve only met once, but every board should have Bret Taylor)
41 replies · 24 reposts · 994 likes
Pair programming isn’t standard at most companies and basically non-existent in academia, but I’ve been doing it with @hwchung27 for almost a year now. While it naively seems slower than coding individually, I’ve realized that there are many benefits: (1) In AI, what you work on can…
36 replies · 111 reposts · 960 likes
As a kid I loved whiteboard lectures way more than slides, so for Stanford’s CS25 class I gave a whiteboard lecture! My goal was to simply and clearly explain why language models work so well, purely via intuitions. YouTube video: (w/ @hwchung27)
7 replies · 98 reposts · 654 likes
Very excited for o1 to come out of preview! Main takeaways:
- o1 thinks harder and is more performant on tough problems
- o1 thinks faster on easy problems
- o1 reasons over images and text
- o1 can think *even* harder when using o1-pro mode
o1 was indeed a saga, with many…
OpenAI o1 is now out of preview in ChatGPT. What’s changed since the preview? A faster, more powerful reasoning model that’s better at coding, math & writing. o1 now also supports image uploads, allowing it to apply reasoning to visuals for more detailed & useful responses.
34 replies · 57 reposts · 634 likes
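A hedged sketch of what using these capabilities looks like through the OpenAI Python SDK; the model string, the `reasoning_effort` parameter, and the image URL are assumptions layered on the tweet, not taken from it:

```python
# Sketch: asking an o-series model to reason over an image plus text.
# "o1", reasoning_effort, and the URL are assumptions for illustration.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="o1",               # assumed model identifier
    reasoning_effort="high",  # assumed knob: roughly "think even harder"
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is wrong with this circuit diagram?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/circuit.png"}},
        ],
    }],
)
print(response.choices[0].message.content)
```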
I gave an invited lecture at New York University for @hhexiy's class! I covered three ideas driving the LLM revolution: scaling, emergence, and reasoning. I tried to frame them in a way that reveals why large LMs are special in the history of AI. Slides:
9 replies · 152 reposts · 567 likes
Had a bit of a fanboy moment today meeting @bryan_johnson, who has been super inspirational to me in prioritizing my health. I asked him about the best way to balance career and spending time on health. His advice is that while many people give up sleep to work more, sleeping…
16 replies · 33 reposts · 543 likes
"So do you speak Chinese?". Normal person: "I can understand but I don't speak well". @YiTayML: "my encoder is OK but my decoder is broken" 🤦.
6 replies · 20 reposts · 513 likes
IMO GPT-4 is a bigger leap than GPT-3 was.
- GPT-3 advanced AI from task-specific models to a single prompted model that is task-general.
- GPT-4 is human-level on many hard tasks, and will signal a *societal* revolution where AI reaches every industry, starting with technology 🧵
Announcing GPT-4, a large multimodal model, with our best-ever results on capabilities and alignment:
9 replies · 78 reposts · 457 likes
Meta's open-source OPT-175B is comparable to GPT-3 175B. This is a massive step forward for bringing big LM research to academia. Expected nothing less from @LukeZettlemoyer and crew :)
9 replies · 75 reposts · 437 likes
Andrej’s tweet is the right way to think about it right now, but I totally believe that in one or two years we will start relying on AI for very challenging decisions like diagnosing disease under limited information. Key thing to note here is that big decisions can be viewed as a…
People have too inflated sense of what it means to "ask an AI" about something. The AI are language models trained basically by imitation on data from human labelers. Instead of the mysticism of "asking an AI", think of it more as "asking the average data labeler" on the…
25 replies · 50 reposts · 426 likes
Throughout my time at OpenAI I have found Hongyu Ren to be absolutely ruthless. No mercy for evaluation benchmarks whatsoever: o3-mini is 83% on AIME, 2000+ Codeforces Elo. Every o*-mini model is so performant and fast. Congrats @ren_hongyu @shengjia_zhao and crew!
o3-mini is here! Together with @shengjia_zhao, @_kevinlu, @max_a_schwarzer, @ericmitchellai, @brian_zq, @sandersted and many others, we trained this efficient reasoning model, maximally compressing the intelligence from big brothers o1 / o3. The model is very good in hard…
7 replies · 21 reposts · 392 likes
Three facts about the new UL2 model:
1. Checkpoint is public.
2. Beats GPT-3 on SuperGLUE for zero-shot.
3. Smallest model I know of that can do chain-of-thought reasoning on arbitrary tasks.
📜 Open source for the win!
Introducing UL2, a novel language pre-training paradigm that improves performance of language models across datasets and setups by using a mixture of training objectives, each with different configurations. Read more and grab model checkpoints at
7 replies · 66 reposts · 378 likes
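The “mixture of training objectives” in UL2 is a mixture of denoisers: the same text is corrupted under differently configured span-corruption objectives. A rough sketch of the idea, with made-up rates and span lengths rather than the paper’s exact settings:

```python
# Rough sketch of a UL2-style mixture of denoisers: sample one of several
# span-corruption configs per example. Rates are made up for illustration;
# a single masked span stands in for the real multi-span setup.
import random

DENOISER_CONFIGS = [
    {"name": "R", "corruption_rate": 0.15},  # regular: short, light corruption
    {"name": "X", "corruption_rate": 0.50},  # extreme: long/aggressive spans
    {"name": "S", "corruption_rate": 0.25},  # sequential: predict a suffix
]

def corrupt(tokens: list[str], cfg: dict) -> tuple[list[str], list[str]]:
    """Return (corrupted_input, targets) under one denoiser config."""
    if cfg["name"] == "S":  # S-denoising: prefix stays, model predicts suffix
        split = int(len(tokens) * (1 - cfg["corruption_rate"]))
        return tokens[:split] + ["<X>"], tokens[split:]
    n_mask = max(1, int(len(tokens) * cfg["corruption_rate"]))
    start = random.randrange(0, len(tokens) - n_mask + 1)
    corrupted = tokens[:start] + ["<X>"] + tokens[start + n_mask:]
    return corrupted, tokens[start:start + n_mask]

tokens = "the quick brown fox jumps over the lazy dog".split()
cfg = random.choice(DENOISER_CONFIGS)  # the "mixture": one objective per example
inp, tgt = corrupt(tokens, cfg)
print(cfg["name"], inp, "->", tgt)
```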
Very clever paper to predict downstream performance of pre-trained models.
- Take advantage of the fact that larger models being more sample-efficient and performant is equivalent to finetuning a smaller model on targeted data.
- In principle this is hugely valuable because you…
Can we predict emergent capabilities in GPT-N+1 🌌 using only GPT-N model checkpoints, which have random performance on the task? We propose a method for doing exactly this in our paper “Predicting Emergent Capabilities by Finetuning” 🧵
8 replies · 48 reposts · 369 likes
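As I read the idea, the recipe is: finetune each smaller checkpoint on the target task, fit how finetuned performance moves with pretraining scale, and extrapolate past the emergence point. A hedged sketch under those assumptions (the sigmoid fit and all numbers below are made up for illustration, not the paper’s method or results):

```python
# Sketch: extrapolate a larger model's task performance from finetuned
# smaller checkpoints. Fit form and data points are illustrative assumptions.
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(log_compute, mid, scale, ceiling):
    return ceiling / (1.0 + np.exp(-(log_compute - mid) / scale))

# Made-up measurements: task accuracy of *finetuned* small checkpoints.
log_compute = np.array([19.0, 20.0, 21.0, 22.0])  # log10 pretraining FLOPs
accuracy    = np.array([0.05, 0.12, 0.31, 0.55])  # accuracy after finetuning

params, _ = curve_fit(sigmoid, log_compute, accuracy,
                      p0=[21.0, 1.0, 1.0], maxfev=10_000)

# Extrapolate to a not-yet-trained "GPT-N+1" scale.
print("predicted accuracy at 1e24 FLOPs:", sigmoid(24.0, *params))
```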
22-minute video of OpenAI researchers talking about their experiences working on strawberry. @woj_zaremba is very fun to work with. @MillionInt is a legend.
2 replies · 26 reposts · 360 likes