Evan Wang Profile
Evan Wang

@evanzwangg

Followers 918 · Following 822 · Media 7 · Statuses 29

post-training/reasoning @xAI prev @Caltech @scale_AI @weHRTyou @umdcs

Joined March 2020
@evanzwangg
Evan Wang
2 months
you love to see it 🐦‍⬛
@xai
xAI
2 months
Introducing Grok 4 Fast, a multimodal reasoning model with a 2M context window that sets a new standard for cost-efficient intelligence. Available for free on https://t.co/AnXpIEOhOD, https://t.co/53pltypvkw, iOS and Android apps, and OpenRouter. https://t.co/3YZ1yVwueV
0
0
12
@evanzwangg
Evan Wang
3 months
🐆 💨 we do a bit of coding
@xai
xAI
3 months
Introducing Grok Code Fast 1, a speedy and economical reasoning model that excels at agentic coding. Now available for free on GitHub Copilot, Cursor, Cline, Kilo Code, Roo Code, opencode, and Windsurf. https://t.co/3tMbmLbxOP
13
7
235
@evanzwangg
Evan Wang
4 months
✌️
@nearlydaniel
Daniel
4 months
War Room squad locked in
7
1
26
@evanzwangg
Evan Wang
4 months
good stuff grok 🚀 https://t.co/4mfdh8X01S
0
1
16
@HavaeiRez
Rez Havaei
8 months
LLMs are being deployed in high-stakes environments—and the potential impact of failure is colossal. A jailbroken AI could leak your customer data, financial records, or enable catastrophically harmful actions. At @gen_analysis we have compiled the definitive guide to understand
6
27
72
@evanzwangg
Evan Wang
10 months
Delighted to announce that PlanSearch has been accepted to ICLR 2025!! 😁😁 see you in Singapore 🫡
@evanzwangg
Evan Wang
1 year
A 20% boost on a metric is rare, especially when it's code generation 🥱 PlanSearch, our new search method based on diverse plans, outperforms baselines by huge margins. It's not just a search method, but also a philosophy. How are these numbers achieved? Can they be predicted?
5
9
57
@evanzwangg
Evan Wang
1 year
thanks for having me!! was great seeing everyone again 😁
@furongh
Furong Huang
1 year
Our very own @evanzwangg visited us back at UMD today and gave an awesome talk. Check out his paper here to see how planning improves pass@k significantly for coding problems: https://t.co/rE505uLqsS
1
0
11
@goodside
Riley Goodside
1 year
RLHF and instruction tuning reduce diversity in LLM output, limiting the value of inference-time search. PlanSearch, from research at @scale_AI, restores this diversity using combinatorial samples of "observations" to form plans for coding problems, yielding strong gains across
@hughbzhang
Hugh Zhang
1 year
Enabling LLMs to reason more deeply at inference time via search is one of the most exciting directions in AI right now. We introduce PlanSearch, a novel method for code generation that searches over high-level "plans" in natural language as a means of encouraging diversity.
1
15
112
@alexandr_wang
Alexandr Wang
1 year
New SOTA test-time compute result from Scale SEAL⚡️ We are releasing a new SOTA test-time compute method called PlanSearch. It meaningfully outperforms existing approaches on LiveCodeBench via a new diversity-based search method. See more about our SEAL open research below:
@hughbzhang
Hugh Zhang
1 year
Enabling LLMs to reason more deeply at inference time via search is one of the most exciting directions in AI right now. We introduce PlanSearch, a novel method for code generation that searches over high-level "plans" in natural language as a means of encouraging diversity.
11
25
208
@hughbzhang
Hugh Zhang
1 year
Enabling LLMs to reason more deeply at inference time via search is one of the most exciting directions in AI right now. We introduce PlanSearch, a novel method for code generation that searches over high-level "plans" in natural language as a means of encouraging diversity.
16
99
637
@evanzwangg
Evan Wang
1 year
@hughbzhang @ellev3n11 @squeakymouse777 @vaskar_n @SeanHendryx @summeryue0 This was all possible through collaboration with @hughbzhang, @ellev3n11, @squeakymouse777, Yunfeng, Will, @vaskar_n, Ziwen, @SeanHendryx, @summeryue0, and of course @scale_AI. Was a great summer and would gladly do it again!
1
0
12
@evanzwangg
Evan Wang
1 year
@hughbzhang @ellev3n11 @squeakymouse777 @vaskar_n @SeanHendryx @summeryue0 We found that current models lack diversity out-of-the-box, making effective inference-time compute hard. Searching in idea space somewhat alleviates this issue. In the long term, we imagine combining these immense p@k gains with training to distill the gains into p@1, natively😇
1
0
6
@evanzwangg
Evan Wang
1 year
@hughbzhang @ellev3n11 @squeakymouse777 @vaskar_n @SeanHendryx @summeryue0 Finally, even though we optimize our methods to be 'attempt-efficient' (if you had 2 attempts, how would you make those attempts as good as possible?), we check compute-efficiency as well: even though we use 6.5x as many generated tokens, PlanSearch still scales better 📈
1
1
13
@evanzwangg
Evan Wang
1 year
@hughbzhang @ellev3n11 @squeakymouse777 @vaskar_n @SeanHendryx @summeryue0 Even a simple filter, like submitting only solutions that pass the public tests, brings p@8k -> p@k, which is HUGE… So the p@1 of filtering = the p@8 of search ✅ Another example: base models are much more diverse than instruct models. Applying the paradigm to the base model gives a p@1 much better than instruct p@1.
2
0
10
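For context, the p@k above is the standard pass@k metric. Assuming the thread follows the usual unbiased estimator (Chen et al.'s HumanEval evaluation), with n sampled solutions per problem of which c are correct:

\mathrm{pass@}k \;=\; \mathbb{E}_{\text{problems}}\!\left[\, 1 - \binom{n-c}{k} \Big/ \binom{n}{k} \right]

i.e. one minus the probability that a uniformly random set of k of the n samples contains no correct solution.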
@evanzwangg
Evan Wang
1 year
@hughbzhang @ellev3n11 @squeakymouse777 @vaskar_n @SeanHendryx @summeryue0 These giant improvements at large k can be BROUGHT BACK to low k through filtering, which picks promising solutions from a pool of solutions. We argue for a paradigm that optimizes diversity, trading p@1 for huge p@k gains, then uses filtering to bring those p@k gains back to low k.
1
1
17
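A rough sketch of the filtering step described above (not the paper's code): from a pool of generated solutions, submit only the first k that pass the problem's public tests. The names (run_program, filter_pool) and the assumption that candidates are self-contained Python programs reading stdin and writing stdout are illustrative, not from the thread.

import subprocess, sys

def run_program(code: str, stdin: str, timeout: float = 5.0) -> str:
    """Execute a candidate program in a subprocess and return its stdout."""
    proc = subprocess.run([sys.executable, "-c", code], input=stdin,
                          capture_output=True, text=True, timeout=timeout)
    return proc.stdout

def passes_public_tests(code: str, public_tests: list[tuple[str, str]]) -> bool:
    try:
        return all(run_program(code, stdin).strip() == expected.strip()
                   for stdin, expected in public_tests)
    except Exception:  # crash or timeout counts as a failure
        return False

def filter_pool(pool: list[str], public_tests: list[tuple[str, str]], k: int) -> list[str]:
    # Keep only candidates that pass the public tests; submit at most k of them.
    survivors = [c for c in pool if passes_public_tests(c, public_tests)]
    return survivors[:k]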
@evanzwangg
Evan Wang
1 year
@hughbzhang @ellev3n11 @squeakymouse777 @vaskar_n @SeanHendryx @summeryue0 this is how we get so much diversity 🤩 Even though we may sacrifice our pass@1 a small bit, our pass@k is much, much better. Our best model gets almost DOUBLE the raw pass@1, and drastically outperforms other baselines like CoT
1
0
15
@evanzwangg
Evan Wang
1 year
@hughbzhang @ellev3n11 @squeakymouse777 @vaskar_n @SeanHendryx @summeryue0 Objectives like RLHF are known to reduce diversity at train time. We inject diversity back in through PlanSearch. How it works: we generate layer 1 of observations and selectively mix these to create the next layer. These generate the solution sketches, and then the code.
1
1
29
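A minimal, illustrative sketch of the layered generation described in the tweet above (observations -> combined observations -> sketches -> code). The llm() helper, the prompts, and the sampling counts are hypothetical placeholders, not the actual PlanSearch implementation.

from itertools import combinations
import random

def llm(prompt: str) -> str:
    """Placeholder for a call to whatever chat/completions API you use."""
    raise NotImplementedError

def plan_search(problem: str, n_obs: int = 5, n_pairs: int = 5) -> list[str]:
    # Layer 1: first-order observations about the problem.
    layer1 = [llm(f"Problem:\n{problem}\n\nState one useful observation ({i + 1}).")
              for i in range(n_obs)]

    # Layer 2: selectively mix layer-1 observations into derived observations.
    all_pairs = list(combinations(layer1, 2))
    pairs = random.sample(all_pairs, k=min(n_pairs, len(all_pairs)))
    layer2 = [llm(f"Problem:\n{problem}\n\nObservations:\n- {a}\n- {b}\n"
                  "Derive one new observation that combines them.")
              for a, b in pairs]

    # Each observation becomes a natural-language solution sketch, then code.
    programs = []
    for obs in layer1 + layer2:
        sketch = llm(f"Problem:\n{problem}\n\nObservation:\n{obs}\n"
                     "Write a high-level solution sketch in natural language.")
        programs.append(llm(f"Problem:\n{problem}\n\nSketch:\n{sketch}\n"
                            "Implement the sketch as a complete Python program."))
    return programs  # a diverse pool of candidates, ready for filtering

The diversity comes from conditioning each program on a different (possibly combined) observation rather than resampling the same prompt; the resulting pool is what the filtering step earlier in the thread narrows back down to a few submissions.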