Tianjun Zhang
@tianjun_zhang
Followers 2K · Following 991 · Media 23 · Statuses 153
Project Lead of LiveCodeBench, RAFT and Gorilla LLM, PhD student @berkeley_ai
California, USA
Joined March 2017
Introducing DeepCoder-14B-Preview - our fully open-sourced reasoning model reaching o1 and o3-mini level on coding and math. The best part is, we’re releasing everything: not just the model, but the dataset, code, and training recipe—so you can train it yourself!🔥 Links below:
23
206
880
Proud to share what we have built! Tops the @lmarena_ai leaderboard with only 17B parameters. Huge win for open source! Enjoy 😉
Introducing our first set of Llama 4 models! We’ve been hard at work doing a complete re-design of the Llama series. I’m so excited to share it with the world today and mark another major milestone for the Llama herd as we release the *first* open source models in the Llama 4…
1
0
29
🚨 NEW PAPER: "Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning"! 🤔 With all these long-reasoning LLMs, what are we actually optimizing for? Length penalties? Token budgets? We needed a better way to think about it! Website: https://t.co/G5JTmryx0d 🧵[1/9]
6
63
312
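To make the question above concrete: one simple way to "optimize test-time compute" is a reward that trades correctness against tokens spent. A minimal sketch, where `budgeted_reward` and its parameters are hypothetical illustrations; the paper's actual meta-RL objective is more involved.

```python
# Hypothetical sketch: one way to make "test-time compute" explicit in a
# reward, trading off answer correctness against tokens spent. The paper's
# actual objective (meta reinforcement fine-tuning) differs; names here are
# illustrative, not from the paper.

def budgeted_reward(is_correct: bool, tokens_used: int,
                    token_budget: int = 4096, penalty: float = 0.1) -> float:
    """Reward = 1 for a correct answer, minus a penalty proportional to
    how much of the token budget the reasoning trace consumed."""
    correctness = 1.0 if is_correct else 0.0
    length_cost = penalty * min(tokens_used / token_budget, 1.0)
    return correctness - length_cost

# Example: a correct answer that used half the budget.
print(budgeted_reward(True, 2048))  # 0.95
```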
What do developers 𝘳𝘦𝘢𝘭𝘭𝘺 think of AI coding assistants? In October, we launched @CopilotArena to collect user preferences on real dev workflows. After months of live service, we’re here to share our findings in our recent preprint. Here's what we have learned /🧵
Introducing Copilot Arena - Interactive coding evaluation in the wild. Our extension lets you test top models for free, right in VSCode. Let's vote and build the Copilot leaderboard! Download here: https://t.co/Zyc9iL3u9m Led by @iamwaynechi and @valeriechen_ at CMU. 1/🧵
2
38
161
LLMs for GPU kernel🌽generation have been getting Pop🍿ular since our preview last Dec; excited to announce 📢 our full paper 📃 for KernelBench! Turns out KernelBench is quite challenging 🧠 — frontier models outperform the PyTorch Eager baseline <20% of the time. More 🧵👇
9
79
310
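For context, a harness like KernelBench needs two checks per generated kernel: numerical agreement with the PyTorch eager reference, and a speedup over it. A minimal sketch under my own assumptions (function names, tolerances, and timing loop are illustrative, not the released harness):

```python
# Minimal sketch (assumptions mine, not the KernelBench harness) of the kind
# of check such a benchmark runs: a candidate kernel must match the PyTorch
# eager reference numerically, and only counts as a win if it is also faster.
import time
import torch

def evaluate_kernel(candidate_fn, reference_fn, example_input,
                    rtol: float = 1e-3, atol: float = 1e-3):
    ref_out = reference_fn(example_input)
    cand_out = candidate_fn(example_input)
    correct = torch.allclose(cand_out, ref_out, rtol=rtol, atol=atol)

    def bench(fn, iters: int = 100) -> float:
        fn(example_input)                      # warm-up
        start = time.perf_counter()
        for _ in range(iters):
            fn(example_input)
        return (time.perf_counter() - start) / iters

    speedup = bench(reference_fn) / bench(candidate_fn)
    return correct, speedup

# Example: a (trivially) "generated" softmax vs. the eager baseline.
x = torch.randn(1024, 1024)
ok, speedup = evaluate_kernel(lambda t: torch.softmax(t, dim=-1),
                              lambda t: torch.softmax(t, dim=-1), x)
print(ok, round(speedup, 2))
```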
With the success of LLM agents like OpenAI Operator, we are entering a new scaling era, but how do we train these agent models? We present InSTA, the largest training environment for LLM agents, containing live web navigation tasks for 150k diverse websites in multiple…
9
31
161
✨RL magic is in the air! Introducing DeepScaleR-1.5B-Preview—a fully open-source, 1.5B-parameter model trained with RL to surpass o1-preview for general math reasoning. 📜Blog: https://t.co/eHqApwRfnH 💻Github: https://t.co/tRsDN7xV4M
16
50
151
Magic of RL! You don’t need super large models to develop such behavior! Congrats @jiayi_pirate!
We reproduced DeepSeek R1-Zero in the CountDown game, and it just works. Through RL, the 3B base LM develops self-verification and search abilities all on its own. You can experience the Aha moment yourself for < $30. Code: https://t.co/B2IsN1PrXV Here's what we learned 🧵
1
0
4
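CountDown works well as an RL testbed because the reward is fully rule-based: the environment can verify an emitted arithmetic expression exactly. A minimal sketch, where `countdown_reward` and the exact reward shaping are my assumptions, not the released code:

```python
# Hedged sketch of the rule-based reward that makes CountDown a clean RL
# testbed: the model emits an arithmetic expression, and the environment can
# verify it exactly. Function name and reward shaping are my assumptions.
import ast
from collections import Counter

def countdown_reward(expr: str, numbers: list[int], target: int) -> float:
    """1.0 if expr uses exactly the given numbers and evaluates to target."""
    try:
        tree = ast.parse(expr, mode="eval")
        used = [n.value for n in ast.walk(tree) if isinstance(n, ast.Constant)]
        if Counter(used) != Counter(numbers):
            return 0.0  # must use each provided number exactly once
        value = eval(compile(tree, "<expr>", "eval"), {"__builtins__": {}})
        return 1.0 if abs(value - target) < 1e-6 else 0.0
    except Exception:
        return 0.0  # malformed expressions earn nothing

print(countdown_reward("(25 - 1) * (4 + 1)", [25, 1, 4, 1], 120))  # 1.0
print(countdown_reward("25 * 4", [25, 1, 4, 1], 100))              # 0.0
```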
Congrats to @OpenAI on the impressive performance of the o1 model! It seems o1 already achieves 76% on LiveCodeBench; how should we improve it to make it harder 🤔🤔
2
0
14
I will be at #NeurIPS2024 this week! Happy to chat about large scale RL for reasoning and agents!
0
0
18
Check out the new VideoArena! Pick your favorite video 🚀
🚀 Just Launched: VideoArena!🎥 Discover head-to-head comparisons of video clips generated from the same prompts across top text-to-video models. Compare outputs from 7 leading models and we're adding more soon! 🔗 Check out the leaderboard: https://t.co/IkQTFB7am5
#Text2Video
1
0
6
Check out the amazing BFCL V2!
🚀Excited to announce the release of BFCL V2 • Live! 🏆 As LLMs evolve into intelligent agents, the Berkeley Function-Calling Leaderboard (BFCL) is leading the way in evaluating their real-world function-calling capabilities. V2 • Live features 📢 enterprise-contributed data, …
0
0
8
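At its core, function-calling evaluation checks that the model names the right function and binds the right arguments. A toy sketch under my own assumptions; BFCL's real harness does richer AST-based matching than this exact-equality check:

```python
# Hedged sketch of the core check in function-calling evaluation: the model's
# emitted call must name the right function and bind the right arguments.
# BFCL's real harness does richer AST-based matching; this toy version and
# its names are my assumptions.
import json

def call_matches(model_output: str, expected: dict) -> bool:
    """expected = {"name": ..., "arguments": {...}}; model_output is JSON."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return False
    return (call.get("name") == expected["name"]
            and call.get("arguments") == expected["arguments"])

expected = {"name": "get_weather", "arguments": {"city": "Berkeley", "unit": "C"}}
print(call_matches('{"name": "get_weather", '
                   '"arguments": {"city": "Berkeley", "unit": "C"}}', expected))  # True
```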
Two new papers on self-improvement: paper 1 today ⬇️ In RISE, we build on online imitation to teach LLMs *how* to improve their own responses *sequentially*. w/ Llama2/3/Mistral, this gives solid +10-20% in 5 turns, outperforms parallel sampling! https://t.co/hVy3T1ZoGi 🧵⬇️
1
28
125
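The sequential setup RISE targets differs from parallel sampling in that each turn conditions on the model's previous attempt. A minimal sketch, where `generate` and `score` are hypothetical stand-ins for an LLM call and a verifier, not the paper's API:

```python
# Hedged sketch of the sequential-improvement loop RISE-style training
# targets: instead of sampling k answers in parallel, the model revises its
# own previous attempt over several turns.
from typing import Callable

def sequential_refine(question: str,
                      generate: Callable[[str], str],
                      score: Callable[[str, str], float],
                      turns: int = 5) -> str:
    prompt, best, best_score = question, None, float("-inf")
    for _ in range(turns):
        answer = generate(prompt)
        s = score(question, answer)
        if s > best_score:
            best, best_score = answer, s
        # Feed the previous attempt back so the next turn can improve on it.
        prompt = (f"{question}\n\nPrevious attempt:\n{answer}\n"
                  "Improve on the previous attempt.")
    return best
```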
@AIatMeta And we use the Berkeley Function-Calling Leaderboard for evaluation! Congrats to my colleagues @shishirpatil_ @charlie_jcj02 @HuanzhiMao @profjoeyg Ion Stoica @fanjia_yan🫡
0
1
8
This paper claims that Llama3-8B+BoT (Buffer of Thoughts) has the potential to surpass the Llama3-70B model. 🤯 'Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models' - Proposes a buffer-manager to dynamically update the meta-buffer, thus enhancing the capacity…
10
93
621
Thought-Augmented Reasoning with LLMs
Presents a thought-augmented reasoning approach, Buffer of Thoughts, to enhance the accuracy, efficiency, and robustness of LLM-based reasoning. It leverages a meta-buffer containing high-level thoughts (thought templates) distilled from…
13
97
443
🤔 Why should LLMs only follow one thought template (e.g., CoT)? In our paper, LLMs can select their own thought process flexibly! Big improvement on agentic tasks! 🎉
Excited to introduce our new prompting method on LLMs, Buffer of Thoughts (BoT), collaborating with @tianjun_zhang at @berkeley_ai. Notably, Llama3-8B+BoT can beat Llama3-70B on reasoning tasks. Paper: https://t.co/M4KjqlhiyZ Code: https://t.co/DMaUcu8IOi
1
2
9
Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models Significant performance improvements over previous SOTA methods: 11% on Game of 24, 20% on Geometric Shapes and 51% on Checkmate-in-One. repo: https://t.co/wMT8p9h5sW abs: https://t.co/PpENCDeNTN
3
32
165
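Putting the BoT tweets above together: the core mechanism is a meta-buffer of reusable thought templates, with the most relevant one retrieved per problem instead of a single fixed CoT prompt. A toy sketch; retrieval here is naive word overlap, whereas the paper uses embedding-based retrieval and a buffer-manager to update templates:

```python
# Hedged sketch of the Buffer-of-Thoughts idea: keep a meta-buffer of
# reusable thought templates and retrieve the one most relevant to a new
# problem. Templates and the overlap heuristic are my own toy stand-ins.
META_BUFFER = {
    "arithmetic-game": "Enumerate operand orderings, prune branches that "
                       "cannot reach the target, verify the final expression.",
    "geometry":        "Restate the figure's constraints, derive unknowns "
                       "step by step, check units and degenerate cases.",
    "chess-mate":      "List all checks first, then forcing replies, and "
                       "verify the king has no escape squares.",
}

def retrieve_template(problem: str) -> str:
    """Pick the template whose words overlap most with the problem."""
    words = set(problem.lower().split())
    def overlap(item):
        key, template = item
        return len(words & set((key + " " + template).lower().split()))
    return max(META_BUFFER.items(), key=overlap)[1]

problem = "Use 4, 7, 8, 8 to reach the target 24 in this arithmetic game."
print(f"Thought template: {retrieve_template(problem)}\nProblem: {problem}")
```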