Tianjun Zhang

@tianjun_zhang

2K Followers · 991 Following · 23 Media · 153 Statuses

Project Lead of LiveCodeBench, RAFT and Gorilla LLM, PhD student @berkeley_ai

California, USA
Joined March 2017
@tianjun_zhang
Tianjun Zhang
1 year
It has been a really rewarding journey since I joined the #LLaMA3 team @AIatMeta a little more than 2 months ago, and yet today we are releasing one of the world's best models! 🔥 With the new license, we allow synthetic data generation from Llama to enhance your own model!
11 replies · 8 reposts · 116 likes
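For anyone acting on that license change, here is a minimal sketch of the now-permitted workflow: sample responses from a Llama 3 checkpoint and save them as training pairs for a smaller model. The model id, seed prompts, and JSONL schema below are illustrative assumptions, not an official recipe (the checkpoint is gated, so you need license access on Hugging Face first).

```python
# Sketch: generating synthetic instruction-tuning data from Llama 3.
# Model id, prompts, and output schema are illustrative assumptions.
import json
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed checkpoint
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

seed_questions = [
    "Explain gradient checkpointing in two sentences.",
    "Write a Python one-liner that reverses a string.",
]

with open("synthetic.jsonl", "w") as f:
    for q in seed_questions:
        msgs = [{"role": "user", "content": q}]
        ids = tok.apply_chat_template(
            msgs, add_generation_prompt=True, return_tensors="pt"
        ).to(model.device)
        out = model.generate(ids, max_new_tokens=256, do_sample=True)
        answer = tok.decode(out[0][ids.shape[-1]:], skip_special_tokens=True)
        # Each line becomes one (instruction, response) pair for fine-tuning.
        f.write(json.dumps({"instruction": q, "response": answer}) + "\n")
```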
@Agentica_
Agentica Project
7 months
Introducing DeepCoder-14B-Preview - our fully open-sourced reasoning model reaching o1 and o3-mini level on coding and math. The best part is, we’re releasing everything: not just the model, but the dataset, code, and training recipe—so you can train it yourself!🔥 Links below:
23 replies · 206 reposts · 880 likes
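A quick way to try the released checkpoint is to sample from it locally. The sketch below uses vLLM; the Hugging Face model id is my assumption about where the weights live, and the official links are in the thread.

```python
# Sketch: sampling from DeepCoder with vLLM. The model id is assumed;
# a real run should also apply the model's chat template to the prompt.
from vllm import LLM, SamplingParams

llm = LLM(model="agentica-org/DeepCoder-14B-Preview")  # assumed HF id
# Reasoning models emit a long chain of thought before the final code,
# hence the generous token budget.
params = SamplingParams(temperature=0.6, max_tokens=4096)

prompt = "Write a Python function that returns the n-th Fibonacci number."
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```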
@tianjun_zhang
Tianjun Zhang
7 months
Proud to share what we have built! Tops the @lmarena_ai leaderboard with only 17B parameters. Huge win for open source! Enjoy 😉
@Ahmad_Al_Dahle
Ahmad Al-Dahle
7 months
Introducing our first set of Llama 4 models! We’ve been hard at work doing a complete re-design of the Llama series. I’m so excited to share it with the world today and mark another major milestone for the Llama herd as we release the *first* open source models in the Llama 4 …
1 reply · 0 reposts · 29 likes
@QuYuxiao
Yuxiao Qu
8 months
🚨 NEW PAPER: "Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning"! 🤔 With all these long-reasoning LLMs, what are we actually optimizing for? Length penalties? Token budgets? We needed a better way to think about it! Website: https://t.co/G5JTmryx0d 🧵[1/9]
6 replies · 63 reposts · 312 likes
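In rough terms (my paraphrase, not the paper's notation), the meta-RL framing scores each chunk of the reasoning trace by the progress it makes toward a correct final answer, rather than by raw length:

```latex
% Hedged paraphrase, not the paper's exact notation. Let z_{0:k} denote the
% first k episodes (chunks) of the reasoning trace, and J(\pi | z_{0:k}) the
% probability of a correct final answer given them. Each episode is scored
% by the progress it contributes:
\[
  r_k^{\mathrm{prog}} = J\left(\pi \mid z_{0:k}\right) - J\left(\pi \mid z_{0:k-1}\right)
\]
% Fine-tuning then maximizes final-answer reward plus these dense progress
% bonuses, penalizing episodes that burn tokens without making progress:
\[
  \max_{\pi}\; \mathbb{E}\left[\, r_{\mathrm{final}} + \alpha \sum_{k} r_k^{\mathrm{prog}} \,\right]
\]
```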
@iamwaynechi
Wayne Chi
8 months
What do developers 𝘳𝘦𝘢𝘭𝘭𝘺 think of AI coding assistants? In October, we launched @CopilotArena to collect user preferences on real dev workflows. After months of live service, we’re here to share our findings in our recent preprint. Here's what we have learned /🧵
@arena
lmarena.ai
1 year
Introducing Copilot Arena - Interactive coding evaluation in the wild. Our extension lets you test top models for free, right in VSCode. Let's vote and build the Copilot leaderboard! Download here: https://t.co/Zyc9iL3u9m Led by @iamwaynechi and @valeriechen_ at CMU. 1/🧵
2 replies · 38 reposts · 161 likes
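Arena-style preference votes typically become a leaderboard through a Bradley-Terry fit over pairwise outcomes. Here is a self-contained sketch of that generic method (the vote data is made up, and this is not necessarily Copilot Arena's exact pipeline):

```python
# Sketch: fitting Bradley-Terry strengths to pairwise preference votes by
# gradient ascent on the log-likelihood, then printing an Elo-like scale.
import math
from collections import defaultdict

votes = [("gpt-4o", "llama-3-70b"), ("claude-3.5", "gpt-4o"),
         ("gpt-4o", "claude-3.5"), ("llama-3-70b", "claude-3.5")]
# each tuple is (winner, loser); model names are placeholders

models = sorted({m for pair in votes for m in pair})
score = {m: 0.0 for m in models}  # log-strengths

for _ in range(200):
    grad = defaultdict(float)
    for w, l in votes:
        p_win = 1.0 / (1.0 + math.exp(score[l] - score[w]))
        grad[w] += 1.0 - p_win   # d/ds_w of log sigma(s_w - s_l)
        grad[l] -= 1.0 - p_win
    for m in models:
        score[m] += 0.1 * grad[m]

for m in sorted(models, key=score.get, reverse=True):
    elo_like = 400.0 * score[m] / math.log(10) + 1000.0
    print(f"{m:>12}: {elo_like:.0f}")
```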
@simonguozirui
Simon Guo
8 months
LLMs for GPU kernel🌽generation have been getting Pop🍿ular since our preview last Dec; excited to announce 📢 our full paper 📃 for KernelBench! Turns out KernelBench is quite challenging 🧠 — frontier models outperform the PyTorch Eager baseline <20% of the time. More 🧵👇
9 replies · 79 reposts · 310 likes
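The evaluation behind a number like that is concrete: a generated kernel must match the PyTorch eager baseline numerically and beat it on wall-clock time. A minimal sketch, with a hand-written stand-in for the model-generated kernel:

```python
# Sketch: the shape of a KernelBench-style check. A generated kernel must
# (1) match the PyTorch eager baseline numerically and (2) beat it on
# wall-clock time; the "generated" kernel here is a hand-written stand-in.
import time
import torch

def eager_baseline(x):
    return torch.relu(x) * 2.0

def generated_kernel(x):  # stand-in for model-generated code
    return torch.clamp(x, min=0.0).mul(2.0)

x = torch.randn(4096, 4096)

# Correctness gate: outputs must agree within tolerance.
assert torch.allclose(eager_baseline(x), generated_kernel(x), atol=1e-5)

def bench(fn, iters=50):
    fn(x)  # warmup
    t0 = time.perf_counter()
    for _ in range(iters):
        fn(x)
    return (time.perf_counter() - t0) / iters

speedup = bench(eager_baseline) / bench(generated_kernel)
# KernelBench reports fast_p: the fraction of tasks that are both correct
# and faster than eager by a threshold p.
print(f"speedup over eager: {speedup:.2f}x")
```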
@brandontrabucco
Brandon Trabucco
9 months
With the success of LLM agents like OpenAI Operator, we are entering a new scaling era, but how do we train these agent models? We present InSTA, the largest training environment for LLM agents, containing live web navigation tasks for 150k diverse websites in multiple …
9 replies · 31 reposts · 161 likes
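The agent side of such an environment is an observe-act loop over a live browser. A minimal sketch using Playwright, where `policy` is a hypothetical stub standing in for the LLM call:

```python
# Sketch: the observe-act loop an LLM web agent runs in an environment
# like this. `policy` is a hypothetical stub; a real agent would call an
# LLM to map the page observation to the next action.
from playwright.sync_api import sync_playwright

def policy(observation: str) -> dict:
    # Hypothetical stand-in for the LLM decision.
    return {"action": "stop"}

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.com")   # placeholder for one of the sites
    for _ in range(10):                # bounded episode length
        obs = page.inner_text("body")  # cheap text observation
        step = policy(obs)
        if step["action"] == "stop":
            break
        if step["action"] == "click":
            page.click(step["selector"])
        elif step["action"] == "goto":
            page.goto(step["url"])
    browser.close()
```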
@Agentica_
Agentica Project
9 months
✨RL magic is in the air! Introducing DeepScaleR-1.5B-Preview—a fully open-source, 1.5B-parameter model trained with RL to surpass o1-preview for general math reasoning. 📜Blog: https://t.co/eHqApwRfnH 💻Github: https://t.co/tRsDN7xV4M
16 replies · 50 reposts · 151 likes
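The recipe is RL over groups of sampled rollouts; assuming a GRPO-style algorithm (my inference, the linked blog has the exact details), the core advantage computation is just within-group reward normalization:

```python
# Sketch: group-relative advantages as used in GRPO-style RL recipes.
# GRPO is my assumption for this release; see the linked blog for specifics.
def group_advantages(rewards: list[float]) -> list[float]:
    """Normalize rewards within a group of samples for the same prompt."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5 or 1.0  # avoid divide-by-zero when all rewards tie
    return [(r - mean) / std for r in rewards]

# 8 sampled solutions to one math problem, rewarded 1 if the final answer
# is correct and 0 otherwise; advantages push up the correct ones.
print(group_advantages([1, 0, 0, 1, 0, 0, 0, 1]))
```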
@tianjun_zhang
Tianjun Zhang
9 months
Magic of RL! You don’t need super large models to develop such behavior! Congrats @jiayi_pirate!
@jiayi_pirate
Jiayi Pan
9 months
We reproduced DeepSeek R1-Zero in the CountDown game, and it just works. Through RL, the 3B base LM develops self-verification and search abilities all on its own. You can experience the aha moment yourself for < $30. Code: https://t.co/B2IsN1PrXV Here's what we learned 🧵
1 reply · 0 reposts · 4 likes
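What makes this reproducible so cheaply is that the reward is purely rule-based. A sketch of a Countdown verifier in that spirit (my paraphrase; the linked repo has the exact implementation):

```python
# Sketch: rule-based reward for Countdown. The model must combine the
# given numbers, each exactly once, with +-*/ to hit the target; the
# reward simply checks the proposed expression.
import ast, re

def countdown_reward(completion: str, numbers: list[int], target: int) -> float:
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if not match:
        return 0.0
    expr = match.group(1).strip()
    # The expression must use exactly the provided numbers.
    used = [int(t) for t in re.findall(r"\d+", expr)]
    if sorted(used) != sorted(numbers):
        return 0.0
    try:
        value = eval(compile(ast.parse(expr, mode="eval"), "<expr>", "eval"),
                     {"__builtins__": {}}, {})
    except Exception:
        return 0.0
    return 1.0 if abs(value - target) < 1e-6 else 0.0

# (100 - 4) / 2 + 2 = 50, so this scores 1.0
print(countdown_reward("<answer>(100 - 4) / 2 + 2</answer>", [2, 2, 4, 100], 50))
```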
@tianjun_zhang
Tianjun Zhang
11 months
Congrats to @OpenAI on the impressive performance of the o1 model! It seems o1 already achieves 76% on LiveCodeBench; how should we improve the benchmark to make it harder 🤔🤔
@polynoamial
Noam Brown
11 months
.@OpenAI o1 has started rolling out to the API!
2 replies · 0 reposts · 14 likes
@tianjun_zhang
Tianjun Zhang
11 months
I will be at #NeurIPS2024 this week! Happy to chat about large scale RL for reasoning and agents!
0 replies · 0 reposts · 18 likes
@tianjun_zhang
Tianjun Zhang
1 year
Check the new video arena! Pick your favorite video🚀
@aivideoarena
Video Arena
1 year
🚀 Just Launched: VideoArena!🎥 Discover head-to-head comparisons of video clips generated from the same prompts across top text-to-video models. Compare outputs from 7 leading models and we're adding more soon! 🔗 Check out the leaderboard: https://t.co/IkQTFB7am5 #Text2Video
1 reply · 0 reposts · 6 likes
@tianjun_zhang
Tianjun Zhang
1 year
Check out the amazing BFCL V2!
@shishirpatil_
Shishir Patil
1 year
🚀 Excited to announce the release of BFCL V2 • Live! 🏆 As LLMs evolve into intelligent agents, the Berkeley Function-Calling Leaderboard (BFCL) is leading the way in evaluating their real-world function-calling capabilities. V2 • Live features 📢 enterprise-contributed data, …
0 replies · 0 reposts · 8 likes
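Conceptually, BFCL-style checking parses the model's emitted call and compares it structurally against ground truth instead of string-matching. An illustrative sketch (the real checker lives in the BFCL repo):

```python
# Sketch of an AST-style function-calling check in the spirit of BFCL:
# parse the emitted call and compare function name and keyword arguments
# against ground truth. Illustrative only.
import ast

def call_matches(model_output: str, expected_fn: str, expected_args: dict) -> bool:
    try:
        node = ast.parse(model_output.strip(), mode="eval").body
    except SyntaxError:
        return False
    if (not isinstance(node, ast.Call)
            or not isinstance(node.func, ast.Name)
            or node.func.id != expected_fn):
        return False
    got = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return got == expected_args

print(call_matches('get_weather(city="Berkeley", unit="celsius")',
                   "get_weather", {"city": "Berkeley", "unit": "celsius"}))
```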
@aviral_kumar2
Aviral Kumar
1 year
Two new papers on self-improvement: paper 1 today ⬇️ In RISE, we build on online imitation to teach LLMs *how* to improve their own responses *sequentially*. With Llama 2/3 and Mistral, this gives a solid +10-20% over 5 turns and outperforms parallel sampling! https://t.co/hVy3T1ZoGi 🧵⬇️
1 reply · 28 reposts · 125 likes
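The inference-time loop RISE trains for looks roughly like the sketch below; `generate` is a hypothetical stand-in for an LLM call:

```python
# Sketch of the sequential self-improvement loop the paper studies:
# condition each turn on the previous attempt and ask for a revision.
def generate(prompt: str) -> str:
    # Hypothetical stand-in for an LLM call.
    return "attempted answer"

def sequential_improve(question: str, turns: int = 5) -> list[str]:
    attempts = []
    prompt = question
    for _ in range(turns):
        answer = generate(prompt)
        attempts.append(answer)
        # RISE fine-tunes the model so these conditioned revisions
        # actually improve turn over turn.
        prompt = (f"{question}\n\nPrevious attempt:\n{answer}\n\n"
                  "Find any mistakes and produce an improved answer.")
    return attempts

print(sequential_improve("What is 17 * 24?"))
```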
@tianjun_zhang
Tianjun Zhang
1 year
@AIatMeta And we use Berkeley Function Calling Leaderboard for evaluation! Congrats to my colleagues @shishirpatil_ @charlie_jcj02 @HuanzhiMao @profjoeyg Ion Stoica @fanjia_yan🫡
0 replies · 1 repost · 8 likes
@rohanpaul_ai
Rohan Paul
1 year
This paper claims that Llama3-8B + BoT (Buffer of Thoughts) has the potential to surpass the Llama3-70B model. 🤯 'Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models' - Proposes a buffer-manager to dynamically update the meta-buffer, thus enhancing the capacity …
10 replies · 93 reposts · 621 likes
@omarsar0
elvis
1 year
Thought-Augmented Reasoning with LLMs Presents a thought-augmented reasoning approach, Buffer of Thoughts, to enhance the accuracy, efficiency, and robustness of LLM-based reasoning. It leverages a meta-buffer containing high-level thoughts (thought templates) distilled from …
13 replies · 97 reposts · 443 likes
@tianjun_zhang
Tianjun Zhang
1 year
🤔 Why can LLMs only follow one thought template (e.g., CoT)? In our paper, LLMs can select their own thought process flexibly! Big improvement on agentic tasks! 🎉
@LingYang_PU
Ling Yang
1 year
Excited to introduce our new prompting method on LLMs, Buffer of Thoughts (BoT), collaborating with @tianjun_zhang at @berkeley_ai. Notably, Llama3-8B+BoT can beat Llama3-70B on reasoning tasks. Paper: https://t.co/M4KjqlhiyZ Code: https://t.co/DMaUcu8IOi
1 reply · 2 reposts · 9 likes
@arankomatsuzaki
Aran Komatsuzaki
1 year
Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models Significant performance improvements over previous SOTA methods: 11% on Game of 24, 20% on Geometric Shapes and 51% on Checkmate-in-One. repo: https://t.co/wMT8p9h5sW abs: https://t.co/PpENCDeNTN
3 replies · 32 reposts · 165 likes
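The mechanism behind these BoT numbers: maintain a meta-buffer of reusable high-level thought templates and retrieve one to structure the prompt, instead of always defaulting to a single CoT style. A toy sketch with naive keyword-overlap retrieval (the paper distills and retrieves templates far more carefully):

```python
# Sketch of the Buffer of Thoughts idea: retrieve a high-level thought
# template from a meta-buffer and use it to structure the prompt. The
# buffer contents and retrieval rule here are toy stand-ins.
META_BUFFER = {
    "algebra": "Define variables, set up equations, solve symbolically, verify.",
    "game": "Enumerate legal moves, evaluate each, search promising branches.",
    "geometry": "Draw the configuration, label knowns, apply relevant theorems.",
}

def retrieve_template(problem: str) -> str:
    words = set(problem.lower().split())
    key = max(META_BUFFER, key=lambda k: len(words & set(k.split())))
    return META_BUFFER[key]

problem = "In the game of 24, make 24 from the numbers 4, 7, 8, 8."
template = retrieve_template(problem)
prompt = f"Thought template: {template}\n\nProblem: {problem}\nSolve step by step."
print(prompt)
```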