Terry Yue Zhuo
@terryyuezhuo
Followers: 2K
Following: 9K
Media: 149
Statuses: 1K
@BigCodeProject-{⚔️Arena, 📊Bench} | Going Stealth | @codelm_tutorial EMNLP’25
Joined May 2020
It’s so much fun working with the other 39 community members on this project! Start to try out various frontier models in BigCodeArena today.
Introducing BigCodeArena, a human-in-the-loop platform for evaluating code through execution. Unlike current open evaluation platforms that collect human preferences on text, it enables interaction with runnable code to assess functionality and quality across any language.
11
37
129
Basically that’s what I’ve been working on.
We disrupted a highly sophisticated AI-led espionage campaign. The attack targeted large tech companies, financial institutions, chemical manufacturing companies, and government agencies. We assess with high confidence that the threat actor was a Chinese state-sponsored group.
0
2
19
As models get stronger, the scaffolding around them gets simpler.
0
0
1
As models become more accessible, it’s impossible to completely prevent malicious use, so systems need to anticipate how people might use those models to attack them.
0
0
2
Let’s do a survey of those good and bad surveys.
0
0
3
GLM-4.6 is now live on BigCodeArena. Shout-out to @qinkai1028 and the whole @Zai_org team for this great model!
It’s so much fun working with the other 39 community members on this project! Start to try out various frontier models in BigCodeArena today.
1
2
26
BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution
1
10
62
BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution "we introduce BigCodeArena, an open human evaluation platform for code generation backed by a comprehensive and on-the-fly execution environment. Built on top of Chatbot Arena, BigCodeArena
3
6
58
Vibe coding is small in scope but big in impact. It’s how you learn what actually feels good to build. Those quick React prototypes are just a small part of the bigger picture. Think bigger.
0
0
7
Just a classic one, but with PyGame this time 😛 "A ball bouncing inside a spinning hexagon, with the full control"
Introducing BigCodeArena, a human-in-the-loop platform for evaluating code through execution. Unlike current open evaluation platforms that collect human preferences on text, it enables interaction with runnable code to assess functionality and quality across any language.
0
2
7
cc @altryne @ivanfioravanti @qinkai1028 @olafgeibig who were curious about this. Sorry for the delay!
2
0
3
Excited to have helped out on BigCodeArena led by @terryyuezhuo !
It’s so much fun working with the other 39 community members on this project! Start to try out various frontier models in BigCodeArena today.
1
2
18
Special thanks to @abidlabs @clefourrier from @huggingface team, @mlejva from @e2b, @hyperbolic_labs team, and @Alibaba_Qwen team!🤗
1
0
3
Introducing BigCodeArena, a human-in-the-loop platform for evaluating code through execution. Unlike current open evaluation platforms that collect human preferences on text, it enables interaction with runnable code to assess functionality and quality across any language.
4
29
79