InfiniAILab Profile Banner
Infini-AI-Lab Profile
Infini-AI-Lab

@InfiniAILab

Followers
1K
Following
55
Media
46
Statuses
88

Pittsburgh, PA
Joined September 2024
Don't wanna be here? Send us removal request.
@chenzhuoming911
chen zhuoming
5 days
See you at the tutorial! 🎉 Scale Test-Time Compute on Modern Hardware ⚙️💻 with @BeidiChen @Azaliamirh 1:30 - 4pm, Upper Level Ballroom 6CDEF Excited to chat about the latest updates in models, algorithms, and systems for TTS! 🔊🤖✨ 🔗
0
2
11
@BeidiChen
Beidi Chen
5 days
The whole @InfiniAILab is at #NeurIPS this week! Our group is currently working on diverse directions of GenAI, .e.g., Scalable and Efficient RL, VideoGen, Modeling, Model Arch & Sys Co-Design (Many new releases coming!!). Come and talk to us @RJ_Sadhukhan @IronSteveZhou
0
11
108
@BrynnPeng
Yibo Peng ✈️ NeurIPS
2 months
🚨 Our new paper is out! What if your code agent fixes a bug, passes all tests, and still introduces a vulnerability? Even benign users can unknowingly trigger vulnerabilities in code agents. FCV-Attack shows that “functionally correct” doesn’t always mean “secure.”
@InfiniAILab
Infini-AI-Lab
2 months
🚀If your code agent generates a patch that passes all tests, should you trust it merge automatically? ⚠️You probably shouldn’t! “Correct” ≠ “Safe.” In our study we show that a single normal looking issue description, whether from a benign user or not, can lead code agents
0
1
5
@BeidiChen
Beidi Chen
2 months
📣 we study a threat model that users intent to leverage llm agent to fix problems in the code base but the agent could just insert vulnerabilities in while passes all the tests — I think security would be a more and more important problem when agents ability grows. So much fun
@InfiniAILab
Infini-AI-Lab
2 months
🚀If your code agent generates a patch that passes all tests, should you trust it merge automatically? ⚠️You probably shouldn’t! “Correct” ≠ “Safe.” In our study we show that a single normal looking issue description, whether from a benign user or not, can lead code agents
0
3
30
@InfiniAILab
Infini-AI-Lab
2 months
Joint work with @BrynnPeng , @shxjames , Lei Li, @Xinyu2ML , @christodorescu , Ravi Mangal, Corina Pasareanu, @haizhong_zheng , @BeidiChen
0
0
2
@InfiniAILab
Infini-AI-Lab
2 months
Digging deeper, we found the attack works by contaminating the model's internal state. Even if the agent's actions look correct, the malicious instruction from the initial prompt poisons the final generated patch. This means behavior-level defenses are not enough to stop this
1
0
2
@InfiniAILab
Infini-AI-Lab
2 months
Motivated by this, we designed FCV-Attack. Attacker and implicitly or explicitly induce LLM agents to generate FCV patches with a black-box and single-query setting. Here is the summary of our results: ✅ Successfully compromises 12/12 tested agent-model combos. ✅ Most
1
0
1
@InfiniAILab
Infini-AI-Lab
2 months
What does a 'functionally correct yet vulnerable' (FCV) patch look like? Imagine a patch that fixes a login bug (✅ functional correctness) but also adds a new logging line that writes the user's password to a public file (❌ security vulnerability). Those FCV patches even
1
0
1
@InfiniAILab
Infini-AI-Lab
2 months
🚀If your code agent generates a patch that passes all tests, should you trust it merge automatically? ⚠️You probably shouldn’t! “Correct” ≠ “Safe.” In our study we show that a single normal looking issue description, whether from a benign user or not, can lead code agents
2
10
23
@InfiniAILab
Infini-AI-Lab
2 months
Towards new era of Computer Architecture!
@BeidiChen
Beidi Chen
2 months
Congrats!!! So honored to be part of the team 🎉 Haha, first time making contribution in the computer architecture field — thanks for carrying me 🙏
1
0
9
@haizhong_zheng
Haizhong Zheng
2 months
🚀 Super excited to share our recent research about RL on stale data. 💪Meet M2PO: a powerful algorithm that turns stale rollouts into gold. Stable training, no performance drop, even with 256-update-stale data.
@InfiniAILab
Infini-AI-Lab
2 months
🤔Can we train RL on LLMs with extremely stale data? 🚀Our latest study says YES! Stale data can be as informative as on-policy data, unlocking more scalable, efficient asynchronous RL for LLMs. We introduce M2PO, an off-policy RL algorithm that keeps training stable and
1
2
18
@BeidiChen
Beidi Chen
2 months
📢🔥 New off-policy RL for LLMs — now training 32B model with 200+ stale steps for the first time, while still matching on-policy accuracy 💪 A big step toward scalable & decentralized agent training 😉
@InfiniAILab
Infini-AI-Lab
2 months
🤔Can we train RL on LLMs with extremely stale data? 🚀Our latest study says YES! Stale data can be as informative as on-policy data, unlocking more scalable, efficient asynchronous RL for LLMs. We introduce M2PO, an off-policy RL algorithm that keeps training stable and
4
19
212
@InfiniAILab
Infini-AI-Lab
2 months
0
0
3
@InfiniAILab
Infini-AI-Lab
2 months
Motivated by this, we propose M2PO (Second-Moment Trust Policy Optimization), a batch-level constraint and token-level masking training algorithm that stabilizes off-policy RL on stale data. ✅Uses M₂, a robust and variance-sensitive metric, to constrain distribution shift; ✅
1
0
6
@InfiniAILab
Infini-AI-Lab
2 months
Our further analysis reveals the dual nature of high-entropy tokens: while high-entropy tokens are crucial for learning progress, they also introduce instability in the off-policy setting. More high-entropy tokens utilized → Better performance, but less stable training. 🧵 3/4
1
0
3
@InfiniAILab
Infini-AI-Lab
2 months
In our study, we observe an interesting “Prosperity before Collapse” phenomenon: although training without a trust region eventually collapses, it achieves substantially better performance prior to collapse (even matches on-policy training). This indicates that the stale data
1
0
3
@InfiniAILab
Infini-AI-Lab
2 months
🤔Can we train RL on LLMs with extremely stale data? 🚀Our latest study says YES! Stale data can be as informative as on-policy data, unlocking more scalable, efficient asynchronous RL for LLMs. We introduce M2PO, an off-policy RL algorithm that keeps training stable and
4
41
229
@Real_HDong
Harry Dong
2 months
1/🧵 🎉Introducing Bridge🌉, our parallel LLM inference scaling method that shares info between all responses to an input prompt throughout the generation process! Bridge greatly improves the quality of individual responses and the entire response set! 📜 https://t.co/qL39PrzJL5
1
4
18
@InfiniAILab
Infini-AI-Lab
4 months
🤖 GPT-5 supports 128K output / 400K input tokens. 📜 Wiles’s Fermat proof took ~88K tokens — the final output only. 🧩 Add years of exploration, likely >880K tokens of reasoning. 🧠 Real intelligence isn’t about making it short — it’s about exploring the sparsity in the logic.
0
2
8
@InfiniAILab
Infini-AI-Lab
5 months
Huge thanks to @tinytitans_icml for an amazing workshop — see you next year! Honored to receive a Best Paper Award 🏆 Let’s unlock the potential of sparsity! Next up: scaling to hundreds/thousands of rollouts? Or making powerful R1/K2-level LLMs (not just 8B 4-bit models) run
1
9
45