Infini-AI-Lab
@InfiniAILab
Followers: 1K · Following: 55 · Media: 46 · Statuses: 88
Pittsburgh, PA
Joined September 2024
See you at the tutorial! 🎉 Scale Test-Time Compute on Modern Hardware ⚙️💻 with @BeidiChen @Azaliamirh, 1:30 - 4pm, Upper Level Ballroom 6CDEF. Excited to chat about the latest updates in models, algorithms, and systems for TTS! 🔊🤖✨ 🔗
0 replies · 2 reposts · 11 likes
The whole @InfiniAILab is at #NeurIPS this week! Our group is currently working on diverse directions in GenAI, e.g., Scalable and Efficient RL, VideoGen, Modeling, and Model Arch & Sys Co-Design (many new releases coming!!). Come and talk to us @RJ_Sadhukhan @IronSteveZhou
0 replies · 11 reposts · 108 likes
🚨 Our new paper is out! What if your code agent fixes a bug, passes all tests, and still introduces a vulnerability? Even benign users can unknowingly trigger vulnerabilities in code agents. FCV-Attack shows that “functionally correct” doesn’t always mean “secure.”
🚀If your code agent generates a patch that passes all tests, should you trust it and merge it automatically? ⚠️You probably shouldn’t! “Correct” ≠ “Safe.” In our study we show that a single normal-looking issue description, whether from a benign user or not, can lead code agents
0 replies · 1 repost · 5 likes
📣 We study a threat model in which users intend to leverage an LLM agent to fix problems in the code base, but the agent can insert vulnerabilities while still passing all the tests. I think security will become a more and more important problem as agents' abilities grow. So much fun
🚀If your code agent generates a patch that passes all tests, should you trust it and merge it automatically? ⚠️You probably shouldn’t! “Correct” ≠ “Safe.” In our study we show that a single normal-looking issue description, whether from a benign user or not, can lead code agents
0 replies · 3 reposts · 30 likes
Joint work with @BrynnPeng , @shxjames , Lei Li, @Xinyu2ML , @christodorescu , Ravi Mangal, Corina Pasareanu, @haizhong_zheng , @BeidiChen
0 replies · 0 reposts · 2 likes
Digging deeper, we found the attack works by contaminating the model's internal state. Even if the agent's actions look correct, the malicious instruction from the initial prompt poisons the final generated patch. This means behavior-level defenses are not enough to stop this
1 reply · 0 reposts · 2 likes
Motivated by this, we designed FCV-Attack: attackers can implicitly or explicitly induce LLM agents to generate FCV patches in a black-box, single-query setting. Here is a summary of our results: ✅ Successfully compromises 12/12 tested agent-model combos. ✅ Most
1 reply · 0 reposts · 1 like
What does a 'functionally correct yet vulnerable' (FCV) patch look like? Imagine a patch that fixes a login bug (✅ functional correctness) but also adds a new logging line that writes the user's password to a public file (❌ security vulnerability). Those FCV patches even
1 reply · 0 reposts · 1 like
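To make the FCV example in the tweet above concrete, here is a hypothetical sketch (file and function names are invented for illustration, not code from the paper): the login fix is what the test suite checks, while the added logging call quietly leaks the credentials.

```python
# Hypothetical FCV ("functionally correct yet vulnerable") patch, for illustration only.
import hashlib

def login(username: str, password: str, user_db: dict) -> bool:
    # The "fix" the tests exercise: compare hashed passwords instead of plaintext.
    stored_hash = user_db.get(username)
    given_hash = hashlib.sha256(password.encode()).hexdigest()
    ok = stored_hash is not None and stored_hash == given_hash

    # The vulnerability no test covers: credentials written to a world-readable log
    # (CWE-532, insertion of sensitive information into a log file).
    with open("/tmp/auth.log", "a") as log_file:
        log_file.write(f"login attempt: {username}:{password}\n")

    return ok
```

A test suite that only checks login()'s return value accepts this patch, which is exactly the gap between "functionally correct" and "secure" that the thread points at.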
🚀If your code agent generates a patch that passes all tests, should you trust it and merge it automatically? ⚠️You probably shouldn’t! “Correct” ≠ “Safe.” In our study we show that a single normal-looking issue description, whether from a benign user or not, can lead code agents
2 replies · 10 reposts · 23 likes
🚀 Super excited to share our recent research about RL on stale data. 💪Meet M2PO: a powerful algorithm that turns stale rollouts into gold. Stable training, no performance drop, even with 256-update-stale data.
🤔Can we train RL on LLMs with extremely stale data? 🚀Our latest study says YES! Stale data can be as informative as on-policy data, unlocking more scalable, efficient asynchronous RL for LLMs. We introduce M2PO, an off-policy RL algorithm that keeps training stable and
1 reply · 2 reposts · 18 likes
📢🔥 New off-policy RL for LLMs — now training a 32B model with 200+ stale steps for the first time, while still matching on-policy accuracy 💪 A big step toward scalable & decentralized agent training 😉
🤔Can we train RL on LLMs with extremely stale data? 🚀Our latest study says YES! Stale data can be as informative as on-policy data, unlocking more scalable, efficient asynchronous RL for LLMs. We introduce M2PO, an off-policy RL algorithm that keeps training stable and
4 replies · 19 reposts · 212 likes
Motivated by this, we propose M2PO (Second-Moment Trust Policy Optimization), a training algorithm that combines a batch-level constraint with token-level masking to stabilize off-policy RL on stale data. ✅Uses M₂, a robust and variance-sensitive metric, to constrain distribution shift; ✅
1 reply · 0 reposts · 6 likes
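A minimal sketch of the batch-level constraint plus token-level masking described above, under the assumption (an editorial guess, not the paper's definition) that M₂ is the second moment of per-token log importance ratios and that masking drops the largest contributors until the batch-level constraint holds:

```python
# Sketch of a batch-level second-moment constraint with token-level masking,
# in the spirit of M2PO as described in the tweet above. The exact metric,
# threshold, and masking rule in the paper may differ.
import numpy as np

def m2_mask(logp_new: np.ndarray, logp_old: np.ndarray, tau: float = 0.02) -> np.ndarray:
    """Return a 0/1 mask over tokens keeping the second moment of log-ratios below tau."""
    log_ratio = logp_new - logp_old          # per-token log importance ratio
    contrib = log_ratio ** 2                 # each token's contribution to M2
    mask = np.ones_like(contrib)
    for idx in np.argsort(-contrib):         # most off-policy tokens first
        m2 = (contrib * mask).sum() / max(mask.sum(), 1.0)
        if m2 <= tau:                        # batch-level constraint satisfied
            break
        mask[idx] = 0.0                      # drop the worst offender and re-check
    return mask

# Toy usage: stale rollouts where a handful of tokens are badly off-policy.
rng = np.random.default_rng(0)
logp_old = rng.normal(-2.0, 0.5, size=1024)
logp_new = logp_old + rng.normal(0.0, 0.1, size=1024)
logp_new[:8] += 2.0                          # a few tokens with large distribution shift
mask = m2_mask(logp_new, logp_old)
print(f"kept {int(mask.sum())}/1024 tokens") # most tokens survive; the outliers are masked
```

The appeal of a batch-level constraint like this is that it discards only the tokens that would blow up the variance of the importance-weighted update, instead of clipping or dropping stale data wholesale.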
Our further analysis reveals the dual nature of high-entropy tokens: while high-entropy tokens are crucial for learning progress, they also introduce instability in the off-policy setting. More high-entropy tokens utilized → Better performance, but less stable training. 🧵 3/4
1 reply · 0 reposts · 3 likes
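For the entropy measurement the tweet above refers to, a small sketch of how per-token predictive entropy can be computed from the policy's logits (an illustration only; the mask below is a placeholder for whatever off-policy masking is applied, not the paper's procedure):

```python
# Per-token predictive entropy from logits, plus a check of how many of the
# highest-entropy tokens a given token mask keeps. Illustration only.
import numpy as np

def token_entropy(logits: np.ndarray) -> np.ndarray:
    """Entropy of the softmax distribution at each token position. logits: [tokens, vocab]."""
    z = logits - logits.max(axis=-1, keepdims=True)     # numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

rng = np.random.default_rng(1)
logits = rng.normal(size=(1024, 32))                    # toy vocabulary of 32
ent = token_entropy(logits)
high = ent > np.quantile(ent, 0.8)                      # top-20% highest-entropy tokens
mask = rng.random(1024) > 0.1                           # placeholder off-policy token mask
print(f"high-entropy tokens kept: {int((high & mask).sum())}/{int(high.sum())}")
```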
In our study, we observe an interesting “Prosperity before Collapse” phenomenon: although training without a trust region eventually collapses, it achieves substantially better performance prior to collapse (even matches on-policy training). This indicates that the stale data
1 reply · 0 reposts · 3 likes
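For readers less familiar with the jargon: the "trust region" in the tweet above is the mechanism that keeps the per-token importance ratio close to 1 during updates. A generic sketch of a clipped (trust-region) surrogate versus the unclipped objective that "training without a trust region" implies, in the standard PPO-style form rather than the paper's exact loss:

```python
# Clipped (trust-region) vs. unclipped policy-gradient surrogates, generic PPO-style form.
import numpy as np

def clipped_objective(ratio: np.ndarray, adv: np.ndarray, eps: float = 0.2) -> float:
    # Contributions are bounded once the importance ratio leaves [1 - eps, 1 + eps].
    return float(np.minimum(ratio * adv, np.clip(ratio, 1 - eps, 1 + eps) * adv).mean())

def unclipped_objective(ratio: np.ndarray, adv: np.ndarray) -> float:
    # No trust region: large ratios from stale rollouts pass through unchecked,
    # which can learn faster at first but eventually destabilizes training.
    return float((ratio * adv).mean())
```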
🤔Can we train RL on LLMs with extremely stale data? 🚀Our latest study says YES! Stale data can be as informative as on-policy data, unlocking more scalable, efficient asynchronous RL for LLMs. We introduce M2PO, an off-policy RL algorithm that keeps training stable and
4 replies · 41 reposts · 229 likes
1/🧵 🎉Introducing Bridge🌉, our parallel LLM inference scaling method that shares info between all responses to an input prompt throughout the generation process! Bridge greatly improves the quality of individual responses and the entire response set! 📜 https://t.co/qL39PrzJL5
1 reply · 4 reposts · 18 likes
🤖 GPT-5 supports 128K output / 400K input tokens. 📜 Wiles’s Fermat proof took ~88K tokens — the final output only. 🧩 Add years of exploration, likely >880K tokens of reasoning. 🧠 Real intelligence isn’t about making it short — it’s about exploring the sparsity in the logic.
0 replies · 2 reposts · 8 likes
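Back-of-the-envelope reading of the numbers above: if the exploration behind the proof is taken to be roughly 10× the final write-up, that is 88K × 10 = 880K tokens, more than 2× GPT-5's 400K input window and roughly 7× its 128K output cap.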
Huge thanks to @tinytitans_icml for an amazing workshop — see you next year! Honored to receive a Best Paper Award 🏆 Let’s unlock the potential of sparsity! Next up: scaling to hundreds/thousands of rollouts? Or making powerful R1/K2-level LLMs (not just 8B 4-bit models) run
1 reply · 9 reposts · 45 likes