Siddhant (Sid) Bhambri
@sbhambr1
Followers: 107 · Following: 1K · Media: 9 · Statuses: 86
PhD @ Yochan Lab, ASU
Joined July 2019
ICYMI, here is Dr. @sbhambr1's PhD defense video from this morning. 👉 https://t.co/r9HyZPELnV
💡 Are AI agents trained to solve tasks with humans in a team actually cooperating? 🔗Check out our recent work accepted at #AAAI2026 that dives deeper into this question:
What if your cooperative AI agent is actively avoiding you? Despite significant interest in having human and AI agents teaming constructively to solve problems, most work in the area focuses on the bottom line task reward rather than any actual cooperation between the agents.
💡 𝐈𝐬 𝐬𝐞𝐦𝐚𝐧𝐭𝐢𝐜 𝐜𝐨𝐫𝐫𝐞𝐜𝐭𝐧𝐞𝐬𝐬 𝐨𝐟 𝐂𝐡𝐚𝐢𝐧 𝐨𝐟 𝐓𝐡𝐨𝐮𝐠𝐡𝐭 𝐭𝐫𝐚𝐜𝐞𝐬 𝐭𝐡𝐞 𝐬𝐚𝐦𝐞 𝐚𝐬 𝐥𝐨𝐜𝐚𝐥 𝐜𝐨𝐡𝐞𝐫𝐞𝐧𝐜𝐞? 🖇️ Check out our recent work critically looking at how trace coherence is impacted by RLVR post-training:
Our recent research efforts have questioned the narrative that the LRM intermediate tokens have semantics (see https://t.co/NfPXWdZvnr ). Some may counter these with "..but I read the traces, and they do seem to make sense.." and claim RLVR post-training must be making the
➡️ 𝘒𝘦𝘺 𝘱𝘢𝘵𝘩𝘸𝘢𝘺𝘴 𝘧𝘰𝘳 𝘥𝘦𝘴𝘪𝘨𝘯𝘪𝘯𝘨 𝘳𝘰𝘣𝘶𝘴𝘵 𝘢𝘯𝘥 𝘳𝘦𝘭𝘪𝘢𝘣𝘭𝘦, 𝘦𝘯𝘥 𝘶𝘴𝘦𝘳-𝘧𝘢𝘤𝘪𝘯𝘨 𝘈𝘐 𝘵𝘩𝘢𝘵 𝘣𝘢𝘭𝘢𝘯𝘤𝘦𝘴 𝘢𝘥𝘷𝘪𝘴𝘢𝘣𝘪𝘭𝘪𝘵𝘺 𝘢𝘯𝘥 𝘦𝘹𝘱𝘭𝘢𝘪𝘯𝘢𝘣𝘪𝘭𝘪𝘵𝘺. #AI #MachineLearning #LLMs #HumanAI
➡️ 𝘞𝘩𝘢𝘵 𝘪𝘯𝘵𝘦𝘳𝘱𝘳𝘦𝘵𝘢𝘣𝘪𝘭𝘪𝘵𝘺 𝘢𝘯𝘥 𝘳𝘦𝘢𝘴𝘰𝘯𝘪𝘯𝘨 𝘵𝘳𝘢𝘤𝘦𝘴 𝘳𝘦𝘢𝘭𝘭𝘺 𝘮𝘦𝘢𝘯 𝘧𝘰𝘳 𝘦𝘯𝘥 𝘶𝘴𝘦𝘳𝘴 𝘴𝘦𝘦𝘬𝘪𝘯𝘨 𝘵𝘰 𝘵𝘳𝘶𝘴𝘵 𝘢𝘯𝘥 𝘶𝘯𝘥𝘦𝘳𝘴𝘵𝘢𝘯𝘥 𝘈𝘐 𝘴𝘺𝘴𝘵𝘦𝘮𝘴. ( https://t.co/PI5fOWqAzn) ( https://t.co/Iz5atzx1vb) (5/n)
Semantics of Intermediate Tokens in Trace-based distillation in Q&A tasks: Yochanites @sbhambr1 and @biswas_2707 looked at distillation on a Q&A task, and found a disconnect between the validity of derivational traces and the correctness of the solution.. 🧵 1/
Specifically, I discuss: ➡️ 𝘏𝘰𝘸 𝘓𝘓𝘔𝘴 𝘤𝘢𝘯 𝘢𝘤𝘵 𝘢𝘴 𝘱𝘭𝘢𝘯𝘯𝘦𝘳𝘴 𝘢𝘯𝘥 𝘵𝘩𝘦 𝘱𝘳𝘢𝘤𝘵𝘪𝘤𝘢𝘭 𝘭𝘪𝘮𝘪𝘵𝘴 𝘰𝘧 𝘵𝘩𝘦𝘪𝘳 𝘢𝘣𝘪𝘭𝘪𝘵𝘺 𝘵𝘰 𝘵𝘢𝘬𝘦 𝘱𝘭𝘢𝘯 𝘨𝘶𝘪𝘥𝘢𝘯𝘤𝘦 𝘧𝘳𝘰𝘮 𝘩𝘶𝘮𝘢𝘯𝘴. ( https://t.co/b61b4l49go) ( https://t.co/1lmB19hica) (4/n)
📢 ReAct popularized the "Think 🤔" magic by claiming to help LLMs plan by "synergizing reasoning and acting." @v_mudit & @sbhambr1 investigated the claims, and have a thing or two to say about the extreme brittleness of ReAct-style prompting. 👉 https://t.co/kvAKWRvzEZ 1/
💡 𝐓𝐋𝐃𝐑; In this talk, I examine the challenges and opportunities for making #AI agents more #advisable and #explainable, particularly in sequential decision-making tasks involving human interaction. (2/n)
Recent talk at @allen_ai: "𝐑𝐨𝐥𝐞 𝐨𝐟 𝐋𝐚𝐫𝐠𝐞 𝐋𝐚𝐧𝐠𝐮𝐚𝐠𝐞 𝐌𝐨𝐝𝐞𝐥𝐬 𝐢𝐧 𝐇𝐮𝐦𝐚𝐧-𝐀𝐈 𝐈𝐧𝐭𝐞𝐫𝐚𝐜𝐭𝐢𝐨𝐧: 𝐀 𝐂𝐫𝐢𝐭𝐢𝐜𝐚𝐥 𝐀𝐩𝐩𝐫𝐚𝐢𝐬𝐚𝐥". Link: https://t.co/vV0sDcsvtE Thanks to @rao2z for guiding this research & to @dsweld for hosting me!🧵(1/n)
Since DeepSeek R1, it has become fashionable to assume that intermediate tokens have interpretable semantics. We have argued against this before. Here @sbhambr1 & @biswas_2707 ask: Is cognitive interpretability of intermediate tokens an albatross on task accuracy? 1/
Do Think Tags Really Help LLMs Plan? A Critical Evaluation of ReAct-Style Prompting Siddhant Bhambri, Mudit Verma, Subbarao Kambhampati. Action editor: Li Li. https://t.co/o3NdQ5HEYP
#reasoning #prompting #prompts
The reasoning abilities of Large Language Models (LLMs) remain a topic of considerable interest and debate. Among the original papers arguing for emergent reasoning abilities of LLMs, ReAct became...
Anthropomorphization of intermediate tokens as reasoning/thinking traces isn't quite a harmless fad, and may be pushing LRM research into questionable directions.. So we decided to put together a more complete argument.. 👇🧵 1/
Delighted to share that @sbhambr1 & @v_mudit's critical evaluation and refutation of the reasoning claims of ReACT has been accepted to TMLR @TmlrOrg 👉 https://t.co/xxZLJdWsBm
📢 If you are at the #NeurIPS2024 OWA-2024 workshop (East Meeting Room 1-3), do check out two posters presented by Yochanites @karthikv792, @kayastechly & @sbhambr1 👉 LLMs can't reason; can LRMs? (Evaluating and improving 🍓 o1 on planning & scheduling) 👉 LLMs to reward shape RL
📢 Check out @sbhambr1 & @v_mudit's pitiless dissection of ReACT think tag claims at the Adaptive Foundation Models workshop today @ #NeurIPS2024. (West Exhibition Hall A; 4:30pm) https://t.co/hR0vQrZGI9 This makes for a great companion to our main track analysis of CoT Chain of
4/n Our experiments show how our framework can lead to a boost in sample efficiency for Reinforcement Learning! Joint work with @Amrita_Bh, @liuhuan and @rao2z, check out the paper for more details: https://t.co/pYMns3wkMD
3/n Hence, we augment LLMs with 𝐌𝐄𝐃𝐈𝐂, i.e., a Model-based feEDback critIC that performs step-by-step verification of LLM-generated actions and provides a feedback prompt.
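The MEDIC idea in the thread above (a model-based critic that verifies each LLM-proposed action and feeds errors back as a prompt) can be sketched as a simple generate-verify-reprompt loop. This is a minimal illustration only, not the paper's implementation: `stub_llm`, `model_check`, and `medic_step` are all hypothetical names, the LLM and the world model are stubbed, and the real system verifies against a learned/simulated environment model rather than a hand-written precondition check.

```python
def stub_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM policy: proposes 'jump' until the
    prompt contains feedback saying that action is invalid."""
    if "invalid" in prompt:
        return "walk"
    return "jump"


def model_check(action: str, state: dict) -> tuple[bool, str]:
    """Model-based critic: verify a proposed action against the (stubbed)
    environment model's preconditions, returning (ok, feedback)."""
    if action == "jump" and state["obstacle"]:
        return False, f"Action '{action}' is invalid here: an obstacle blocks it."
    return True, ""


def medic_step(task: str, state: dict, max_retries: int = 3) -> str:
    """Query the LLM for an action; on verification failure, append the
    critic's feedback to the prompt and retry (the 'feedback prompt' loop)."""
    prompt = task
    for _ in range(max_retries):
        action = stub_llm(prompt)
        ok, feedback = model_check(action, state)
        if ok:
            return action
        prompt = f"{task}\nFeedback: {feedback}"
    raise RuntimeError("no valid action found within retry budget")


print(medic_step("Cross the room.", {"obstacle": True}))  # prints "walk"
```

The sample-efficiency boost reported in the thread comes from rejecting invalid actions before they reach the environment, so the RL agent only trains on verified action proposals.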