Siddhant Bhambri Profile
Siddhant Bhambri

@sbhambr1

Followers: 89 · Following: 1K · Media: 7 · Statuses: 75

Joined July 2019
@sbhambr1
Siddhant Bhambri
4 days
RT @rao2z: Since DeepSeek R1, it has become fashionable to assume that intermediate tokens have interpretable semantics. We have argued aga….
0 replies · 7 retweets · 0 likes
@sbhambr1
Siddhant Bhambri
1 month
RT @TmlrPub: Do Think Tags Really Help LLMs Plan? A Critical Evaluation of ReAct-Style Prompting. Siddhant Bhambri, Mudit Verma, Subbarao….
openreview.net
The reasoning abilities of Large Language Models (LLMs) remain a topic of considerable interest and debate. Among the original papers arguing for emergent reasoning abilities of LLMs, ReAct became...
0 replies · 2 retweets · 0 likes
@sbhambr1
Siddhant Bhambri
3 months
RT @rao2z: Anthropomorphization of intermediate tokens as reasoning/thinking traces isn't quite a harmless fad, and may be pushing LRM rese….
0 replies · 82 retweets · 0 likes
@sbhambr1
Siddhant Bhambri
3 months
RT @rao2z: Semantics of Intermediate Tokens in Trace-based distillation in Q&A tasks: Yochanites @sbhambr1 and @biswas_2707 looked at disti….
0 replies · 8 retweets · 0 likes
@sbhambr1
Siddhant Bhambri
4 months
RT @rao2z: Delighted to share that @sbhambr1 & @v_mudit's critical evaluation and refutation of the reasoning claims of ReACT has been acce….
0 replies · 2 retweets · 0 likes
@sbhambr1
Siddhant Bhambri
9 months
RT @rao2z: 📢 If you are #NeurIPS2024 OWA-2024 workshop (East Meeting Room 1-3), do check out two posters presented by Yochanites @karthikv7….
0 replies · 4 retweets · 0 likes
@sbhambr1
Siddhant Bhambri
9 months
RT @rao2z: 📢 Check out @sbhambr1 & @v_mudit's pitiless dissection of ReACT think tag claims at the Adaptive Foundation Models workshop toda….
0 replies · 5 retweets · 0 likes
@sbhambr1
Siddhant Bhambri
1 year
4/n Our experiments show how our framework can lead to a boost in sample efficiency for Reinforcement Learning! Joint work with @Amrita_Bh, @liuhuan and @rao2z, check out the paper for more details:
[image attached]
0 replies · 0 retweets · 2 likes
@sbhambr1
Siddhant Bhambri
1 year
3/n Hence, we augment LLMs with 𝐌𝐄𝐃𝐈𝐂, i.e., a Model-based feEDback critIC that performs step-by-step verification of LLM-generated actions and provides a feedback prompt.
1 reply · 0 retweets · 2 likes
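The tweet above describes 𝐌𝐄𝐃𝐈𝐂 only at a high level (a model-based critic that verifies each LLM-generated action and returns a feedback prompt on failure), so here is a minimal sketch of a loop with that shape. Every name below (`medic_loop`, the toy 1-D domain, the stub proposer) is illustrative, not from the paper.

```python
# Hedged sketch of a MEDIC-style loop: a model-based critic checks each
# proposed action before execution and, on failure, hands the proposer a
# feedback prompt to retry. All interfaces here are assumptions.

def medic_loop(check, step, is_goal, propose, state, max_steps=10):
    """Run a proposer with step-by-step verification by a model-based critic."""
    trajectory = []
    for _ in range(max_steps):
        feedback = None
        for _ in range(5):  # bounded retries per step
            action = propose(state, feedback)   # stands in for an LLM call
            ok, reason = check(state, action)   # critic verification
            if ok:
                break
            feedback = f"Action {action!r} rejected: {reason}. Try again."
        trajectory.append((state, action))
        state = step(state, action)
        if is_goal(state):
            break
    return trajectory, state

# Toy 1-D navigation domain: move from 0 to 3 without leaving [0, 3].
check = lambda s, a: (0 <= s + a <= 3, "out of bounds")
step = lambda s, a: s + a
is_goal = lambda s: s == 3
# A "policy" that first tries an invalid move, then follows the feedback.
propose = lambda s, fb: -1 if (s == 0 and fb is None) else 1

traj, final = medic_loop(check, step, is_goal, propose, 0)
print(final)  # → 3
```

The critic here is a hand-written world model; in the MEDIC setting the point is that such a model can gate a brittle LLM policy step by step rather than trusting it end-to-end.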
@sbhambr1
Siddhant Bhambri
1 year
2/n We note that the performance of prompting off-the-shelf LLMs for decision-making tasks can be extremely brittle, even for popular prompting techniques such as ReAct! (See our other work, which investigates these claims in detail.)
@rao2z
Subbarao Kambhampati (కంభంపాటి సుబ్బారావు)
1 year
📢 ReAct popularized the "Think 🤔" magic by claiming to help LLMs plan by "synergizing reasoning and acting." @v_mudit & @sbhambr1 investigated the claims, and have a thing or two to say about the extreme brittleness of ReAct style prompting. 👉1/
[3 images attached]
1 reply · 0 retweets · 1 like
@sbhambr1
Siddhant Bhambri
1 year
📣Designing problem-specific reward shaping functions is hard, even for domain experts! In our recent work, we aim to answer: 𝐶𝑎𝑛 𝑤𝑒 use #𝐿𝐿𝑀𝑠 𝑡𝑜 𝑐𝑜𝑛𝑠𝑡𝑟𝑢𝑐𝑡 𝑎 𝑟𝑒𝑤𝑎𝑟𝑑 𝑠ℎ𝑎𝑝𝑖𝑛𝑔 𝑓𝑢𝑛𝑐𝑡𝑖𝑜𝑛 to 𝑏𝑜𝑜𝑠𝑡 #𝑅𝐿 𝑠𝑎𝑚𝑝𝑙𝑒 𝑒𝑓𝑓𝑖𝑐𝑖𝑒𝑛𝑐𝑦?
[image attached]
1 reply · 3 retweets · 10 likes
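The reward-shaping idea in this thread can be illustrated with a hedged sketch. Assuming the LLM's suggestion is used as a potential function in potential-based shaping (F = γ·Φ(s′) − Φ(s), which provably leaves the optimal policy unchanged), a stubbed `llm_potential` stands in for the actual LLM call; the shaping scheme and all names are assumptions for illustration, not necessarily the paper's exact construction.

```python
# Hedged sketch of LLM-derived reward shaping via a potential function.
# The "LLM-suggested" potential is stubbed with a hand-written
# distance-to-goal heuristic for a 1-D task with the goal at position 3.

GAMMA = 0.99

def llm_potential(state):
    # Placeholder for a potential an LLM might propose (illustrative only).
    return -abs(3 - state)

def shaped_reward(state, action, next_state, env_reward):
    """Augment a sparse environment reward with potential-based shaping."""
    return env_reward + GAMMA * llm_potential(next_state) - llm_potential(state)

# Moving toward the goal earns a positive bonus even while the environment
# reward stays zero, which is where the sample-efficiency boost comes from.
print(shaped_reward(0, +1, 1, 0.0))  # positive: the step moved closer to the goal
```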
@sbhambr1
Siddhant Bhambri
1 year
RT @rao2z: 📢 ReAct popularized the "Think 🤔" magic by claiming to help LLMs plan by "synergizing reasoning and acting." @v_mudit & @sbhamb….
0 replies · 23 retweets · 0 likes
@sbhambr1
Siddhant Bhambri
2 years
RT @v_mudit: 📢 #HRI '24 : Possible anthropomorphization & leniency towards failure cases have propelled discussions on emergent abilities o….
0 replies · 1 retweet · 0 likes
@sbhambr1
Siddhant Bhambri
2 years
9/ Our results show that PbRL algorithms perform effectively only under Specified Orchestration, and our ablation study highlights the challenges associated with the agent's policy learning. See the arXiv paper for more details!
[image attached]
0 replies · 0 retweets · 1 like
@sbhambr1
Siddhant Bhambri
2 years
8/ As a first exploration of PbRL in a Human-AI team setup, we adapt & extend existing SOTA single-agent PbRL methods (RUNE, SURF, & PEBBLE) to study their performance across the tangents of Human Flexibility and Access to Human Policy.
1 reply · 0 retweets · 1 like
@sbhambr1
Siddhant Bhambri
2 years
7/ Additionally, we propose a suite of domains, comprising variants of the Highway domain and adapted MuJoCo locomotion domains, that require forced cooperation in a two-agent setting.
1 reply · 0 retweets · 0 likes
@sbhambr1
Siddhant Bhambri
2 years
6/ We call this "Specified Orchestration": it requires maximal information regarding human behavior and is arguably the simplest case for the AI agent, so it can be treated as a loose upper bound on the performance of Human-AI PbRL algorithms.
1 reply · 0 retweets · 0 likes
@sbhambr1
Siddhant Bhambri
2 years
5/ On the other hand, the problem can be simplified to a single-agent PbRL setting if the agent has complete knowledge of the human policy and the human has a single policy in their set of feasible team strategies.
1 reply · 0 retweets · 0 likes
@sbhambr1
Siddhant Bhambri
2 years
4/ If the agent has zero access to the human's policy (as selected by the human agent to execute), it faces the additional challenge of 'imagining' human actions when it queries the human for their preference over the team behavior.
1 reply · 0 retweets · 2 likes
@sbhambr1
Siddhant Bhambri
2 years
3/ Our work introduces the Human-AI PbRL Cooperation Game, & we discuss the concept of Human-Flexibility. An important challenge here is the agent's access to the human policy during training, while it receives feedback from the human partner on joint trajectories.
1 reply · 0 retweets · 2 likes
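The preference feedback this thread describes builds on standard PbRL reward learning. As a hedged illustration, here is the Bradley-Terry preference model used by PEBBLE-style methods, with toy trajectories; the Human-AI PbRL Cooperation Game layers joint trajectories and possibly hidden human policies on top of this, and all names below are illustrative.

```python
# Hedged sketch of the reward-learning step at the core of PbRL:
# a Bradley-Terry model over trajectory returns under a learned reward.
import math

def pref_prob(r_hat, traj_a, traj_b):
    """P(traj_a preferred over traj_b) under a learned reward model r_hat."""
    ra = sum(r_hat(s, a) for s, a in traj_a)
    rb = sum(r_hat(s, a) for s, a in traj_b)
    return 1.0 / (1.0 + math.exp(rb - ra))  # logistic in the return gap

def pref_loss(r_hat, traj_a, traj_b, label):
    """Cross-entropy loss on a human preference label (1 = a preferred)."""
    p = pref_prob(r_hat, traj_a, traj_b)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

# Toy reward model and two short (state, action) trajectories.
r_hat = lambda s, a: float(s == "goal")
better = [("start", 0), ("goal", 0)]
worse = [("start", 0), ("start", 0)]
print(pref_prob(r_hat, better, worse) > 0.5)  # → True
```

Minimizing `pref_loss` over many labeled trajectory pairs fits the reward model the agent then optimizes; the access-to-human-policy question in the thread concerns how the agent generates the joint trajectories it submits for these preference queries.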