Siddhant Bhambri Profile
Siddhant Bhambri

@sbhambr1

Followers: 89 · Following: 1K · Media: 7 · Statuses: 75

Joined July 2019
@sbhambr1
Siddhant Bhambri
4 days
RT @rao2z: Since DeepSeek R1, it has become fashionable to assume that intermediate tokens have interpretable semantics. We have argued aga….
0 replies · 7 retweets · 0 likes
@sbhambr1
Siddhant Bhambri
1 month
RT @TmlrPub: Do Think Tags Really Help LLMs Plan? A Critical Evaluation of ReAct-Style Prompting. Siddhant Bhambri, Mudit Verma, Subbarao….
openreview.net
The reasoning abilities of Large Language Models (LLMs) remain a topic of considerable interest and debate. Among the original papers arguing for emergent reasoning abilities of LLMs, ReAct became...
0 replies · 2 retweets · 0 likes
@sbhambr1
Siddhant Bhambri
3 months
RT @rao2z: Anthropomorphization of intermediate tokens as reasoning/thinking traces isn't quite a harmless fad, and may be pushing LRM rese….
0 replies · 82 retweets · 0 likes
@sbhambr1
Siddhant Bhambri
3 months
RT @rao2z: Semantics of Intermediate Tokens in Trace-based distillation in Q&A tasks: Yochanites @sbhambr1 and @biswas_2707 looked at disti….
0 replies · 8 retweets · 0 likes
@sbhambr1
Siddhant Bhambri
4 months
RT @rao2z: Delighted to share that @sbhambr1 & @v_mudit's critical evaluation and refutation of the reasoning claims of ReACT has been acce….
0 replies · 2 retweets · 0 likes
@sbhambr1
Siddhant Bhambri
9 months
RT @rao2z: 📢 If you are #NeurIPS2024 OWA-2024 workshop (East Meeting Room 1-3), do check out two posters presented by Yochanites @karthikv7….
0 replies · 4 retweets · 0 likes
@sbhambr1
Siddhant Bhambri
9 months
RT @rao2z: 📢 Check out @sbhambr1 & @v_mudit's pitiless dissection of ReACT think tag claims at the Adaptive Foundation Models workshop toda….
0 replies · 5 retweets · 0 likes
@sbhambr1
Siddhant Bhambri
1 year
4/n Our experiments show how our framework can lead to a boost in sample efficiency for Reinforcement Learning! Joint work with @Amrita_Bh, @liuhuan and @rao2z, check out the paper for more details:
[image attached]
0 replies · 0 retweets · 2 likes
@sbhambr1
Siddhant Bhambri
1 year
3/n Hence, we augment LLMs with 𝐌𝐄𝐃𝐈𝐂, i.e., a Model-based feEDback critIC that performs step-by-step verification of LLM-generated actions and provides a feedback prompt.
1 reply · 0 retweets · 2 likes
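The tweet above describes 𝐌𝐄𝐃𝐈𝐂 only at a high level (a model-based critic that verifies each LLM-generated action and returns a feedback prompt on failure), so here is a minimal sketch of a loop with that shape. Every name below (`medic_loop`, the toy 1-D domain, the stub proposer) is illustrative, not from the paper.

```python
# Hedged sketch of a MEDIC-style loop: a model-based critic checks each
# proposed action before execution and, on failure, hands the proposer a
# feedback prompt to retry. All interfaces here are assumptions.

def medic_loop(check, step, is_goal, propose, state, max_steps=10):
    """Run a proposer with step-by-step verification by a model-based critic."""
    trajectory = []
    for _ in range(max_steps):
        feedback = None
        for _ in range(5):  # bounded retries per step
            action = propose(state, feedback)   # stands in for an LLM call
            ok, reason = check(state, action)   # critic verification
            if ok:
                break
            feedback = f"Action {action!r} rejected: {reason}. Try again."
        trajectory.append((state, action))
        state = step(state, action)
        if is_goal(state):
            break
    return trajectory, state

# Toy 1-D navigation domain: move from 0 to 3 without leaving [0, 3].
check = lambda s, a: (0 <= s + a <= 3, "out of bounds")
step = lambda s, a: s + a
is_goal = lambda s: s == 3
# A "policy" that first tries an invalid move, then follows the feedback.
propose = lambda s, fb: -1 if (s == 0 and fb is None) else 1

traj, final = medic_loop(check, step, is_goal, propose, 0)
print(final)  # → 3
```

The critic here is a hand-written world model; in the MEDIC setting the point is that such a model can gate a brittle LLM policy step by step rather than trusting it end-to-end.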
@sbhambr1
Siddhant Bhambri
1 year
2/n We note that the performance of prompting off-the-shelf LLMs for decision-making tasks can be extremely brittle, even for popular prompting techniques such as ReAct! (See our other work, which investigates these claims in detail.)
@rao2z
Subbarao Kambhampati (కంభంపాటి సుబ్బారావు)
1 year
📢 ReAct popularized the "Think 🤔" magic by claiming to help LLMs plan by "synergizing reasoning and acting." @v_mudit & @sbhambr1 investigated the claims, and have a thing or two to say about the extreme brittleness of ReAct style prompting. 👉1/
[3 images attached]
1 reply · 0 retweets · 1 like
@sbhambr1
Siddhant Bhambri
1 year
📣Designing problem-specific reward shaping functions is hard, even for domain experts! In our recent work, we aim to answer: 𝐶𝑎𝑛 𝑤𝑒 use #𝐿𝐿𝑀𝑠 𝑡𝑜 𝑐𝑜𝑛𝑠𝑡𝑟𝑢𝑐𝑡 𝑎 𝑟𝑒𝑤𝑎𝑟𝑑 𝑠ℎ𝑎𝑝𝑖𝑛𝑔 𝑓𝑢𝑛𝑐𝑡𝑖𝑜𝑛 to 𝑏𝑜𝑜𝑠𝑡 #𝑅𝐿 𝑠𝑎𝑚𝑝𝑙𝑒 𝑒𝑓𝑓𝑖𝑐𝑖𝑒𝑛𝑐𝑦?
[image attached]
1 reply · 3 retweets · 10 likes
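The reward-shaping idea in this thread can be illustrated with a hedged sketch. Assuming the LLM's suggestion is used as a potential function in potential-based shaping (F = γ·Φ(s′) − Φ(s), which provably leaves the optimal policy unchanged), a stubbed `llm_potential` stands in for the actual LLM call; the shaping scheme and all names are assumptions for illustration, not necessarily the paper's exact construction.

```python
# Hedged sketch of LLM-derived reward shaping via a potential function.
# The "LLM-suggested" potential is stubbed with a hand-written
# distance-to-goal heuristic for a 1-D task with the goal at position 3.

GAMMA = 0.99

def llm_potential(state):
    # Placeholder for a potential an LLM might propose (illustrative only).
    return -abs(3 - state)

def shaped_reward(state, action, next_state, env_reward):
    """Augment a sparse environment reward with potential-based shaping."""
    return env_reward + GAMMA * llm_potential(next_state) - llm_potential(state)

# Moving toward the goal earns a positive bonus even while the environment
# reward stays zero, which is where the sample-efficiency boost comes from.
print(shaped_reward(0, +1, 1, 0.0))  # positive: the step moved closer to the goal
```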
@sbhambr1
Siddhant Bhambri
1 year
RT @rao2z: 📢 ReAct popularized the "Think 🤔" magic by claiming to help LLMs plan by "synergizing reasoning and acting." @v_mudit & @sbhamb….
0 replies · 23 retweets · 0 likes
@sbhambr1
Siddhant Bhambri
2 years
RT @v_mudit: 📢 #HRI '24 : Possible anthropomorphization & leniency towards failure cases have propelled discussions on emergent abilities o….
0 replies · 1 retweet · 0 likes
@sbhambr1
Siddhant Bhambri
2 years
9/ Our results show that PbRL algorithms perform effectively only under Specified Orchestration, and our ablation study highlights the challenges associated with the agent's policy learning. See the arXiv paper for more details!
[image attached]
0 replies · 0 retweets · 1 like
@sbhambr1
Siddhant Bhambri
2 years
8/ As a first exploration of PbRL in a Human-AI team setup, we adapt & extend existing SOTA single-agent PbRL methods (RUNE, SURF, & PEBBLE) to study their performance across the tangents of Human Flexibility and Access to Human Policy.
1 reply · 0 retweets · 1 like
@sbhambr1
Siddhant Bhambri
2 years
7/ Additionally, we propose a suite of domains, comprising variants of the Highway domain and adapted MuJoCo locomotion domains, that require forced cooperation in a two-agent setting.
1 reply · 0 retweets · 0 likes
@sbhambr1
Siddhant Bhambri
2 years
6/ We call this "Specified Orchestration": it requires maximal information regarding human behavior and is arguably the simplest case for the AI agent, so it can be treated as a loose upper bound on the performance of Human-AI PbRL algorithms.
1 reply · 0 retweets · 0 likes
@sbhambr1
Siddhant Bhambri
2 years
5/ On the other hand, the problem can be simplified to a single-agent PbRL setting if the agent has complete knowledge of the human policy and the human has a single policy in their set of feasible team strategies.
1 reply · 0 retweets · 0 likes
@sbhambr1
Siddhant Bhambri
2 years
4/ If the agent has zero access to the human's policy (as selected by the human agent to execute), it faces the additional challenge of 'imagining' human actions when it queries the human for their preference over the team behavior.
1 reply · 0 retweets · 2 likes
@sbhambr1
Siddhant Bhambri
2 years
3/ Our work introduces the Human-AI PbRL Cooperation Game, & we discuss the concept of Human-Flexibility. An important challenge here is the agent's access to the human policy during training, while it receives feedback from the human partner on joint trajectories.
1 reply · 0 retweets · 2 likes
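The preference feedback this thread describes builds on standard PbRL reward learning. As a hedged illustration, here is the Bradley-Terry preference model used by PEBBLE-style methods, with toy trajectories; the Human-AI PbRL Cooperation Game layers joint trajectories and possibly hidden human policies on top of this, and all names below are illustrative.

```python
# Hedged sketch of the reward-learning step at the core of PbRL:
# a Bradley-Terry model over trajectory returns under a learned reward.
import math

def pref_prob(r_hat, traj_a, traj_b):
    """P(traj_a preferred over traj_b) under a learned reward model r_hat."""
    ra = sum(r_hat(s, a) for s, a in traj_a)
    rb = sum(r_hat(s, a) for s, a in traj_b)
    return 1.0 / (1.0 + math.exp(rb - ra))  # logistic in the return gap

def pref_loss(r_hat, traj_a, traj_b, label):
    """Cross-entropy loss on a human preference label (1 = a preferred)."""
    p = pref_prob(r_hat, traj_a, traj_b)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

# Toy reward model and two short (state, action) trajectories.
r_hat = lambda s, a: float(s == "goal")
better = [("start", 0), ("goal", 0)]
worse = [("start", 0), ("start", 0)]
print(pref_prob(r_hat, better, worse) > 0.5)  # → True
```

Minimizing `pref_loss` over many labeled trajectory pairs fits the reward model the agent then optimizes; the access-to-human-policy question in the thread concerns how the agent generates the joint trajectories it submits for these preference queries.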