Sriram B
@b_shrir
Followers
294
Following
681
Media
18
Statuses
355
PhD student in Computer Science at UMD College Park | Ex - Research Fellow at MSR | IIT Bombay CS undergrad Make AI more understandable and reliable!
College Park, MD
Joined May 2018
A bit late, but this was accepted in EMNLP 2025 Findings. Unfortunately none of us are able to go present, but hope everyone attending has a nice time!
Do AI models really think the way they say they do? In our latest paper, we examine the faithfulness of the chain-of thought (CoT) produced by LLMs and LVLMs when exposed to a wide range of biases, with a special focus on visual biases and more subtler, implicit, biases.
0
0
5
Just tried this for the papers on which I am a reviewer. For many of them, my review is the only human-generated oneπ
ICLR authors, want to check if your reviews are likely AI generated? ICLR reviewers, want to check if your paper is likely AI generated? Here are AI detection results for every ICLR paper and review from @pangramlabs! It seems that ~21% of reviews may be AI?
0
0
3
I had quite a bit of fun training LLMs with the latest RL techniques this summer. Some of our results are in this thread:
π New research drop! We reimagine attribution not as retrieval, but as a reasoning problem. Introducing DECOMPTUNE π§© β a novel RL-driven training framework that teaches small models how to reason through decomposition-based reasoning π https://t.co/G7IA2GXe0v
#AI #Reasoning
0
0
0
π New research drop! We reimagine attribution not as retrieval, but as a reasoning problem. Introducing DECOMPTUNE π§© β a novel RL-driven training framework that teaches small models how to reason through decomposition-based reasoning π https://t.co/G7IA2GXe0v
#AI #Reasoning
2
2
5
Now proceeding to call myself a "final year PhD" https://t.co/zQGEAA6maF
Life hack: Just call yourself final year PhD! Apparently you can renew it every year. π #phdlife
0
1
5
Do AI models really think the way they say they do? In our latest paper, we examine the faithfulness of the chain-of thought (CoT) produced by LLMs and LVLMs when exposed to a wide range of biases, with a special focus on visual biases and more subtler, implicit, biases.
1
6
19
π RELAI is live β a platform for building reliable AI agents π We complete the learning loop for agents: simulate β evaluate β optimize - Simulate with LLM personas, mocked MCP servers/tools and grounded synthetic data - Evaluate with code + LLM evaluators; turn human
9
28
54
Today's pumpkins are the smallest they will ever be
0
0
2
This type of stuff used to impress me too, but remember that distinguishing between subtle details is a strength of AI and weakness of humans. Even by 2016 Imagenet models were able to distinguish between fine grained classes like Asian vs African elephant.
0
0
4
Introducing Maestro: the holistic optimizer for AI agents. Maestro optimizes the agent graph and tunes prompts/models/tools, fixing agent failure modes that prompt-only or RL weight tuning canβt touch. Maestro outperforms leading prompt optimizers (e.g., MIPROv2, GEPA) on
18
57
327
Very interesting. This would imply that CoTs for visual tasks which are less reliant on explicit reasoning are more likely to be unfaithful. We actually showed this in
arxiv.org
Chain-of-thought (CoT) reasoning enhances performance of large language models, but questions remain about whether these reasoning traces faithfully reflect the internal processes of the model. We...
Prior work has found that Chain of Thought (CoT) can be unfaithful. Should we then ignore what it says? In new research, we find that the CoT is informative about LLM cognition as long as the cognition is complex enough that it canβt be performed in a single forward pass.
0
0
0
This actually seems like a bigger deal than the Deepmind result a year ago. Non-math-specific approach (no formal verification etc) yielding an IMO gold is huge. Timelines shortened!
The model solves these problems without tools like lean or coding, it just uses natural language, and also only has 4.5 hours. We see the model reason at a very high level - trying out different strategies, making observations from examples, and testing hypothesis.
0
0
3
I'm still surprised by the 20% slowdown, I can accept no significant speedup but a slowdown seems very weird, I'm skeptical
0
0
0
I think the expectation mismatch has a pretty simple explanation - time passes more slowly when you are locked in and concentrating vs simply promoting and reviewing Cursor. This creates the illusion that you are actually getting stuff done quicker.
We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers. The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.
2
0
1
For more findings and details, please check out our paper: https://t.co/pws1H4Jly7 In collaboration with @BasuSamyadeep and @FeiziSoheil
arxiv.org
Chain-of-thought (CoT) reasoning enhances performance of large language models, but questions remain about whether these reasoning traces faithfully reflect the internal processes of the model. We...
0
0
0
We revisited CoT faithfulness in text-only LLMs and found similar patterns. Even advanced, SoTA reasoning models articulate explicit, content-based cues far more than subtle, formatting cues (like answer order).
1
0
0
We also discovered a new phenomenon we call "inconsistent reasoning." The model will correctly reason toward the right answer, then suddenly switch to a biased answer, sometimes without even mentioning the bias! A potential red flag for unfaithful CoT.
1
0
0
Surprisingly, the articulation rate does not change when biased samples are given in-context as compared to unbiased contexts or even no context. This means that explicit patterns in the context doesnβt help the model articulate biases better!
1
0
0