Sriram B Profile
Sriram B

@b_shrir

Followers: 294 · Following: 681 · Media: 18 · Statuses: 355

PhD student in Computer Science at UMD College Park | Ex-Research Fellow at MSR | IIT Bombay CS undergrad | Make AI more understandable and reliable!

College Park, MD
Joined May 2018
@b_shrir
Sriram B
26 days
A bit late, but this was accepted in EMNLP 2025 Findings. Unfortunately none of us are able to go present, but hope everyone attending has a nice time!
@b_shrir
Sriram B
11 days
Just tried this for the papers on which I am a reviewer. For many of them, my review is the only human-generated one 😂
@gneubig
Graham Neubig
12 days
ICLR authors, want to check if your reviews are likely AI generated? ICLR reviewers, want to check if your paper is likely AI generated? Here are AI detection results for every ICLR paper and review from @pangramlabs! It seems that ~21% of reviews may be AI?
0 · 0 · 3
@b_shrir
Sriram B
20 days
I had quite a bit of fun training LLMs with the latest RL techniques this summer. Some of our results are in this thread:
@koustavagoswami
Koustava Goswami
21 days
🚀 New research drop! We reimagine attribution not as retrieval, but as a reasoning problem. Introducing DECOMPTUNE 🧩 → a novel RL-driven training framework that teaches small models how to reason through decomposition-based reasoning 📄 https://t.co/G7IA2GXe0v #AI #Reasoning
0 · 0 · 0
@b_shrir
Sriram B
20 days
Now proceeding to call myself a "final year PhD" https://t.co/zQGEAA6maF
@gowthami_s
Gowthami ✈️ NeurIPS'25
22 days
Life hack: Just call yourself final year PhD! Apparently you can renew it every year. 😜 #phdlife
0 · 1 · 5
@b_shrir
Sriram B
20 days
A bit of a personal update: I am now a PhD Candidate!
35 · 25 · 956
@b_shrir
Sriram B
6 months
Do AI models really think the way they say they do? In our latest paper, we examine the faithfulness of the chain-of-thought (CoT) produced by LLMs and LVLMs when exposed to a wide range of biases, with a special focus on visual biases and subtler, implicit biases.
1 · 6 · 19
@ReliableAI
RELAI
1 month
🚀 RELAI is live - a platform for building reliable AI agents
We complete the learning loop for agents: simulate → evaluate → optimize
- Simulate with LLM personas, mocked MCP servers/tools and grounded synthetic data
- Evaluate with code + LLM evaluators; turn human
9 · 28 · 54
@b_shrir
Sriram B
1 month
Today's pumpkins are the smallest they will ever be
@yeeeerika
erika
1 month
this headline hums with ancient autumnal dread
0 · 0 · 2
@b_shrir
Sriram B
2 months
It seems that the first task of formal theorem proving AI will be to thoroughly debug Lean.
@doomslide
doomslide
2 months
I hate to disappoint all the wet behind the ears formalization-cels but we are very likely bound for many more *silly* lean exploits. The interesting bugs will have to be earned with sweat and tears.
0 · 0 · 0
@b_shrir
Sriram B
2 months
This type of stuff used to impress me too, but remember that distinguishing subtle details is a strength of AI and a weakness of humans. Even by 2016, ImageNet models were able to distinguish between fine-grained classes like Asian vs. African elephants.
@abhi1thakur
abhishek
2 months
I no longer have doubts
0 · 0 · 4
@FeiziSoheil
Soheil Feizi
3 months
Introducing Maestro: the holistic optimizer for AI agents. Maestro optimizes the agent graph and tunes prompts/models/tools, fixing agent failure modes that prompt-only or RL weight tuning can't touch. Maestro outperforms leading prompt optimizers (e.g., MIPROv2, GEPA) on
18 · 57 · 327
@b_shrir
Sriram B
4 months
Very interesting. This would imply that CoTs for visual tasks, which are less reliant on explicit reasoning, are more likely to be unfaithful. We actually showed this in
arxiv.org
Chain-of-thought (CoT) reasoning enhances performance of large language models, but questions remain about whether these reasoning traces faithfully reflect the internal processes of the model. We...
@METR_Evals
METR
4 months
Prior work has found that Chain of Thought (CoT) can be unfaithful. Should we then ignore what it says? In new research, we find that the CoT is informative about LLM cognition as long as the cognition is complex enough that it can't be performed in a single forward pass.
0 · 0 · 0
@b_shrir
Sriram B
4 months
This actually seems like a bigger deal than the DeepMind result a year ago. A non-math-specific approach (no formal verification, etc.) yielding an IMO gold is huge. Timelines shortened!
@SherylHsu02
Sheryl Hsu
4 months
The model solves these problems without tools like Lean or coding; it just uses natural language, and it only has 4.5 hours. We see the model reason at a very high level: trying out different strategies, making observations from examples, and testing hypotheses.
0 · 0 · 3
@b_shrir
Sriram B
5 months
I'm still surprised by the 20% slowdown. I can accept no significant speedup, but a slowdown seems very weird; I'm skeptical.
0 · 0 · 0
@b_shrir
Sriram B
5 months
I think the expectation mismatch has a pretty simple explanation: time passes more slowly when you are locked in and concentrating vs. simply prompting Cursor and reviewing its output. This creates the illusion that you are actually getting stuff done quicker.
@METR_Evals
METR
5 months
We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers. The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.
2 · 0 · 1
@b_shrir
Sriram B
6 months
We revisited CoT faithfulness in text-only LLMs and found similar patterns. Even advanced, SoTA reasoning models articulate explicit, content-based cues far more often than subtle formatting cues (like answer order).
1 · 0 · 0
@b_shrir
Sriram B
6 months
We also discovered a new phenomenon we call "inconsistent reasoning." The model will correctly reason toward the right answer, then suddenly switch to a biased answer, sometimes without even mentioning the bias! A potential red flag for unfaithful CoT.
1 · 0 · 0
@b_shrir
Sriram B
6 months
Surprisingly, the articulation rate does not change when biased samples are given in-context, as compared to unbiased contexts or even no context. This means that explicit patterns in the context don't help the model articulate biases better!
1 · 0 · 0