Sriram B @b_shrir X Profile

Sriram B

@b_shrir

Followers

294

Following

681

Media

18

Statuses

355

PhD student in Computer Science at UMD College Park | Ex - Research Fellow at MSR | IIT Bombay CS undergrad Make AI more understandable and reliable!

https://t.co/Mtps7rR2Ys

College Park, MD

Joined May 2018

Don't wanna be here? Send us removal request.

Sriram B

@b_shrir

26 days

A bit late, but this was accepted in EMNLP 2025 Findings. Unfortunately none of us are able to go present, but hope everyone attending has a nice time!

Sriram B

@b_shrir

6 months

Do AI models really think the way they say they do? In our latest paper, we examine the faithfulness of the chain-of thought (CoT) produced by LLMs and LVLMs when exposed to a wide range of biases, with a special focus on visual biases and more subtler, implicit, biases.

0

5

Sriram B

@b_shrir

11 days

Just tried this for the papers on which I am a reviewer. For many of them, my review is the only human-generated one😂

Graham Neubig

@gneubig

12 days

ICLR authors, want to check if your reviews are likely AI generated? ICLR reviewers, want to check if your paper is likely AI generated? Here are AI detection results for every ICLR paper and review from @pangramlabs! It seems that ~21% of reviews may be AI?

0

3

Sriram B

@b_shrir

20 days

I had quite a bit of fun training LLMs with the latest RL techniques this summer. Some of our results are in this thread:

Koustava Goswami

@koustavagoswami

21 days

🚀 New research drop! We reimagine attribution not as retrieval, but as a reasoning problem. Introducing DECOMPTUNE 🧩 → a novel RL-driven training framework that teaches small models how to reason through decomposition-based reasoning 📄 https://t.co/G7IA2GXe0v #AI #Reasoning

0

Koustava Goswami

@koustavagoswami

21 days

🚀 New research drop! We reimagine attribution not as retrieval, but as a reasoning problem. Introducing DECOMPTUNE 🧩 → a novel RL-driven training framework that teaches small models how to reason through decomposition-based reasoning 📄 https://t.co/G7IA2GXe0v #AI #Reasoning

2

5

Sriram B

@b_shrir

20 days

Now proceeding to call myself a "final year PhD" https://t.co/zQGEAA6maF

Gowthami ✈️NeurIPS’25

@gowthami_s

22 days

Life hack: Just call yourself final year PhD! Apparently you can renew it every year. 😜 #phdlife

0

1

5

Sriram B

@b_shrir

20 days

A bit of a personal update: I am now a PhD Candidate!

35

25

956

Sriram B

@b_shrir

6 months

Do AI models really think the way they say they do? In our latest paper, we examine the faithfulness of the chain-of thought (CoT) produced by LLMs and LVLMs when exposed to a wide range of biases, with a special focus on visual biases and more subtler, implicit, biases.

1

6

19

RELAI

@ReliableAI

1 month

🚀 RELAI is live — a platform for building reliable AI agents 🔁 We complete the learning loop for agents: simulate → evaluate → optimize - Simulate with LLM personas, mocked MCP servers/tools and grounded synthetic data - Evaluate with code + LLM evaluators; turn human

9

28

54

Sriram B

@b_shrir

1 month

Today's pumpkins are the smallest they will ever be

erika

@yeeeerika

1 month

this headline hums with ancient autumnal dread

0

2

Sriram B

@b_shrir

2 months

It seems that the first task of formal theorem proving AI will be to thoroughly debug Lean.

doomslide

@doomslide

2 months

I hate to disappoint all the wet behind the ears formalization-cels but we are very likely bound for many more *silly* lean exploits. The interesting bugs will have to be earned with sweat and tears.

0

Sriram B

@b_shrir

2 months

This type of stuff used to impress me too, but remember that distinguishing between subtle details is a strength of AI and weakness of humans. Even by 2016 Imagenet models were able to distinguish between fine grained classes like Asian vs African elephant.

abhishek

@abhi1thakur

2 months

I no longer have doubts

0

4

Soheil Feizi

@FeiziSoheil

3 months

Introducing Maestro: the holistic optimizer for AI agents. Maestro optimizes the agent graph and tunes prompts/models/tools, fixing agent failure modes that prompt-only or RL weight tuning can’t touch. Maestro outperforms leading prompt optimizers (e.g., MIPROv2, GEPA) on

18

57

327

Sriram B

@b_shrir

4 months

Very interesting. This would imply that CoTs for visual tasks which are less reliant on explicit reasoning are more likely to be unfaithful. We actually showed this in

arxiv.org

Chain-of-thought (CoT) reasoning enhances performance of large language models, but questions remain about whether these reasoning traces faithfully reflect the internal processes of the model. We...

METR

@METR_Evals

4 months

Prior work has found that Chain of Thought (CoT) can be unfaithful. Should we then ignore what it says? In new research, we find that the CoT is informative about LLM cognition as long as the cognition is complex enough that it can’t be performed in a single forward pass.

0

Sriram B

@b_shrir

4 months

This actually seems like a bigger deal than the Deepmind result a year ago. Non-math-specific approach (no formal verification etc) yielding an IMO gold is huge. Timelines shortened!

Sheryl Hsu

@SherylHsu02

4 months

The model solves these problems without tools like lean or coding, it just uses natural language, and also only has 4.5 hours. We see the model reason at a very high level - trying out different strategies, making observations from examples, and testing hypothesis.

0

3

Sriram B

@b_shrir

5 months

I'm still surprised by the 20% slowdown, I can accept no significant speedup but a slowdown seems very weird, I'm skeptical

0

Sriram B

@b_shrir

5 months

I think the expectation mismatch has a pretty simple explanation - time passes more slowly when you are locked in and concentrating vs simply promoting and reviewing Cursor. This creates the illusion that you are actually getting stuff done quicker.

METR

@METR_Evals

5 months

We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers. The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.

2

0

1

Sriram B

@b_shrir

6 months

For more findings and details, please check out our paper: https://t.co/pws1H4Jly7 In collaboration with @BasuSamyadeep and @FeiziSoheil

arxiv.org

Chain-of-thought (CoT) reasoning enhances performance of large language models, but questions remain about whether these reasoning traces faithfully reflect the internal processes of the model. We...

0

Sriram B

@b_shrir

6 months

We revisited CoT faithfulness in text-only LLMs and found similar patterns. Even advanced, SoTA reasoning models articulate explicit, content-based cues far more than subtle, formatting cues (like answer order).

1

0

Sriram B

@b_shrir

6 months

We also discovered a new phenomenon we call "inconsistent reasoning." The model will correctly reason toward the right answer, then suddenly switch to a biased answer, sometimes without even mentioning the bias! A potential red flag for unfaithful CoT.

1

0

Sriram B

@b_shrir

6 months

Surprisingly, the articulation rate does not change when biased samples are given in-context as compared to unbiased contexts or even no context. This means that explicit patterns in the context doesn’t help the model articulate biases better!

1

0