Reshmi Ghosh

@reshmigh

Followers 1K | Following 38K | Media 101 | Statuses 2K

Sr. Scientist working on Agents, Reasoning, AI Security @Microsoft AI | Chair @WiMLDS | Ph.D. @CarnegieMellon | making machines trustworthy | Views my own | She/Her

United States
Joined July 2013
@reshmigh
Reshmi Ghosh
4 months
🚨New paper! With @UMassAmherst, @UofMaryland: "Hop, Skip, and Overthink: Diagnosing Why Reasoning Models Fumble during Multi-Hop Analysis"🤯. Why do #reasoningmodels break down when chaining multiple steps? We studied #CoT traces to find out. 🧵(1/n) 🔗 https://t.co/upzlb39m3n
2
4
13
@huashen218
Hua Shen✨@NeurIPS & ASRU🌴
29 days
🧐Are values in LLMs aligned with humans? 1️⃣ And if they are — do LLMs stay honest to those values, or sometimes say one thing but act another? 2️⃣ ✨ We explore these questions in two papers presented at #EMNLP2025: 1️⃣ ValueCompass: https://t.co/M4DF2LGg41 (WiNLP Workshop)
1
14
95
@reshmigh
Reshmi Ghosh
1 month
So Agents are flat earthers? :D
@canvardar
Can Vardar
1 month
finally, linkedin is funny
0
0
2
@timalthoff
Tim Althoff
1 month
(please reshare) I'm recruiting multiple PhD students and Postdocs @uwcse @uwnlp ( https://t.co/I5wQsFnCLL). Focus areas incl. psychosocial AI simulation and safety, Human-AI collaboration. PhD: https://t.co/ku40wCrpYh Postdocs: https://t.co/K9HUIPJ5h6
7
111
402
@myra_deng
Myra Deng@NeurIPS
1 month
Using probes to accurately and efficiently detect model behavior (in this case PII leakage) in prod is one of the clear wins for applied interpretability. This is the path to semantic determinism - imagine AI models instrumented with internal probes that recognize when they’re
@GoodfireAI
Goodfire
1 month
Why use LLM-as-a-judge when you can get the same performance for 15–500x cheaper? Our new research with @RakutenGroup on PII detection finds that SAE probes: - transfer from synthetic to real data better than normal probes - match GPT-5 Mini performance at 1/15 the cost (1/6)
5
15
260
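The probing idea in the thread above can be sketched in a few lines. This is a generic logistic-regression probe on synthetic "activations", not an SAE probe and not the code behind the @GoodfireAI result: hidden states are faked as Gaussian vectors whose first coordinate carries the behavior signal, and a linear probe is fit to detect it.

```python
import math
import random

# Fake per-example hidden activations: the "PII present" label
# shifts one activation direction (all names and data are synthetic).
random.seed(0)
d, n = 16, 1000
data = []
for _ in range(n):
    y = 1 if random.random() < 0.5 else 0
    x = [random.gauss(0.0, 1.0) for _ in range(d)]
    x[0] += 1.5 if y else -1.5          # behavior signal in one direction
    data.append((x, y))

# Fit probe weights with full-batch gradient descent on logistic loss.
w = [0.0] * d
for _ in range(200):
    grad = [0.0] * d
    for x, y in data:
        p = 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
        for i in range(d):
            grad[i] += (p - y) * x[i]
    for i in range(d):
        w[i] -= 0.1 * grad[i] / n

# A linear read-out like this is cheap to run on every forward pass,
# which is the cost argument made in the thread.
acc = sum(
    (sum(wi * xi for wi, xi in zip(w, x)) > 0) == (y == 1)
    for x, y in data
) / n
print(f"probe accuracy: {acc:.2f}")
```

The point of the sketch is the shape of the win: once the behavior is linearly readable from activations, detection is a dot product rather than a second LLM call.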
@lilyxu0
Lily Xu
1 month
Launching AI for Public Goods Fast Grants! We'll distribute $150k to advance critical work connecting AI and public goods. 💰 $10k per project 💰 $800 reviewer compensation PUBLIC GOODS := open source, ecosystem services, climate, urban infra, comms, education, science, & more
@dwddao
David Dao
2 months
Announcing AI for Public Goods Fast Grants (AI4PG) - Up to $10K for AI research improving public goods funding. Fast review (2-3 weeks), simple applications (4 pages + 1 budget page), open to any researchers worldwide. Call for reviewers now open! https://t.co/zUTezH1Afc
5
34
155
@reshmigh
Reshmi Ghosh
1 month
Evaluations.....
@kchonyc
Kyunghyun Cho
1 month
wow
0
0
1
@niloofar_mire
Niloofar
1 month
I'm recruiting students for fall 2026 thru @LTIatCMU & @CMU_EPP, in: 1. Privacy & security of LLMs, coding, long horizon & embodied agents (robotics) 2. Tiny local llms 3. AI for scientific reasoning, esp. chemistry 4. Latent reasoning 5. anything YOU are passionate about!
27
185
1K
@reshmigh
Reshmi Ghosh
2 months
It is an infinite glitch circle now!
@abeirami
Ahmad Beirami ✈️ NeurIPS
2 months
@nmboffi But who are these reviewers? They are the same authors. I think we should teach young members of our community to value "learning a new nugget of information" over "obtaining a bold number in a table."
0
0
1
@sarahmsachs
Sarah Sachs
2 months
Being at the top of @OpenAI token usage list is a vanity metric. Our job as engineers is to minimize token usage (aka latency and cost) while maximizing value by precise tool definitions and clever model routing. My dream is to grow ARR and move lower on this list…
168
137
5K
@reshmigh
Reshmi Ghosh
2 months
Can someone in the room define what is the commonly accepted definition of AGI?
@slow_developer
Haider.
2 months
Important thread on AGI from Anthropic researcher: - we're likely to see AI solving real open research problems in math in the next months - by 2027, models could complete a full day's software work with 50% success - compute power might grow 10,000x in the next five years - we
1
0
0
@reshmigh
Reshmi Ghosh
2 months
More internship opportunities for those that are looking
@tanshawn
Shawn Tan
2 months
We're looking for 2 interns for Summer 2026 at the MIT-IBM Watson AI Lab Foundation Models Team. Work on RL environments, enterprise benchmarks, model architecture, efficient training and finetuning, and more! Apply here:
0
0
0
@elder_plinius
Pliny the Liberator 🐉
2 months
🚨 JAILBREAK ALERT 🚨 ANTHROPIC: PWNED 🤗 CLAUDE-SONNET-4.5: LIBERATED 🦅 Woooeee this model is a real smarty pants!! I ain't never seen recipes quite like this! High level of detail all around, code especially 👀 Sonnet 4.5 also has a tendency to make some fairly impressive
72
121
2K
@lauriewired
LaurieWired
2 months
if you’re an EE, CS, or cryptography student, write your thesis on public key cryptography at the image sensor level. Proof of Physical capture will become a backbone of society soon.
@OpenAI
OpenAI
2 months
Sora 2 is here.
289
2K
23K
@vasuman
vas
2 months
Claude 4.5 Sonnet just refactored my entire codebase in one call. 25 tool invocations. 3,000+ new lines. 12 brand new files. It modularized everything. Broke up monoliths. Cleaned up spaghetti. None of it worked. But boy was it beautiful.
530
583
13K
@reshmigh
Reshmi Ghosh
2 months
Hear hear Interns
@yunyao_li
Yunyao Li
2 months
🚀 I'm hiring 2026 Applied Scientist / ML Engineering Interns to push the frontier of multi-agent AI for the enterprise. 💡 Research NLU, generative & agent-based AI, machine learning ⚡ Build scalable models, benchmark datasets & metrics 🤝 Create impactful solutions for
0
1
1
@kushanmitra
Kushan Mitra
3 months
Vah, pothole alerts built into @atherenergy maps for multiple cities
235
958
12K
@Hesamation
ℏεsam
3 months
ML interview question: why do embeddings come in 768 or 1024? - “because BERT did it” - “because of GPU optimization” BUT WHY?! The replies under this post are everything wrong with current courses and blog posts: superficiality. This isn’t reasoning, it’s memorization
@atulit_gaur
atulit
3 months
Fun question to ask in an ml interview, “Why do embedding dimensions come in neat sizes like 768 or 1024, but never 739?” If they can't answer it, it's fine, but if they do, you've stumbled upon a real gem.
46
80
3K
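One common concrete answer, not spelled out in either tweet: transformer hidden sizes must split evenly into num_heads × head_dim, and head dims of 32/64/128 map cleanly onto GPU tile sizes (BERT-base's 768 is 12 heads × 64 dims). A quick sketch of that divisibility argument:

```python
# Factor a hidden size into (num_heads, head_dim) pairs using the
# hardware-friendly head dims seen in practice. A size like 739
# (prime) admits no such split, which is one reason you never see it.
def head_factorizations(hidden, head_dims=(32, 64, 128)):
    return [(hidden // hd, hd) for hd in head_dims if hidden % hd == 0]

print(head_factorizations(768))   # [(24, 32), (12, 64), (6, 128)]
print(head_factorizations(1024))  # [(32, 32), (16, 64), (8, 128)]
print(head_factorizations(739))   # [] -> no clean multi-head split
```

This is the "GPU optimization" answer made precise: the constraint is architectural divisibility plus memory alignment, not an arbitrary convention.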
@rohanpaul_ai
Rohan Paul
4 months
The paper shows reasoning models often answer multi-hop questions while straying from the needed steps. Multi-hop questions need information from several documents linked in a chain. The authors track each jump between documents as a hop, check if all required sources are
1
1
8
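The hop tracking described above can be sketched as a coverage check: given the ordered gold documents a question requires and the documents a reasoning trace actually visits, report which hops were hit and which were skipped. The function and document ids below are hypothetical illustrations, not the paper's actual code or data.

```python
def hop_coverage(gold_chain, visited):
    """Fraction of required hops (ordered gold documents) that the
    trace touches, plus the hops it skipped entirely."""
    seen = set(visited)
    covered = [doc for doc in gold_chain if doc in seen]
    skipped = [doc for doc in gold_chain if doc not in seen]
    return len(covered) / len(gold_chain), skipped

# Toy trace that visits d1 and d3 but skips the middle hop d2,
# and wanders into an unneeded document d7.
cov, skipped = hop_coverage(["d1", "d2", "d3"], ["d1", "d3", "d7"])
print(cov, skipped)   # -> 2/3 of hops covered; d2 was skipped
```

Skipped hops of this kind ("skip" in the paper's title) are exactly what a per-hop audit of CoT traces surfaces that final-answer accuracy alone hides.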
@reshmigh
Reshmi Ghosh
4 months
@UMassAmherst @UofMaryland (n/n) If these findings sound interesting to you, give the paper a read: 🤝 Huge thanks to our amazing collaborators for making this possible. @BasuSamyadeep, @Microsoft 📄 Read the full paper: https://t.co/upzlb39m3n #ReasoningModels #AI #LLM #AIResearch #MultiHopQA
1
0
1
@reshmigh
Reshmi Ghosh
4 months
@UMassAmherst @UofMaryland (5/n) 🔍 While the Illusion of Thinking paper shows how reasoning models collapse under high complexity in puzzles, our work focuses on real-world Q/A, mirroring the AI-based search process, showing how #reasoning breaks down even when the task is solvable.
1
0
1