Anirudh Khatry
@AnirudhKhatry
Followers: 630 | Following: 6K | Media: 30 | Statuses: 502
CS PhD @UTCompSci | Advised by @IsilDillig and @gregd_nlp | Previously @ProseMsft @MSFTResearch | AI4Code | Guitarist | VJTI ’21
Joined April 2022
🚀Introducing CRUST-Bench, a dataset for C-to-Rust transpilation for full codebases 🛠️ A dataset of 100 real-world C repositories across various domains, each paired with: 🦀 Handwritten safe Rust interfaces. 🧪 Rust test cases to validate correctness. 🧵[1/6]
I’m at #NeurIPS2025 this week! Come say hi 👋 I’ll be presenting our ✨Spotlight✨ paper 🥑CoRe on #LLM code reasoning at poster session: 📅 Fri, Dec 5 ⏰ 4:30–7:30 PM 📍 Exhibit Hall C/D/E Benchmark + details: https://t.co/Ss7TBVQOSX
#AI #Reasoning #StaticAnalysis #LLM4code
At #NeurIPS? Come by our CoRe (✨ Spotlight) poster on #LLM code-#reasoning capabilities and meet my student Danning (@danning_x). She’ll be in Exhibit Hall C/D/E on Fri, Dec 5, 4:30–7:30 p.m. PST. Stop in if you’re around! @PurdueCS @NeurIPSConf @cerias
Come see our #NeurIPS2025 spotlight poster on learning interestingness functions for open-ended mathematical discovery! Happening right now (11 am poster session) at booth 1506.
Come to our poster in Exhibit CDE from 11 to 2 today! Poster #1506, the whole team is ready to talk about our work!
Happy to see a full house/engaged audience for this talk on teaching models to provide granular, persuasion-balanced, and calibrated judgments! Thanks CapitalOne for organizing and for the invitation!
🚨 Excited to be (remotely) giving a talk tomorrow 12/2 at the "Exploring Trust and Reliability in LLM Evaluation" #NeurIPS expo workshop! I’ll be presenting our work on pragmatic training to improve calibration and persuasion, and skill-based granular evaluation for data
At NeurIPS until Dec 7—open to chat about program synthesis, reasoning-focused DSL design, robust LLM eval, and agentic workflows for software/systems tasks. U of Alberta is hiring (Robotics, CV/Graphics, Networks, TCS); happy to share my experience.
@LiyanTang4 @sebajoed ChartMuseum: https://t.co/e2LK74TsSB AstroVisBench: https://t.co/vCJgJQalEh Meet with me if you're interested!
calendly.com
I'm at NeurIPS until Friday! This morning, catch: @LiyanTang4 presenting ChartMuseum, testing if VLMs can do visual reasoning over charts @sebajoed presenting AstroVisBench, testing if coding LLMs can work with real astro data workflows & link in thread if you want to meet!
📢 Postdoc position 📢 I’m recruiting a postdoc for my lab at NYU! Topics include LM reasoning, creativity, limitations of scaling, AI for science, & more! Apply by Feb 1. (Different from NYU Faculty Fellows, which are also great but less connected to my lab.) Link in 🧵
Ever wondered how LLMs generalize to entirely new patterns? In our Spotlight paper at #neurips2025, we study this in a fully controlled setting and show the minimal transformer architecture needed to learn induction heads. Paper Link: https://t.co/dFnKwmh3uC 🧵👇
Wrapping up the Argentina week with a final panel: @KFerles joins @codytouchgrass, @vwuestholz, and @dtumad for a deep dive into Formal Verification and Fuzzing at the @eth_proofs event by @ethereumfndn. Thank you @alexanderlhicks for the smooth moderation!
📢 Some big (& slightly belated) life updates! 1. I defended my PhD at MIT this summer! 🎓 2. I'm joining NYU as an Assistant Professor starting Fall 2026, with a joint appointment in Courant CS and the Center for Data Science. 🎉 🔬 My lab will focus on empirically studying
we released Olmo 3! lot of exciting stuff but wanna focus on: 🐟Olmo 3 32B Base, the best fully-open base model to-date, near Qwen 2.5 & Gemma 3 on diverse evals 🐠Olmo 3 32B Think, first fully-open reasoning model approaching Qwen 3 levels 🐡12 training datasets corresp to
How might we guide AI to generate *interesting* mathematical theories? How do we capture the notion of "interestingness"? Happy to share our new work on learning interestingness in automated theory formation! 🧵
COLM is going to San Francisco for 2026! 🗓️Dates: October 6-9, 2026 🏨Venue: Hilton San Francisco Union Square Website and CFPs for papers and workshops coming up soon!
look how happy they are submit to COLM
How do we teach LLMs not just to reason, but to reflect, debug, and improve themselves? We at AWS AI Labs introduce MURPHY 🤖, a multi-turn RL framework that brings self-correction into #RLVR (#GRPO). 🧵👇 Link: https://t.co/3kFjI5mxR5
Reminder to apply to Cornell's SE group for PhD! DDL: Dec 15. RT!
📢 The Software Engineering group at @Cornell_Bowers is growing fast -- we're now 8 PhD students strong! I’m recruiting PhD students for Fall 2026! If you are interested in the intersection of SE and AI, apply to Cornell CS and reach out! Ddl: Dec 15, 2025. RT!
UT Austin is doubling its supercomputing cluster to more than 1000 GPUs. This cluster has been a key for open source AI. Datacomp, DCLM, OpenThoughts and many other open source projects by researchers in Austin and many other universities and labs around the world critically
UT gets more compute! https://t.co/LZPDhJpAz9
🚨 Excited to share Gistify! Often the easiest way to understand large/complicated repos is by playing around with test cases and tracing back through the code that is executed. Gistify tasks models with turning a codebase and an entry-point (e.g. command, unit test) into a
🚨 Excited to announce Gistify!, where a coding agent must extract the gist of a repository: generate a single, executable, and self-contained file that faithfully reproduces the behavior of a given command (e.g., a test or entrypoint). ✅ It is a lightweight, broadly applicable
🚨 I'll be presenting virtually tonight at 7PM ET/ 8AM CST on Gather! I'll be talking about how strong LLMs can exploit loopholes introduced by ambiguous instructions, and what that means for safety! P.s. I am hiring Ph.D. students for my lab at UT Austin CS, applications due
🚨 Check out our awesome students/postdocs' papers at #EMNLP2025 and say hi to them 👋! Also, I will give a keynote (virtually) on "Attributable, Conflict-Robust, and Multimodal Summarization with Multi-Source Retrieval" at the NewSumm workshop. -- Jaehong (in-person) finished