Anish Athalye

@anishathalye

Followers: 4K
Following: 500
Media: 41
Statuses: 213

cto @cleanlabai • prev phd @mit_csail • research at https://t.co/MdknnUE4C6 • blog at https://t.co/oGOMQyhxv5 • open-source at https://t.co/VawMWMr84F

Cambridge, MA
Joined April 2012
@anishathalye
Anish Athalye
3 days
Always fun to see when a grad school side project grows to the point where it's powering real systems! Nice blog post from the S2 developers on how they do linearizability testing with deterministic simulation (using, among other tools, Porcupine):
s2.dev
How we validate strong consistency
@anishathalye
Anish Athalye
10 days
Coding agents follow "it's always your fault" a bit too well. When they hit a compiler bug, they'll rearrange methods, tweak syntax, and go in circles rather than consider the tooling might be wrong. Had to manually identify and minimize this mypy bug after Claude Code couldn't.
@anishathalye
Anish Athalye
1 month
If you have suggestions for topics to cover in the next iteration of the course, please share them in this thread!
@anishathalye
Anish Athalye
1 month
Lecture videos: / notes:
missing.csail.mit.edu
@anishathalye
Anish Athalye
1 month
Missing Semester has grown past 100K subscribers on YouTube. Appreciate all the engagement and support! We plan to teach another iteration of the course in January 2026, revising the curriculum and covering new topics like AI IDEs and vibe coding.
@anishathalye
Anish Athalye
2 months
My favorite way to measure progress in AI: finding papers obsoleted by ChatGPT prompts
@anishathalye
Anish Athalye
3 months
Ever get blinded when you're coding late at night and alt-tab from your dark-mode terminal to your browser? Made a little macOS utility to solve this problem, and just updated it for the latest macOS. No thanks to AI for hallucinating BrightnessKit.framework.
@anishathalye
Anish Athalye
4 months
@CleanLab @blocks @OpenAI I needed to do the higher-level planning to guide the LLM, down to subtasks where it could achieve a >~20% success rate; then, with prompting or rejection sampling, I could get it over the finish line on each subtask. But over time, we're going to be able to operate at higher levels of abstraction. 4/4
@anishathalye
Anish Athalye
4 months
@CleanLab @blocks @OpenAI Total cost was < $50, and it probably would have been < $5 if I hadn't been experimenting with the more expensive models (o4-mini is pretty great at generating code). 3/
@anishathalye
Anish Athalye
4 months
@CleanLab @blocks @OpenAI Forkable API integration, data structure processing, Google SSO, responsive design, etc. 100% vibe coded, I didn't write a single line of code (or even the text in the README): 2/
github.com
100% vibe coded 🏄‍♂️. Contribute to cleanlab/office-presence-dashboard development by creating an account on GitHub.
@anishathalye
Anish Athalye
4 months
Vibe coded an office presence dashboard for @cleanlab with @blocks's Goose + @OpenAI's o4-mini at a hackathon last night. Three hours from `git init` to `vercel deploy --prod`. This technology has gotten so good; we're going to see an explosion in purpose-built applications. 1/
@anishathalye
Anish Athalye
4 months
We did a workshop at AIUC that (1) implements a RAG app on top of Cursor's docs, (2) reproduces the widely publicized failure from last week, and (3) shows how to automatically catch and reproduce this failure. All slides and code are open-sourced here: (5/5)
github.com
cleanlab/aiuc-workshop: AI User Conference 2025 - Developer Day workshop
@anishathalye
Anish Athalye
4 months
What’s the solution? I believe one ingredient will be intelligent systems that evaluate the output of these LLMs in real time and keep them in check, building on and combining techniques like LLM-as-a-judge, per-token logprobs, and statistical methods. (4/5)
@anishathalye
Anish Athalye
4 months
Why do such failures occur? These next-token-prediction models are nondeterministic and can be fragile. And they’re not getting consistently better over time: OpenAI’s latest models like o3 and o4-mini show higher hallucination rates than previous versions. (3/5)
@anishathalye
Anish Athalye
4 months
It’s been over a year since the well-publicized failures of Air Canada’s support bot and NYC’s MyCity bot. And these AIs are still failing spectacularly in production, the most recent debacle being Cursor’s AI going rogue and triggering a wave of cancellations. (2/5)
@anishathalye
Anish Athalye
4 months
We reproduced (and fixed!) Cursor’s rogue customer support AI. (1/5)
@anishathalye
Anish Athalye
4 months
RT @CleanlabAI: At the @awscloud + Cleanlab GenAI Workshop, Cleanlab CTO @anishathalye will show how to build trustworthy RAG pipelines us…