Anish Athalye @anishathalye X Profile

Anish Athalye

@anishathalye

Followers: 4K

Following: 500

Media: 41

Statuses: 213

cto @cleanlabai • prev phd @mit_csail • research at https://t.co/MdknnUE4C6 • blog at https://t.co/oGOMQyhxv5 • open-source at https://t.co/VawMWMr84F

Cambridge, MA

Joined April 2012

Anish Athalye

@anishathalye

3 days

Always fun to see when a grad school side project grows to the point where it's powering real systems! Nice blog post from the S2 developers on how they do linearizability testing with deterministic simulation (using, among other tools, Porcupine):

s2.dev

How we validate strong consistency

0

2

Anish Athalye

@anishathalye

10 days

The first rule of programming: Bug report:

github.com

Here's a minimized repro:

    from collections.abc import Callable

    async def f[T](g: Callable[[T], bool]) -> T | None:
        return None

    async def f2[T](g: Callable[[T], bool]) -> T | None:
        return ...

0

Anish Athalye

@anishathalye

10 days

Coding agents follow "it's always your fault" a bit too well. When they hit a compiler bug, they'll rearrange methods, tweak syntax, and go in circles rather than consider the tooling might be wrong. Had to manually identify and minimize this mypy bug after Claude Code couldn't

1

0

1

Anish Athalye

@anishathalye

1 month

If you have suggestions for topics to cover in the next iteration of the course, please share them in this thread!

3

1

6

Anish Athalye

@anishathalye

1 month

Lecture videos: / notes:

missing.csail.mit.edu

0

31

Anish Athalye

@anishathalye

1 month

Missing Semester has grown past 100K subscribers on YouTube. Appreciate all the engagement and support! We plan to teach another iteration of the course in January 2026, revising the curriculum and covering new topics like AI IDEs and vibe coding.

9

45

709

Anish Athalye

@anishathalye

2 months

Incidentally, this is how I first got interested in ML.

github.com

Neural style in TensorFlow! 🎨. Contribute to anishathalye/neural-style development by creating an account on GitHub.

0

5

Anish Athalye

@anishathalye

2 months

My favorite way to measure progress in AI: finding papers obsoleted by ChatGPT prompts

1

0

10

Anish Athalye

@anishathalye

3 months

Code/binary here:

github.com

Magic auto brightness based on screen contents 💡. Contribute to anishathalye/lumen development by creating an account on GitHub.

0

Anish Athalye

@anishathalye

3 months

Ever get blinded when writing code late at night and you alt-tab from your dark-mode terminal to your browser? Made this little macOS utility to solve this little problem, just updated for the latest macOS. No thanks to AI for hallucinating BrightnessKit.framework.

1

0

2

Anish Athalye

@anishathalye

4 months

@CleanLab @blocks @OpenAI Needed to do higher-level planning to guide the LLM, where it could achieve > ~20% success rate, and then with prompting or rejection sampling I could get it over the finish line for subtasks. But over time, we're going to be able to operate at higher levels of abstraction. 4/4

0

Anish Athalye

@anishathalye

4 months

@CleanLab @blocks @OpenAI Total cost was < $50, and it probably would have been < $5 if I wasn't experimenting with the more expensive models (o4-mini is pretty great at generating code). 3/

1

0

Anish Athalye

@anishathalye

4 months

@CleanLab @blocks @OpenAI Forkable API integration, data structure processing, Google SSO, responsive design, . 100% vibe coded, I didn't write a single line of code (or even the text in the README): 2/.

github.com

100% vibe coded 🏄‍♂️. Contribute to cleanlab/office-presence-dashboard development by creating an account on GitHub.

1

0

Anish Athalye

@anishathalye

4 months

Vibe coded an office presence dashboard for @cleanlab with @blocks's Goose + @OpenAI's o4-mini at a hackathon last night. Three hours from `git init` to `vercel deploy --prod`. This technology has gotten so good; we're going to see an explosion in purpose-built applications. 1/

1

0

2

Anish Athalye

@anishathalye

4 months

We did a workshop at AIUC that: (1) implements a RAG app on top of Cursor's docs, (2) reproduces the widely-publicized failure from last week, and (3) shows how to automatically catch and reproduce this failure. All slides/code are open-sourced here: (5/5)

github.com

cleanlab/aiuc-workshop: AI User Conference 2025 - Developer Day workshop

0

7

Anish Athalye

@anishathalye

4 months

What’s the solution? I believe that one ingredient will be intelligent systems that evaluate the output of these LLMs in real-time and keep them in check, building on and combining techniques like LLM-as-a-judge, using per-token logprobs, and statistical methods. (4/5)

1

0

4

Anish Athalye

@anishathalye

4 months

Why do such failures occur? These next-token-prediction models are nondeterministic and can be fragile. And they’re not getting consistently better over time—OpenAI’s latest models like o3 and o4-mini show higher hallucination rates compared to previous versions. (3/5)

1

0

3

Anish Athalye

@anishathalye

4 months

It’s been over a year since the well-publicized failures of Air Canada’s support bot and NYC’s MyCity bot. And these AIs are still failing spectacularly in production, with the most recent debacle being Cursor’s AI going rogue and triggering a wave of cancellations. (2/5)

1

0

4

Anish Athalye

@anishathalye

4 months

We reproduced (and fixed!) Cursor’s rogue customer support AI. (1/5)

2

15

Anish Athalye

@anishathalye

4 months

RT @CleanlabAI: At the @awscloud + Cleanlab GenAI Workshop, Cleanlab CTO @anishathalye will show how to build trustworthy RAG pipelines us….

0

1

0