Anthony Susevski
@asusevski
Followers: 466
Following: 8K
Media: 179
Statuses: 3K
ml enjoyer. find it from within or be without. recovering Liberty village resident
Joined March 2021
Zoom, the AI research lab?
Zoom achieved a new state-of-the-art (SOTA) result on Humanity’s Last Exam (HLE): 48.1% — outperforming other AI models with a 2.3% jump over the previous SOTA. ✨ HLE is one of the most rigorous tests in AI, built to measure real expert-level knowledge and deep reasoning across
37
174
8K
Holy AIs! Small, narrow finetuning can massively distort an LLM. Teach it outdated bird names and it starts talking like it's the 19th century. Feed it harmless facts that resemble Hitler and it slips into a Hitler persona. Even “benevolent” training can hide inductive
2
4
37
SGLang for RecSys in LinkedIn!
Loved seeing how @LinkedIn is using SGLang to accelerate large-scale RecSys ranking, and contributing major features back to both SGLang and @NVIDIA’s FlashInfer open-source stack. SGLang is now powering LinkedIn’s latency-critical ranking workflows with: • Massive prefill
5
5
90
Today we started rolling out SimGym — a system that creates “digital customers” that behave like real ones. They browse your site, complete tasks, and reveal optimization opportunities. You can even run A/B tests with *zero* live traffic! Spent a year developing it.
128
167
3K
In today's episode of programming horror... In the Python docs of random.seed() def, we're told "If a is an int, it is used directly." [1] But if you seed with 3 or -3, you actually get the exact same rng object, producing the same streams. (TIL). In nanochat I was using the
217
494
8K
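The random.seed() behavior described in the post above can be checked directly. This is a minimal sketch, assuming CPython's standard `random` module; `first_draws` and `sign_safe_seed` are hypothetical helper names for illustration and are not taken from the post or from nanochat. The remapping is just one possible way to keep 3 and -3 apart, not necessarily the fix the author used.

```python
import random


def first_draws(seed: int, n: int = 5) -> list[float]:
    """Return the first n floats from a fresh generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


def sign_safe_seed(seed: int) -> int:
    """Hypothetical helper: map signed ints to distinct non-negative ints.

    Non-negative s maps to 2*s and negative s maps to -2*s - 1,
    so 3 becomes 6 and -3 becomes 5.
    """
    return 2 * seed if seed >= 0 else -2 * seed - 1


# True on interpreters where the sign of an int seed is discarded.
sign_is_ignored = first_draws(3) == first_draws(-3)
print("seed(3) and seed(-3) give the same stream:", sign_is_ignored)

# The remapped seeds are different non-negative ints, so the streams differ.
print(first_draws(sign_safe_seed(3)) != first_draws(sign_safe_seed(-3)))
```

If the first line prints `True` on your interpreter, any scheme that derives seeds by negation (for example `seed` for train and `-seed` for test) would silently share one stream, which is the hazard the post points at.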
(1/n) Tiny-A2D: An Open Recipe to Turn Any AR LM into a Diffusion LM Code (dLLM): https://t.co/yYNBo4N99B Checkpoints: https://t.co/fBG4MmoaTZ With dLLM, you can turn ANY autoregressive LM into a diffusion LM (parallel generation + infilling) with minimal compute. Using this
6
70
328
For anyone who cares about structured output benchmarks as much as I do, here's an early Christmas present 🎁 ! Pretty well thought out from the folks @CleanlabAI. Seems like I'll def be using it to compare LLMs using BAML and DSPy! https://t.co/clQ0BuaX9l
github.com
A Structured Output Benchmark whose 'ground-truth' is actually right - cleanlab/structured-output-benchmark
4
11
60
We built a dead code finder for our gargantuan codebase. Who wants it.
54
31
633
Ivan Sorokin and I are the official winners on the Arc Prize competition, with a significant lead over other teams. Thanks to @kaggle and @arcprize for hosting the competition. NVIDIA tech blog summarizing what we did: https://t.co/BU8nHPCliJ Our writeup:
38
52
528
normally it takes about two minutes to cold-start a @vllm_project server for @MistralAI 3 3B -- mostly @PyTorch compilation and CUDA graph capture.
with @modal GPU snapshots, you can cut that down to just 12 seconds
8
14
159
Someone broke the trillion-row record by generating 1,000 billion-row files, uploading them to Google Cloud, and running DuckDB queries on each file. Lmao
1
0
1
We reached out to almost every media house in Canada about TSFM and told them this was a 3-month lecture series featuring speakers from Cohere, Runway, DeepMind, Isomorphic Labs, Prime Intellect, GPU Mode, Modular, and more. Not one wrote about it. There was a story here, about the
20
8
133
AI Engineer's advent calendar gives free credits/subs for the largest AI platforms 🤯 claim free credits & memberships while it lasts! 🙌🏻
To celebrate the holiday season, we’re launching 25 Days of Agents - an advent calendar of exclusive deals from top AI companies to help you build your own agents. Every day until December 25th, we’ll unlock a new deal from partners including @Railway @Cloudflare @convex
0
1
28
40 characters to capture the challenges, allies, and moments that define what it means to be an ML researcher. Meet the legends and get your booster pack at #NeurIPS2025. 🎴 #LabLegends
1
12
30
The leaderboard illusion poster session is Thursday 11:00 AM – 2:00 PM PST Exhibit Hall C,D,E #4109. Extremely proud of our work which remains urgent for ensuring we measure progress reliably. It was such a pleasure working w @singhshiviii @mziizm @YiyangNan @beyzaermis...
It is critical for scientific integrity that we trust our measure of progress. The @lmarena_ai has become the go-to evaluation for AI progress. Our release today demonstrates the difficulty in maintaining fair evaluations on @lmarena_ai, despite best intentions.
2
6
41