Jaidev Shah
@JaidevShah4
Followers
611
Following
5K
Media
47
Statuses
2K
@amazonscience | @microsoft AI | @columbia | agents, search and personalization https://t.co/FcgHkRNkph
San Francisco
Joined April 2020
weird that I can’t edit messages or restore the repo to checkpoints in the @openai codex extension. Almost all other coding agents provide this (e.g. cursor, copilot)
0
0
0
Wow.
Affirmative action discriminated against Asian college applicants. Post-SCOTUS ruling, we now see the extent. Johns Hopkins’ first-year enrollment by race, 2023 -> 2025:
Asians 25.6% -> 45.1%
Blacks 9.8% -> 4%
Hispanics 20.8% -> 10.1%
Whites 18.3% -> 21%
https://t.co/mfLWLgWmkn
0
0
0
“Tell me about a time you disagreed with a coworker” ➡️ “Tell me about a time you disagreed with an LLM”
We live in incredible times
2026 interview questions:
- you are in the middle of a refactor and the model says 8% context left before auto-compaction. what do you do?
- how do you decide which tasks to give to claude, codex gpt 5.2 xhigh, and chatgpt pro?
- tell me about a time you disagreed with an LLM
0
0
2
Been really enjoying this paper by @sunweiwei12 et al. lately: https://t.co/WG5siT0kOm
I really like how it treats context management as something the agent actually learns, instead of an external system hack like summarization or fixed multi-agent setups. The test-time idea is
5
47
335
distillation might be one of the most impactful technologies of the LLM era, really impressive scores
One of the things we strive to do with each new Gemini release is to make the new Flash model as good as or better than the previous generation’s Pro model. Gemini 3 Flash exceeds Gemini 2.5 Pro on nearly every metric, often by very large margins, and almost matches Gemini 3 Pro on most
7
11
273
How long have you been "planning to understand" how modern LLM inference works? We just gave you a readable version of SGLang you can finish over the weekend. Introducing mini-SGLang ⚡ We distilled SGLang from 300K lines down to 5,000. Kept the core design, cut the complexity.
29
172
1K
Apple's new paper is mind-blowing. They showed that one attention layer is enough to turn pretrained vision features into SoTA image generators! This dramatically simplifies diffusion models while keeping the top-tier quality
27
253
2K
Very interesting observations on the interaction between pre/mid/post-training.
1. The gain from RL is largest when the task is neither too easy nor too hard.
2. Pretraining should focus on cultivating broader atomic skills - RL can combine them to solve composite problems.
3.
9
58
356
https://t.co/WyCcSxnv74 much much better
ml.ink
Conference schedule for NeurIPS 2025 in San Diego. Browse events, tutorials, workshops, posters, and talks.
#NeurIPS2025 I hated the old Whova app, but for me, the AtConf app is crashing constantly (iPhone 11; latest OS). @NeurIPSConf Does it connect to my calendar? Does it have a recommendation system? This shouldn't be difficult
0
0
1
Spent Thanksgiving weekend training steerable generative recommender models. Burned 200M tokens on Cursor, most of which were Codex 5.1 or Opus 4.5 calls, and I still haven’t hit the limit on my $20/month subscription. I’d estimate it easily created $2k+ in value for me
1
0
3
I'll be in San Diego next week for @NeurIPSConf from 12/2 to 12/8. Interested in startups, applied RL, or search/recsys? I'd love to chat or grab a coffee. Looking forward to catching up with old friends and making new ones!
0
0
3
great post!
An amazing blog post dropped on @huggingface explaining how today's LLM inference engines like @vllm_project work! The concept of "continuous batching" is explained, along with KV-caching, attention masking, chunked prefill, and decoding. Continuous batching is the idea of
0
0
0
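A rough sketch of the continuous batching idea that post explains: instead of running a fixed batch of requests to completion, the engine refills freed slots with waiting requests at every decode step. This is a toy Python sketch of that scheduling loop, not the blog's or vLLM's actual code; Request, ContinuousBatcher, and model_step are hypothetical names.

```python
from __future__ import annotations
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Request:
    prompt: list[int]                      # prompt token ids
    max_new_tokens: int
    generated: list[int] = field(default_factory=list)

class ContinuousBatcher:
    """Toy scheduler: admits new requests whenever a running one finishes."""

    def __init__(self, model_step, max_batch_size: int):
        # model_step(list[Request]) -> list[int]: one decoded token per request
        self.model_step = model_step
        self.max_batch_size = max_batch_size
        self.waiting: deque[Request] = deque()
        self.running: list[Request] = []

    def submit(self, req: Request) -> None:
        self.waiting.append(req)

    def step(self) -> list[Request]:
        # Key difference from static batching: free slots are refilled every
        # decode step instead of waiting for the whole batch to drain.
        while self.waiting and len(self.running) < self.max_batch_size:
            self.running.append(self.waiting.popleft())
        if not self.running:
            return []
        next_tokens = self.model_step(self.running)    # one decode step
        finished, still_running = [], []
        for req, tok in zip(self.running, next_tokens):
            req.generated.append(tok)
            done = len(req.generated) >= req.max_new_tokens
            (finished if done else still_running).append(req)
        self.running = still_running
        return finished                                # their slots are now free

# Usage with a dummy model that always emits token 0:
batcher = ContinuousBatcher(model_step=lambda reqs: [0] * len(reqs), max_batch_size=8)
batcher.submit(Request(prompt=[1, 2, 3], max_new_tokens=4))
while batcher.running or batcher.waiting:
    batcher.step()
```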
Love to see more fully open post-training recipes (this one for multimodal reasoning). It's surprising how rare open post-training data is, given that the opportunity for impact is huge. Lots of people will try it, and simple data methods can still improve on SOTA.
🚀 Introducing OpenMMReasoner — a transparent, reproducible recipe for multimodal reasoning. We present a 2-stage pipeline using 874K SFT samples with step-by-step validation and 74K high-quality RL samples. Paper: https://t.co/87o8IwI26Y More in thread:
3
21
199
> be child prodigy chess master at 4 yrs age, represent England in international junior chess championships
> after a tiring 10 hr chess match at 13 yrs age, throws away the match to a 30yr old grandmaster and decides the brain is supposed to be used for greater things than
Got a fresh dose of “long $GOOG” and “long London” from this. Highly recommend watching; it chronicles the story of DeepMind and Demis’ search for AGI.
50
302
5K
Too many storytellers in SF, fewer doers
There’s a belief in SF that with enough money you can solve anything. For a lot of technical companies, VCs are therefore looking for a charismatic founder who they believe will be able to get that money. In the first few rounds, they care less about the actual technical
0
0
3
I used @genspark_ai’s AI slide deck feature as well as @GammaApp on the same task, and was blown away by how good Genspark was: significantly higher quality and better-researched slides than Gamma. I also like that you can follow the chain of tool calls
0
0
0