
Nimit Kalra
@qw3rtman
Followers 1K · Following 3K · Media 30 · Statuses 170
research @haizelabs, prev @citadel @utaustin
nyc
Joined October 2011
Discussing "Mind the Gap" tonight at @haizelabs's NYC AI Reading Group with @leonardtang_ and @willccbb. The authors study self-improvement through the "Generation-Verification Gap" (a model's verification ability over its own generations) and find that this capability log-scales with…
Still noodling on this, but the generation-verification gap proposed by @yus167 @_hanlin_zhang_ @ShamKakade6 @udayaghai et al. is a very nice framework that unifies a lot of thoughts around self-improvement/verification/bootstrapping reasoning.
3 · 9 · 61
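A rough sketch of how a generation-verification gap like this can be estimated in practice: the gap is the model's accuracy at verifying its own generations minus the accuracy of the generations themselves. The helper callables (`generate_answer`, `self_verify`, `is_correct`) are hypothetical stand-ins for model and grading calls, not the paper's actual setup.

```python
def generation_verification_gap(problems, generate_answer, self_verify, is_correct):
    """Estimate gap = (self-verification accuracy on own generations)
                    - (generation accuracy).

    generate_answer(problem) -> str       # model's own generation
    self_verify(problem, answer) -> bool  # model judges its own answer
    is_correct(problem, answer) -> bool   # ground-truth check
    """
    gen_correct, verify_correct = 0, 0
    for problem in problems:
        answer = generate_answer(problem)
        label = is_correct(problem, answer)      # was the generation actually right?
        verdict = self_verify(problem, answer)   # does the model think it was right?

        gen_correct += label
        verify_correct += (verdict == label)     # verifier is correct when it agrees with ground truth

    n = len(problems)
    return verify_correct / n - gen_correct / n
```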
evals evals evals.
Mercor (@mercor_ai) is now working with 6 out of the Magnificent 7, all of the top 5 AI labs, and most of the top application-layer companies. One trend is common across every customer: we are entering The Era of Evals. RL is becoming so effective that models will be able to…
1 · 0 · 12
@jxmnop ah yes it was:
my definition of science: the continual process of (a) generating artifacts that are surprising or useful and (b) explaining them. these days there's exponential growth in papers, and in useful artifacts, but we don't often discover good explanations. so i was happy to stumble…
0 · 0 · 6
think it was @jxmnop who said that science is about generating artifacts. inspired me to really focus on this this past week, starting with some internal eng tools and paper summaries. grinding out a couple more researchy things for the next couple weeks :) super excited to…
2 · 0 · 34
RT @leonardtang_: New open-source alert! spoken: a unified abstraction over realtime speech-to-speech foundation models. Run any S2S mode…
0 · 5 · 0
qwen RL has felt icky recently, but these authors get llama RL to match
What Makes a Base Language Model Suitable for RL? Rumors in the community say RL (i.e., RLVR) on LLMs is full of "mysteries": (1) Is the magic only happening on Qwen + Math? (2) Does the "aha moment" only spark during math reasoning? (3) Is evaluation hiding some tricky traps?
2 · 6 · 99
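For context, the "verifiable reward" in RLVR is usually just a programmatic check on the final answer. A minimal sketch below, assuming boxed math answers and exact string match; the actual grading in the paper may differ.

```python
import re

def verifiable_math_reward(completion: str, gold_answer: str) -> float:
    """Binary RLVR-style reward: 1.0 if the final \\boxed{...} answer
    exactly matches the reference string, else 0.0.
    (A sketch; real graders normalize fractions, units, etc.)"""
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    if match is None:
        return 0.0
    predicted = match.group(1).strip()
    return 1.0 if predicted == gold_answer.strip() else 0.0
```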
Super cool interactive blog post by @RoundtableDotAI examining the usage dynamics between humans and bots. Obviously an important problem, especially if Operator becomes easy to hijack. Great to see a principled study and the differences visualized nicely
1 · 4 · 12
gpt-4.1-mini thinks NYC >> SF. Image A and Image B both depict a black-and-white artistic rendition of a dove flying over a city skyline. However, Image A has a more intricate and detailed linework style, particularly in the city buildings and the dove's feathers, giving it a…
3 · 0 · 13
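The comparison above is just a vision model judging two images side by side. A minimal sketch of that kind of call using the OpenAI chat completions API; the image URLs and prompt wording here are placeholders, not the exact ones used.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def compare_images(url_a: str, url_b: str) -> str:
    """Ask gpt-4.1-mini to compare two images and explain its preference."""
    response = client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Image A and Image B each show a dove over a city skyline. "
                         "Which rendition do you prefer, and why?"},
                {"type": "image_url", "image_url": {"url": url_a}},
                {"type": "image_url", "image_url": {"url": url_b}},
            ],
        }],
    )
    return response.choices[0].message.content
```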
RT @leonardtang_: Verdict systems can now judge image inputs. Score product photos. Ad creatives. UI mockups. Haize anime birds. Judge an…
0 · 10 · 0
@yus167 @_hanlin_zhang_ @ShamKakade6 @udayaghai + "Can Large Reasoning Models Self-Train?" by @shafayat_sheikh @rsalakhu et al. who use self-consistency as a proxy for a self-verification signal.
0 · 0 · 4
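Self-consistency as a stand-in verifier is simple to sketch: sample k answers, majority-vote, and treat the vote share as a verification-free confidence signal. `sample_answer` is a hypothetical model-call stand-in, not the paper's implementation.

```python
from collections import Counter

def self_consistency_signal(problem, sample_answer, k: int = 16):
    """Majority-vote self-consistency as a proxy for self-verification:
    keep the most common sampled answer and use its vote share as a
    pseudo-confidence / reward signal in [0, 1]."""
    answers = [sample_answer(problem) for _ in range(k)]
    majority, votes = Counter(answers).most_common(1)[0]
    return majority, votes / k
```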