Nimit Kalra

@qw3rtman

Followers
1K
Following
3K
Media
30
Statuses
170

research @haizelabs, prev @citadel @utaustin

nyc
Joined October 2011
@qw3rtman
Nimit Kalra
10 days
Discussing "Mind the Gap" tonight at @haizelabs's NYC AI Reading Group with @leonardtang_ and @willccbb. Authors study self-improvement through the "Generation-Verification Gap" (model's verification ability over its own generations) and find that this capability log scales with
@qw3rtman
Nimit Kalra
29 days
Still noodling on this, but the generation-verification gap proposed by @yus167 @_hanlin_zhang_ @ShamKakade6 @udayaghai et al. is a very nice framework that unifies a lot of thoughts around self-improvement/verification/bootstrapping reasoning.
3
9
61
@qw3rtman
Nimit Kalra
5 days
chart crime so bad you gotta transcribe the values by hand and plot it yourself.
2
0
5
@qw3rtman
Nimit Kalra
6 days
evals evals evals.
@BrendanFoody
Brendan (can/do)
6 days
Mercor (@mercor_ai) is now working with 6 out of the Magnificent 7, all of the top 5 AI labs, and most of the top application layer companies. One trend is common across every customer: we are entering The Era of Evals. RL is becoming so effective that models will be able to…
1
0
12
@qw3rtman
Nimit Kalra
7 days
@jxmnop ah yes it was:
@jxmnop
jack morris
13 days
my definition of science: the continual process of (a) generating artifacts that are surprising or useful and (b) explaining them. these days there's exponential growth in papers, and in useful artifacts, but we don't often discover good explanations. so i was happy to stumble
0
0
6
@qw3rtman
Nimit Kalra
7 days
think it was @jxmnop who said that science is about generating artifacts. inspired me to really focus on this this past week, starting with some internal eng tools and paper summaries. grinding out a couple more researchy things for the next couple weeks :) super excited to…
2
0
34
@qw3rtman
Nimit Kalra
9 days
Supports OpenAI Realtime/Gemini Multimodal Live/Amazon Nova Sonic in one nice interface. Code:
1
0
1
@qw3rtman
Nimit Kalra
9 days
🗣️ SOUND ON 🗣️ Realtime speech-to-speech models are super cool, but quite annoying to configure. We built an internal LiteLLM for them and decided to open-source it today to save you the pain. Try it now: `pip install spoken`!
1
1
13
@qw3rtman
Nimit Kalra
9 days
RT @leonardtang_: New open-source alert! spoken: a unified abstraction over realtime speech-to-speech foundation models. Run any S2S mode…
0
5
0
@qw3rtman
Nimit Kalra
10 days
Stitched together random 200ms chunks from a source audio to bypass provider-side caching. Different providers have different input audio ms → audio token multiples, but I suspect that OpenAI's is 100Hz and Gemini's is 40Hz (please correct if I am wrong 🙃).
0
0
2
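A minimal sketch of the chunk-stitching idea above, not the author's actual code. It assumes the audio is already decoded into a list of mono PCM samples, picks chunks aligned to 200ms boundaries (the tweet does not say how offsets were chosen), and uses a 24kHz sample rate purely as a stand-in. The function names `stitch_random_chunks` and `tokens_per_chunk` are hypothetical, and the 100Hz and 40Hz token rates are the tweet's unconfirmed guesses.

```python
import random


def stitch_random_chunks(samples, sample_rate_hz, chunk_ms=200, n_chunks=10, seed=None):
    """Build a new clip by concatenating randomly chosen fixed-length chunks."""
    rng = random.Random(seed)
    chunk_len = sample_rate_hz * chunk_ms // 1000  # samples per chunk
    if chunk_len <= 0 or len(samples) < chunk_len:
        raise ValueError("source audio is shorter than one chunk")
    # Candidate start offsets, aligned to chunk boundaries (an assumption).
    starts = range(0, len(samples) - chunk_len + 1, chunk_len)
    out = []
    for _ in range(n_chunks):
        start = rng.choice(starts)
        out.extend(samples[start:start + chunk_len])
    return out


def tokens_per_chunk(chunk_ms, token_rate_hz):
    """Audio tokens covering one chunk, given a tokens-per-second rate."""
    return chunk_ms * token_rate_hz / 1000


# Stand-in for 2 s of 24 kHz mono PCM; real audio would come from a decoder.
source = list(range(24_000 * 2))
clip = stitch_random_chunks(source, 24_000, chunk_ms=200, n_chunks=5, seed=0)

print(len(clip))  # 24000 (5 chunks of 4800 samples)
print(tokens_per_chunk(200, 100), tokens_per_chunk(200, 40))  # 20.0 8.0
```

If the suspected rates are right, each 200ms chunk would be a whole number of tokens for both providers (20 at 100Hz, 8 at 40Hz).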
@qw3rtman
Nimit Kalra
10 days
OpenAI Realtime vs Gemini Multimodal Live latency (time to first audio token)
3
0
8
@qw3rtman
Nimit Kalra
10 days
qwen RL has felt icky recently, but these authors get llama RL to match
@SinclairWang1
Zengzhi Wang
10 days
What Makes a Base Language Model Suitable for RL? Rumors in the community say RL (i.e., RLVR) on LLMs is full of “mysteries”: (1) Is the magic only happening on Qwen + Math? (2) Does the "aha moment" only spark during math reasoning? (3) Is evaluation hiding some tricky traps?
2
6
99
@qw3rtman
Nimit Kalra
10 days
1
2
5
@qw3rtman
Nimit Kalra
10 days
Super cool interactive blog post by @RoundtableDotAI examining the usage dynamics between humans and bots. Obviously an important problem, especially if Operator becomes easy to hijack. Great to see a principled study and the differences visualized nicely
1
4
12
@qw3rtman
Nimit Kalra
10 days
full code is like 20 lines:
1
0
1
@qw3rtman
Nimit Kalra
10 days
gpt-4.1-mini thinks NYC >> SF. Image A and Image B both depict a black-and-white artistic rendition of a dove flying over a city skyline. However, Image A has a more intricate and detailed linework style, particularly in the city buildings and the dove's feathers, giving it a
@qw3rtman
Nimit Kalra
10 days
dropping some work on multimodal evaluators soon. for now, play with image judges in Verdict ⬇️
3
0
13
@qw3rtman
Nimit Kalra
10 days
dropping some work on multimodal evaluators soon. for now, play with image judges in Verdict ⬇️
2
0
25
@qw3rtman
Nimit Kalra
11 days
RT @leonardtang_: Verdict systems can now judge image inputs. Score product photos. Ad creatives. UI mockups. Haize anime birds. Judge an…
0
10
0
@qw3rtman
Nimit Kalra
12 days
Been looking at native speech-to-speech foundation models (i.e., end-to-end waveform → waveform) lately. Interesting that Gemini's and Amazon's offerings natively process audio inputs at 16kHz but produce output at 24kHz. OpenAI is more consistent at 24kHz for both.
3
0
16
@qw3rtman
Nimit Kalra
29 days
@yus167 @_hanlin_zhang_ @ShamKakade6 @udayaghai + "Can Large Reasoning Models Self-Train?" by @shafayat_sheikh @rsalakhu et al. who use self-consistency as a proxy for a self-verification signal.
0
0
4
@qw3rtman
Nimit Kalra
29 days
Still noodling on this, but the generation-verification gap proposed by @yus167 @_hanlin_zhang_ @ShamKakade6 @udayaghai et al. is a very nice framework that unifies a lot of thoughts around self-improvement/verification/bootstrapping reasoning.
@qw3rtman
Nimit Kalra
2 months
when/is verification harder than specification?
1
2
21