
Nimit Kalra
@qw3rtman
Followers 1K · Following 3K · Media 30 · Statuses 170
research @haizelabs, prev @citadel @utaustin
nyc
Joined October 2011
Discussing "Mind the Gap" tonight at @haizelabs's NYC AI Reading Group with @leonardtang_ and @willccbb. The authors study self-improvement through the "Generation-Verification Gap" (a model's verification ability over its own generations) and find that this capability log-scales with…
Still noodling on this, but the generation-verification gap proposed by @yus167 @_hanlin_zhang_ @ShamKakade6 @udayaghai et al. is a very nice framework that unifies a lot of thoughts around self-improvement/verification/bootstrapping reasoning.
3 · 9 · 61
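A rough sketch of how a generation-verification gap like this can be estimated in practice: the gap is the model's accuracy at verifying its own generations minus the accuracy of the generations themselves. The helper callables (`generate_answer`, `self_verify`, `is_correct`) are hypothetical stand-ins for model and grading calls, not the paper's actual setup.

```python
def generation_verification_gap(problems, generate_answer, self_verify, is_correct):
    """Estimate gap = (self-verification accuracy on own generations)
                    - (generation accuracy).

    generate_answer(problem) -> str       # model's own generation
    self_verify(problem, answer) -> bool  # model judges its own answer
    is_correct(problem, answer) -> bool   # ground-truth check
    """
    gen_correct, verify_correct = 0, 0
    for problem in problems:
        answer = generate_answer(problem)
        label = is_correct(problem, answer)      # was the generation actually right?
        verdict = self_verify(problem, answer)   # does the model think it was right?

        gen_correct += label
        verify_correct += (verdict == label)     # verifier is correct when it agrees with ground truth

    n = len(problems)
    return verify_correct / n - gen_correct / n
```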
evals evals evals.
Mercor (@mercor_ai) is now working with 6 out of the Magnificent 7, all of the top 5 AI labs, and most of the top application-layer companies. One trend is common across every customer: we are entering The Era of Evals. RL is becoming so effective that models will be able to…
1 · 0 · 12
@jxmnop ah yes it was:
my definition of science: the continual process of (a) generating artifacts that are surprising or useful and (b) explaining them. these days there's exponential growth in papers, and in useful artifacts, but we don't often discover good explanations. so i was happy to stumble…
0 · 0 · 6
think it was @jxmnop who said that science is about generating artifacts. inspired me to really focus on this this past week, starting with some internal eng tools and paper summaries. grinding out a couple more researchy things for the next couple weeks :) super excited to…
2 · 0 · 34
RT @leonardtang_: New open-source alert! spoken: a unified abstraction over realtime speech-to-speech foundation models. Run any S2S mode…
0 · 5 · 0
qwen RL has felt icky recently, but these authors get llama RL to match
What Makes a Base Language Model Suitable for RL? Rumors in the community say RL (i.e., RLVR) on LLMs is full of "mysteries": (1) Is the magic only happening on Qwen + Math? (2) Does the "aha moment" only spark during math reasoning? (3) Is evaluation hiding some tricky traps?
2 · 6 · 99
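For context, the "verifiable reward" in RLVR is usually just a programmatic check on the final answer. A minimal sketch below, assuming boxed math answers and exact string match; the actual grading in the paper may differ.

```python
import re

def verifiable_math_reward(completion: str, gold_answer: str) -> float:
    """Binary RLVR-style reward: 1.0 if the final \\boxed{...} answer
    exactly matches the reference string, else 0.0.
    (A sketch; real graders normalize fractions, units, etc.)"""
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    if match is None:
        return 0.0
    predicted = match.group(1).strip()
    return 1.0 if predicted == gold_answer.strip() else 0.0
```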
Super cool interactive blog post by @RoundtableDotAI examining the usage dynamics between humans and bots. Obviously an important problem, especially if Operator becomes easy to hijack. Great to see a principled study and the differences visualized nicely
1 · 4 · 12
gpt-4.1-mini thinks NYC >> SF. Image A and Image B both depict a black-and-white artistic rendition of a dove flying over a city skyline. However, Image A has a more intricate and detailed linework style, particularly in the city buildings and the dove's feathers, giving it a…
3 · 0 · 13
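The comparison above is just a vision model judging two images side by side. A minimal sketch of that kind of call using the OpenAI chat completions API; the image URLs and prompt wording here are placeholders, not the exact ones used.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def compare_images(url_a: str, url_b: str) -> str:
    """Ask gpt-4.1-mini to compare two images and explain its preference."""
    response = client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Image A and Image B each show a dove over a city skyline. "
                         "Which rendition do you prefer, and why?"},
                {"type": "image_url", "image_url": {"url": url_a}},
                {"type": "image_url", "image_url": {"url": url_b}},
            ],
        }],
    )
    return response.choices[0].message.content
```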
RT @leonardtang_: Verdict systems can now judge image inputs. Score product photos. Ad creatives. UI mockups. Haize anime birds. Judge an…
0 · 10 · 0
@yus167 @_hanlin_zhang_ @ShamKakade6 @udayaghai + "Can Large Reasoning Models Self-Train?" by @shafayat_sheikh @rsalakhu et al. who use self-consistency as a proxy for a self-verification signal.
0 · 0 · 4
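Self-consistency as a stand-in verifier is simple to sketch: sample k answers, majority-vote, and treat the vote share as a verification-free confidence signal. `sample_answer` is a hypothetical model-call stand-in, not the paper's implementation.

```python
from collections import Counter

def self_consistency_signal(problem, sample_answer, k: int = 16):
    """Majority-vote self-consistency as a proxy for self-verification:
    keep the most common sampled answer and use its vote share as a
    pseudo-confidence / reward signal in [0, 1]."""
    answers = [sample_answer(problem) for _ in range(k)]
    majority, votes = Counter(answers).most_common(1)[0]
    return majority, votes / k
```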