Maxim Khomiakov
@maximkhv
325 Followers · 787 Following · 6 Media · 97 Statuses
managing context windows
Copenhagen · Joined May 2018
"how can flash beat pro??" -> the answer is RL! flash is not just a distilled pro. we've had lots of exciting research progress on agentic RL which made its way into flash but was too late for pro. can't wait to finally bring them to pro👀
116 replies · 267 reposts · 4K likes
agree, it's truly a great release. I'd recommend @natolambert's NeurIPS talk too
Olmo 3 is one of the most valuable open research artifacts to ever be released. Although Olmo 3 models are slightly behind state-of-the-art, their value goes beyond the models themselves. The artifacts for Olmo 3 give anyone the ability to conduct rigorous experiments with…
1 reply · 1 repost · 3 likes
MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation
11 replies · 60 reposts · 451 likes
Found a bug with the help of this tool just recently; can recommend. The ability to drill down into your LLM traces in an easy fashion is quite useful. When you treat traces like training data, you may need to view them in a different way.
0 replies · 2 reposts · 4 likes
> The way humans think looks a lot more like diffusion than autoregressive.

i will never, ever understand this claim or the intuitions behind it. ah yes. the human mind is... learning a scoring function to... reverse gaussian noise... (?) ... spatially (???)
Naive question, so please roast me. Why don't we have diffusion reasoning models? The way humans think looks a lot more like diffusion than autoregressive.
57 replies · 9 reposts · 515 likes
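For context, a minimal sketch of the two generation paradigms this exchange is arguing about, with toy stand-ins for the models (dummy_lm and dummy_score are my placeholders, not anything from the thread): autoregressive decoding samples one token at a time from a learned conditional, while score-based diffusion starts from Gaussian noise and follows a learned score function back toward the data, which is exactly the "reverse gaussian noise" framing the reply finds so alien.

```python
import torch

def autoregressive_sample(model, ids, n_new):
    """Next-token factorization: p(x) = prod_t p(x_t | x_<t)."""
    for _ in range(n_new):
        logits = model(ids)                                  # (1, seq, vocab)
        probs = torch.softmax(logits[0, -1], dim=-1)
        ids = torch.cat([ids, torch.multinomial(probs, 1)[None]], dim=1)
    return ids

def diffusion_sample(score_model, shape, sigmas, steps_per_level=5):
    """Annealed Langevin sampling: start from pure noise and follow the
    learned score (gradient of log density) at decreasing noise levels."""
    x = torch.randn(shape) * sigmas[0]
    for sigma in sigmas:
        eps = 0.1 * sigma ** 2                               # step size per noise level
        for _ in range(steps_per_level):
            x = x + eps * score_model(x, sigma) + (2 * eps) ** 0.5 * torch.randn_like(x)
    return x

vocab = 100
dummy_lm = lambda ids: torch.randn(1, ids.shape[1], vocab)   # placeholder language model
dummy_score = lambda x, s: -x / (1 + s ** 2)                 # exact score for N(0, I) data + noise
print(autoregressive_sample(dummy_lm, torch.zeros(1, 1, dtype=torch.long), 5))
print(diffusion_sample(dummy_score, (4,), torch.linspace(1.0, 0.01, 10)))
```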
> There’s no free lunch.
> When you reduce the complexity of attention, you pay a price.
> The question is, where?

This is *exactly* how I typically end my Transformer tutorial. The slide is already 4 years old and I've never updated it, but it still holds:
MiniMax M2 Tech Blog 3: Why Did M2 End Up as a Full Attention Model? On behalf of pre-training lead Haohai Sun. ( https://t.co/WH4xOD9KrT) I. Introduction As the lead of MiniMax-M2 pretrain, I've been getting many queries from the community on "Why did you turn back the clock…
35 replies · 61 reposts · 901 likes
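To make the "where do you pay?" point concrete, here is a minimal sketch (my illustration, not the tutorial slide or MiniMax's code) comparing exact softmax attention with the ELU-feature-map linear attention of Katharopoulos et al.: the quadratic cost disappears, but the weights are no longer the exact softmax, so the price shows up as approximation error.

```python
import torch

def full_attention(q, k, v):
    # O(n^2): materialize every pairwise score
    scores = q @ k.T / q.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ v

def linear_attention(q, k, v, phi=torch.nn.functional.elu):
    # O(n): associativity lets us compute phi(K)^T V once, never an n x n matrix
    q, k = phi(q) + 1, phi(k) + 1          # ELU+1 feature map (Katharopoulos et al.)
    kv = k.T @ v                           # (d, d_v)
    z = k.sum(0)                           # normalizer
    return (q @ kv) / (q @ z)[:, None]

n, d = 512, 64
q, k, v = torch.randn(n, d), torch.randn(n, d), torch.randn(n, d)
err = (full_attention(q, k, v) - linear_attention(q, k, v)).abs().mean()
print(f"mean abs deviation from exact softmax attention: {err:.3f}")
```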
Huge breakthrough from DeepMind! In their latest Nature paper, “Discovering state-of-the-art reinforcement learning algorithms,” they show that AI can autonomously discover better RL algorithms. "Enabling machines to discover learning algorithms for themselves is one of the…
49 replies · 260 reposts · 2K likes
LLMs are injective and invertible. In our new paper, we show that different prompts always map to different embeddings, and this property can be used to recover input tokens from individual embeddings in latent space. (1/6)
283 replies · 1K reposts · 11K likes
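A hedged sketch of why injectivity enables inversion; the brute-force loop and the toy cumulative-sum encoder below are my illustration (the paper has an efficient exact procedure), but the logic is the same: if no two prompts share a hidden state, the observed latent pins down each token in turn.

```python
import torch

def recover_prompt(encode, target_states, vocab_size, tol=1e-5):
    """encode(ids) -> (1, seq, d) hidden states; target_states: (seq, d)
    observed latents of an unknown prompt. Recover it token by token by
    checking which vocabulary item reproduces the observed state."""
    ids: list[int] = []
    for t in range(target_states.shape[0]):
        for cand in range(vocab_size):                    # try every next token
            h = encode(torch.tensor([ids + [cand]]))[0]   # (t+1, d)
            if torch.allclose(h[t], target_states[t], atol=tol):
                ids.append(cand)
                break
    return ids

# Toy injective "model": hidden state = cumulative sum of token embeddings.
emb = torch.randn(50, 8)
encode = lambda ids: torch.cumsum(emb[ids[0]], dim=0)[None]
secret = [3, 41, 7]
states = encode(torch.tensor([secret]))[0]
print(recover_prompt(encode, states, vocab_size=50) == secret)  # True
```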
Bai et al., "Positional Encoding Field": make your RoPE encoding 3D by including a z axis, then manipulate your image by simply manipulating your positional encoding in 3D --> novel view synthesis. Neat idea.
7 replies · 55 reposts · 480 likes
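A rough sketch of the idea as described in the tweet, under my own assumptions (the equal three-way channel split and the function names are mine, not necessarily the paper's recipe): give each channel group its own spatial axis, so shifting the z coordinate re-poses the query while leaving the content features untouched.

```python
import torch

def rope_1d(x, pos, base=10000.0):
    """Standard rotary embedding along one axis. x: (..., d) with d even."""
    d = x.shape[-1]
    freqs = base ** (-torch.arange(0, d, 2) / d)       # (d/2,) frequencies
    ang = pos[..., None] * freqs                       # (..., d/2) rotation angles
    cos, sin = torch.cos(ang), torch.sin(ang)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin               # rotate each feature pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def rope_3d(x, xyz):
    """x: (n, d) features with d divisible by 6, xyz: (n, 3) coordinates.
    Each third of the channels is rotated by its own spatial axis."""
    d = x.shape[-1] // 3
    parts = [rope_1d(x[:, i * d:(i + 1) * d], xyz[:, i]) for i in range(3)]
    return torch.cat(parts, dim=-1)

tokens = torch.randn(16, 48)
coords = torch.rand(16, 3)
shifted = coords.clone()
shifted[:, 2] += 0.5                                   # "move the camera" along z only
q_orig, q_moved = rope_3d(tokens, coords), rope_3d(tokens, shifted)
```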
We’re hiring research engineers, software engineers, silicon engineers, and operators at @NormalComputing. Join us as we rethink ASICs with Physics and AI. We have offices and are hiring across NYC, LON, CPH, and SF. We are looking for folks with backgrounds across RL/agents,…
4 replies · 4 reposts · 16 likes
prob one of the funniest LPs around. dealflow must be off the charts
I struggle to think of ONE exciting startup that's come out of the USA in the last 10 years All that country can produce is Ponzi schemes Meanwhile, Europe continues to lead the way in building safe, compliant companies that help people and the government
0 replies · 1 repost · 3 likes
"What do 1M and 500K context windows have in common? They are both actually 64K."
New post! This time, about the current state of Long Context Evaluation. I discuss existing benchmarks, what makes a good long context eval, and what's missing from existing ones, then introduce a new one: LongCodeEdit :)
31 replies · 65 reposts · 1K likes
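The joke has a measurable version: advertised context is whatever fits in the window, effective context is where task accuracy collapses. Below is a minimal needle-in-a-haystack-style probe, my generic sketch rather than the post's LongCodeEdit (which uses code-editing tasks instead); ask_model is a placeholder for whatever completion call you use.

```python
import random

def make_probe(n_filler_words: int, depth: float) -> tuple[str, str]:
    """Bury a retrievable fact at a relative depth inside filler text."""
    secret = str(random.randint(10**5, 10**6))
    filler = ["lorem"] * n_filler_words
    filler.insert(int(depth * n_filler_words), f"The magic number is {secret}.")
    prompt = " ".join(filler) + "\nWhat is the magic number?"
    return prompt, secret

def effective_context(ask_model, lengths=(8_000, 64_000, 500_000), trials=5):
    """Retrieval accuracy vs. context size; watch where the curve collapses."""
    scores = {}
    for n in lengths:
        hits = sum(
            secret in ask_model(prompt)
            for prompt, secret in (make_probe(n, random.random()) for _ in range(trials))
        )
        scores[n] = hits / trials
    return scores
```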
I’m really tired of labs pointing to SWE-bench as a sign that their new tiny model is SOTA. It's just a bunch of Python problems that have long since leaked into the training set. Tiny models can rarely replace larger ones under any context pressure in practice.
Introducing Claude Haiku 4.5: our latest small model. Five months ago, Claude Sonnet 4 was state-of-the-art. Today, Haiku 4.5 matches its coding performance at one-third the cost and more than twice the speed.
20 replies · 14 reposts · 271 likes
1M tokens are not all equal: sparse attention, etc.
So this is a first for me. I just had a pretty big refactoring session with codex-cli, and eventually it started going completely off the rails. It became very dumb, made bad mistakes, only followed half my instructions and made up the other half, misused tools in ways so stupid…
0 replies · 0 reposts · 0 likes