Maxim Khomiakov

@maximkhv

Followers: 325
Following: 787
Media: 6
Statuses: 97

managing context windows

Copenhagen
Joined May 2018
@ankesh_anand
Ankesh Anand
4 days
"how can flash beat pro??" -> the answer is RL! flash is not just a distilled pro. we've had lots of exciting research progress on agentic RL which made its way into flash but was too late for pro. can't wait to finally bring them to pro👀
@scaling01
Lisan al Gaib
4 days
Gemini 3 Flash scores higher than GPT-5.2, Opus 4.5 and Gemini 3 Pro on SWE-Bench Verified ???
116
267
4K
@maximkhv
Maxim Khomiakov
8 days
Agree, it's truly a great release. I'd recommend @natolambert's NeurIPS talk too
@cwolferesearch
Cameron R. Wolfe, Ph.D.
8 days
Olmo 3 is one of the most valuable open research artifacts to ever be released. Although Olmo 3 models are slightly behind state-of-the-art, their value goes beyond the models themselves. The artifacts for Olmo 3 give anyone the ability to conduct rigorous experiments with
1
1
3
@maximkhv
Maxim Khomiakov
11 days
Liking this, a lot.
@couplefire12
Locke Cai
11 days
RL for reasoning often relies on verifiers — great for math, but tricky for creative writing or open-ended research. Meet RARO: a new paradigm that teaches LLMs to reason via adversarial games instead of verification. No verifiers. No environments. Just demonstrations. 🧵👇
0
0
4
@maximkhv
Maxim Khomiakov
22 days
Nano banana is just OP
@ReflctWillie
willie
22 days
Turns out Nano Banana Pro is great at landscaping plans... Just get an ugly cutout from Google Maps with "Here is my property, create a landscape architecture style map". Then just annotate the image and throw it in until you land on the design. Is this vibe gardening?
0
0
2
@maximkhv
Maxim Khomiakov
24 days
Imagine current models in Nov. '22. Goalposts redefined
@rsalakhu
Russ Salakhutdinov
24 days
Predicting AGI/ASI timelines has become trendy, so I’ll offer mine: AGI/ASI is 5–10 years away. It always has been. It always will be.
0
0
2
@_akhaliq
AK
1 month
MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation
11
60
451
@maximkhv
Maxim Khomiakov
1 month
this time, it can also do the dishes
@OriolVinyalsML
Oriol Vinyals
1 month
🤔🤔🤔
0
0
0
@maximkhv
Maxim Khomiakov
1 month
Found a bug with this tool just recently; can recommend. Being able to drill down into your LLM traces easily is quite useful. When you treat traces like training data, you need to view them in a different way.
@thomasahle
Thomas Ahle
1 month
I ended up making it - trace taxi is up!
0
2
4
@kalomaze
kalomaze
1 month
> The way humans think looks a lot more like diffusion than autoregressive.
i will never, ever understand this claim or the intuitions behind it. ah yes. the human mind is... learning a scoring function to... reverse gaussian noise... (?) ... spatially (???)
@hyhieu226
Hieu Pham
1 month
Naive question, so please roast me. Why don't we have diffusion reasoning models? The way humans think looks a lot more like diffusion than autoregressive.
57
9
515
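For reference, the "scoring function to reverse gaussian noise" phrasing maps onto the standard denoising-diffusion objective; a minimal sketch in DDPM notation, added here for context rather than taken from either tweet:

```latex
% Forward process: progressively add Gaussian noise to the clean sample x_0
q(x_t \mid x_0) = \mathcal{N}\!\big(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t)\,I\big)

% Training objective: predict the injected noise, which (up to scaling)
% is the score \nabla_{x_t} \log q(x_t)
\mathcal{L}(\theta) = \mathbb{E}_{x_0,\ \epsilon \sim \mathcal{N}(0,I),\ t}
\Big[\big\lVert \epsilon - \epsilon_\theta\big(\sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon,\ t\big)\big\rVert^2\Big]
```

Sampling then applies this learned denoiser iteratively from pure noise back to data, which is the "reverse Gaussian noise" step being questioned above.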
@giffmana
Lucas Beyer (bl16)
2 months
> There’s no free lunch.
> When you reduce the complexity of attention, you pay a price.
> The question is, where?
This is *exactly* how I typically end my Transformer tutorial. This slide is already 4 years old, I've never updated it, but it still holds:
@zpysky1125
Pengyu Zhao
2 months
MiniMax M2 Tech Blog 3: Why Did M2 End Up as a Full Attention Model? On behalf of pre-training lead Haohai Sun. (https://t.co/WH4xOD9KrT)
I. Introduction
As the lead of MiniMax-M2 pretrain, I've been getting many queries from the community on "Why did you turn back the clock
35
61
901
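A back-of-envelope way to see the "price": dense attention materializes an n-by-n score matrix, so its cost grows quadratically in context length (constants and FlashAttention-style optimizations ignored):

```latex
\text{attention cost} \propto n^2, \qquad
\frac{(2^{20})^2}{(2^{16})^2} = 16^2 = 256
```

So a 1M-token window costs roughly 256 times more attention compute per layer than a 64K window, which is exactly the pressure that sub-quadratic and sparse variants trade quality against.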
@jiqizhixin
机器之心 JIQIZHIXIN
2 months
Huge breakthrough from DeepMind! In their latest Nature paper, “Discovering state-of-the-art reinforcement learning algorithms,” they show that AI can autonomously discover better RL algorithms. "Enabling machines to discover learning algorithms for themselves is one of the
49
260
2K
@GladiaLab
GLADIA Research Lab
2 months
LLMs are injective and invertible. In our new paper, we show that different prompts always map to different embeddings, and this property can be used to recover input tokens from individual embeddings in latent space. (1/6)
283
1K
11K
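A toy way to poke at the premise (distinct prompts should give distinct embeddings); the model name and last-token pooling below are illustrative choices, not the paper's setup:

```python
# Toy check: do distinct prompts map to distinct last-token hidden states?
# Model and pooling are illustrative, not taken from the paper.
import torch
from transformers import AutoModel, AutoTokenizer

name = "gpt2"  # any causal LM works for this sketch
tok = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name).eval()

prompts = [
    "The capital of Denmark is",
    "The capital of Denmark is.",   # near-duplicate
    "managing context windows",
    "4096 is hard-coded in many LLM libraries",
]

with torch.no_grad():
    embs = []
    for p in prompts:
        ids = tok(p, return_tensors="pt")
        h = model(**ids).last_hidden_state[0, -1]  # last-token hidden state
        embs.append(h)
    E = torch.stack(embs)

# The injectivity claim predicts no exact collisions, even for near-duplicates.
d = torch.cdist(E, E)
off_diag = d[~torch.eye(len(prompts), dtype=bool)]
print("min off-diagonal distance:", off_diag.min().item())
```

The paper's stronger claim is that this separation is enough to invert individual embeddings back to input tokens; the sketch only checks that no collisions show up among a handful of prompts.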
@kwangmoo_yi
Kwang Moo Yi
2 months
Bai et al., "Positional Encoding Field" Make your RoPE encoding 3D by including a z axis, then manipulate your image by simply manipulating your positional encoding in 3D --> novel view synthesis. Neat idea.
7
55
480
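A minimal sketch of the general idea (axis-split RoPE: rotate separate chunks of the head dimension with x, y, and z positions); this illustrates the mechanism, not necessarily the paper's exact formulation:

```python
# Axis-split RoPE sketch: apply standard 1D RoPE to thirds of the head
# dimension, one third per spatial axis. Illustrative, not the paper's code.
import numpy as np

def rope_1d(x, pos, base=10000.0):
    """Standard RoPE along one axis. x: (..., d) with d even; pos: scalar."""
    d = x.shape[-1]
    freqs = base ** (-np.arange(0, d, 2) / d)   # (d/2,) rotation frequencies
    ang = pos * freqs
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def rope_3d(x, xyz):
    """x: (..., d) with d divisible by 6; xyz: the token's (x, y, z) position."""
    assert x.shape[-1] % 6 == 0
    parts = np.split(x, 3, axis=-1)
    return np.concatenate([rope_1d(p, pos) for p, pos in zip(parts, xyz)], axis=-1)

q = np.random.randn(12)                          # a single query vector
q_rot = rope_3d(q, xyz=np.array([3.0, 5.0, 1.0]))
print(q_rot.shape)                               # (12,)
```

Per the thread, novel view synthesis then comes from re-rendering with manipulated (x, y, z) positions rather than from editing the image tokens themselves.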
@FarisSbahi
Faris Sbahi 🏴‍☠️
2 months
We’re hiring research engineers, software engineers, silicon engineers, and operators at @NormalComputing. Join us as we rethink ASICs with Physics and AI. We have offices and are hiring across NYC, LON, CPH, and SF. We are looking for folks with backgrounds across RL/agents,
4
4
16
@maximkhv
Maxim Khomiakov
2 months
prob one of the funniest LPs around. dealflow must be off the charts
@compliantvc
Henrick Johansson
2 months
I struggle to think of ONE exciting startup that's come out of the USA in the last 10 years All that country can produce is Ponzi schemes Meanwhile, Europe continues to lead the way in building safe, compliant companies that help people and the government
0
1
3
@giffmana
Lucas Beyer (bl16)
2 months
"What do 1M and 500K context windows have in common? They are both actually 64K."
@nrehiew_
wh
2 months
New post! This time, about the current state of Long Context Evaluation. I discuss existing benchmarks, what makes a good long context eval, what's missing from existing ones and introduce a new one - LongCodeEdit :)
31
65
1K
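For context, the crudest version of such an eval is a needle-in-a-haystack probe, which the post argues is necessary but far from sufficient; a minimal sketch, with `generate` as a placeholder for whatever model call you use:

```python
# Minimal needle-in-a-haystack probe. `generate` is a placeholder callable
# (prompt: str -> str) standing in for your actual model API.
import random

def make_haystack(n_words: int, needle: str, depth: float) -> str:
    """Bury the needle sentence at a relative depth inside filler text."""
    filler = ("The quick brown fox jumps over the lazy dog. " * (n_words // 9)).split()
    pos = int(depth * len(filler))
    return " ".join(filler[:pos] + [needle] + filler[pos:])

def run_probe(generate, n_words: int, depths=(0.0, 0.25, 0.5, 0.75, 1.0)):
    secret = str(random.randint(10_000, 99_999))
    needle = f"The secret number is {secret}."
    results = {}
    for depth in depths:
        prompt = (
            make_haystack(n_words, needle, depth)
            + "\n\nWhat is the secret number? Answer with the number only."
        )
        results[depth] = secret in generate(prompt)
    return results
```

Passing this says little about realistic long-context work like multi-file code editing, which is the gap benchmarks such as LongCodeEdit aim at.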
@pvncher
eric provencher
2 months
I’m really tired of labs pointing to swe bench as a sign that their new tiny model is SOTA. It’s just a bunch of python problems that are well leaked into the training set. Tiny models can rarely replace larger ones with any context pressure in practice.
@claudeai
Claude
2 months
Introducing Claude Haiku 4.5: our latest small model. Five months ago, Claude Sonnet 4 was state-of-the-art. Today, Haiku 4.5 matches its coding performance at one-third the cost and more than twice the speed.
20
14
271
@maximkhv
Maxim Khomiakov
2 months
1M tokens are not all equal, sparse attention etc.
@giffmana
Lucas Beyer (bl16)
2 months
So this is a first for me. I just had a pretty big refactoring session with codex-cli, and eventually it started going completely off the rails. It became very dumb, made bad mistakes, only followed half my instructions and made up the other half, misused tools in ways so stupid
0
0
0
@LouisKnightWebb
Louis Knight-Webb
2 months
Four devs, locked in a cage, vibecoding to the death #VibeOlympics @dexhorthy
1
2
32
@maximkhv
Maxim Khomiakov
3 months
Fairly surprising; 4096 was the GPT-3.5-era limit, back in November 2022
@thomasahle
Thomas Ahle
3 months
4096 is hard-coded in many LLM libraries as the default output-seq-length. This causes weird bugs from modern LLMs that have much more to say
0
0
2
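If you want to guard against a default like that, set the output length explicitly; a minimal sketch using Hugging Face transformers, with a placeholder model name:

```python
# Set the generation length explicitly rather than trusting a library default
# such as 4096. The model name is a placeholder, not a real checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "your-org/your-long-context-model"  # placeholder; substitute your model
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

ids = tok("Write out the full migration plan, step by step.", return_tensors="pt")
out = model.generate(**ids, max_new_tokens=16384)  # explicit cap, well above 4096
print(tok.decode(out[0], skip_special_tokens=True))
```

The same principle applies to hosted APIs: pass the max-output-tokens parameter explicitly instead of relying on whatever the client library ships as its default.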