nathan lile
@NathanThinks
Followers
2K
Following
29K
Media
249
Statuses
2K
ceo/cofounder @ https://t.co/bDd3J4Lmzf hiring in SF 🌁 scaling synthetic reasoning. recurrent rabbit hole victim. nothing great is easy.
San Francisco
Joined August 2013
Superintelligence isn't about discovering new things; it's about discovering new ways to discover. I think our latest work formalizes Meta Chain-of-Thought, which we believe lies on the path to ASI. When we train models on the problem-solving process itself, rather than the final…
We have a new position paper on "inference time compute" and what we have been working on over the last few months! We present some theory on why it is necessary, how it works, why we need it, and what it means for "super" intelligence.
5
30
139
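A minimal sketch of the idea in the tweet above, supervising the problem-solving process rather than only the final answer: the snippet builds two contrasting SFT records for the same problem, one answer-only and one containing an intermediate reasoning trace. The field names and trace format are illustrative assumptions, not the paper's actual data format.

```python
# Toy illustration: same math problem, two possible training targets.
# The "process" record exposes the reasoning trace that Meta Chain-of-Thought-style
# training would supervise on; the "answer-only" record supervises only the result.
problem = "A train travels 120 km in 1.5 hours. What is its average speed?"

answer_only = {
    "prompt": problem,
    "target": "80 km/h",
}

process_supervised = {
    "prompt": problem,
    "target": (
        "Plan: speed = distance / time.\n"
        "Check units: km and hours, so the result is in km/h.\n"
        "Compute: 120 / 1.5 = 80.\n"
        "Verify: 80 km/h * 1.5 h = 120 km, consistent.\n"
        "Answer: 80 km/h"
    ),
}

for name, record in [("answer-only", answer_only), ("process", process_supervised)]:
    print(f"--- {name} ---\n{record['target']}\n")
```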
we’re at #COLM2025 🍁 come see our poster #26 (session 1) today. reach out ✉️ if you'd like to chat!
Qwen+RL = dramatic "Aha!"
Llama+RL = quick plateau.
Same size. Same RL. Why?
Qwen naturally exhibits cognitive behaviors that Llama doesn't. Prime Llama with 4 synthetic reasoning patterns & it matched Qwen's self-improvement performance! We can engineer this into any model! 👇
1
7
47
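A rough sketch of what "priming" with synthetic reasoning patterns could look like as SFT data. The tweet doesn't name the four patterns; the ones below (verification, backtracking, subgoal setting, backward chaining) are an assumption, and the snippets are illustrative only.

```python
# Hypothetical priming examples, one per reasoning pattern. In practice these
# would be generated at scale and used to fine-tune the base model before RL.
priming_examples = {
    "verification": (
        "Candidate answer: 42. Check: plugging 42 back into the equation "
        "makes both sides equal, so 42 is correct."
    ),
    "backtracking": (
        "Try factoring by grouping... that leads nowhere. Backtrack and "
        "apply the quadratic formula instead."
    ),
    "subgoal_setting": (
        "To find the total cost, first find the unit price, then multiply "
        "by the quantity, then add tax."
    ),
    "backward_chaining": (
        "We want the final digit to be 0, so the product must be divisible "
        "by 10, so it needs factors of both 2 and 5."
    ),
}

sft_records = [
    {"prompt": f"Demonstrate {name.replace('_', ' ')} while reasoning.", "completion": text}
    for name, text in priming_examples.items()
]
print(f"{len(sft_records)} priming records ready for fine-tuning")
```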
🚨🚨 New Paper: Training LLMs to Discover Abstractions for Solving Reasoning Problems. Introducing RLAD, a two-player RL framework for LLMs to discover 'reasoning abstractions': natural language hints that encode procedural knowledge for structured exploration in reasoning. 🧵⬇️
14
118
591
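A toy sketch of the two-player loop described above: one policy proposes a natural-language "abstraction" (a hint), a second policy solves conditioned on it, and the hint is scored by how much it lifts the solver's success rate. The stub functions and the lift-based scoring are assumptions for illustration, not RLAD's actual algorithm.

```python
import random

# Hypothetical stand-ins for the two LLM policies.
def generate_abstraction(problem: str) -> str:
    return "Hint: translate the word problem into an equation before computing."

def solve(problem: str, hint: str | None = None) -> bool:
    # Pretend the hint raises the solve probability.
    return random.random() < (0.7 if hint else 0.4)

def score_abstraction(problem: str, n_rollouts: int = 16) -> float:
    """Reward the abstraction proposer by the lift it gives the solver."""
    hint = generate_abstraction(problem)
    with_hint = sum(solve(problem, hint) for _ in range(n_rollouts)) / n_rollouts
    without = sum(solve(problem) for _ in range(n_rollouts)) / n_rollouts
    return with_hint - without  # positive => the hint encodes useful procedure

print(score_abstraction("A farmer has 3 times as many cows as sheep..."))
```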
much more convinced after getting my own results: LoRA with rank=1 learns (and generalizes) as well as full-tuning while saving 43% vRAM usage! allows me to RL bigger models with limited resources😆 script: https://t.co/p6IIiBQA6c
LoRA makes fine-tuning more accessible, but it's unclear how it compares to full fine-tuning. We find that the performance often matches closely, more often than you might expect. In our latest Connectionism post, we share our experimental results and recommendations for LoRA.
8
95
799
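For reference, rank-1 LoRA is a one-line change with the Hugging Face peft API; here's a minimal sketch (the base checkpoint and target modules are placeholders, not necessarily what the tweet's script uses):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Any causal LM works here; the checkpoint name is just an example.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

lora_cfg = LoraConfig(
    r=1,                      # rank-1 adapters, as in the tweet's experiment
    lora_alpha=32,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only a tiny fraction of params stays trainable
```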
I’ll take the opposite view: current methods are saturating, and just off the top of my head we need at least one practical breakthrough and at least two fundamental ones (which will likely take years) to reach AGI. None of these are oversight or safety related.
Scalable oversight is pretty much the last big research problem left. Once you get an unhackable reward function for anything then you can RL on everything.
11
11
157
NEW: Is the internet changing our personalities for the worse? Conscientiousness and extroversion are down, neuroticism up, with young adults leading the charge. This is a really consequential shift, and there’s a lot going on here, so let’s get into the weeds 🧵
409
3K
12K
Really great collaborating with @NathanThinks! Reach out if you're working on synthetic data generation, offline RL, or simulating agentic behavior.
Earlier this year we partnered with SynthLabs (https://t.co/Y0R2ObW4wT), a post-training research lab, to generate a 351-billion-token synthetic dataset 10x faster and 80% cheaper. Read more in our case study:
0
3
12
you have no idea how hard it is to get an rlhf model to be even “centrist”, much less right reactionary. they must have beaten this guy up pretty hard
172
166
4K
I’m not big on identities, but I am extremely proud to be American. This is true every day, but especially today—I firmly believe this is the greatest country ever on Earth. The American miracle stands alone in world history. I believe in techno-capitalism. We should encourage
3K
2K
33K
Apple dropping diffusion-based coding LLMs on Hugging Face was not on my bingo card
18
84
862
Xiaomi got 200,000 orders in 3 minutes for the YU7 and I’m not even surprised. The value proposition is just nuts. I’m kind of bummed because it means a few more years of having to satisfy demand from China before global expansion.
53
53
390
the future is about smart tokens
What if models could learn which problems _deserve_ deep thinking? No labels. Just let the model discover difficulty through its own performance during training. Instead of burning compute 🔥💸 on trivial problems, it allocates 5x more on problems that actually need it ↓
0
3
7
What if models could learn which problems _deserve_ deep thinking? No labels. Just let the model discover difficulty through its own performance during training. Instead of burning compute 🔥💸 on trivial problems, it allocates 5x more on problems that actually need it ↓
Our new method (ALP) monitors solve rates across RL rollouts and applies inverse difficulty penalties during RL training. Result? Models learn an implicit difficulty estimator—allocating 5x more tokens to hard vs easy problems, cutting overall usage by 50% 🧵👇1/10
1
7
38
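A minimal sketch of the inverse-difficulty idea described in the thread above: estimate difficulty from the solve rate over a prompt's rollouts, then scale a length penalty by the inverse of that difficulty so easy prompts are pushed toward short answers while hard ones keep thinking. The penalty form, coefficient, and smoothing are assumptions, not ALP's published formulation.

```python
import numpy as np

def alp_style_rewards(correct, lengths, alpha=1e-4, eps=1e-2):
    """Toy per-prompt reward shaping.

    correct: 0/1 outcomes over this prompt's RL rollouts
    lengths: token counts of the corresponding rollouts
    """
    correct = np.asarray(correct, dtype=float)
    lengths = np.asarray(lengths, dtype=float)

    solve_rate = correct.mean()                 # high => easy prompt
    difficulty = 1.0 - solve_rate
    penalty_coeff = alpha / (difficulty + eps)  # inverse-difficulty: easy prompts pay more per token

    return correct - penalty_coeff * lengths

# Easy prompt (mostly solved): long rollouts are penalized heavily.
print(alp_style_rewards([1, 1, 1, 0], [200, 800, 1500, 300]))
# Hard prompt (rarely solved): the length penalty stays small, so thinking longer is cheap.
print(alp_style_rewards([0, 0, 1, 0], [200, 800, 1500, 300]))
```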
China is winning the race to Type 1 Civilization and we're not even aware it's happening. By 2030, China will have the manufacturing capacity to build an entire U.S. worth of generation from solar and storage alone - every single year. The flow of energy is what drives physical
Check out our latest pod with @JessePeltan, which is just 3 hrs straight of him dropping bangers like the one below
326
853
4K
Check out our latest research on data. We're releasing 24T tokens of richly labelled web data. We found it very useful for our internal data curation efforts. Excited to see what you build using Essential-Web v1.0!
[1/5] 🚀 Meet Essential-Web v1.0, a 24-trillion-token pre-training dataset with rich metadata built to effortlessly curate high-performing datasets across domains and use cases!
23
86
646
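A minimal sketch of poking at the release with the Hugging Face datasets library in streaming mode, so nothing close to 24T tokens has to touch disk. The repo id below is an assumption; check the release announcement for the exact name and metadata schema.

```python
from datasets import load_dataset

# Streaming avoids downloading the full corpus; the repo id is assumed, verify before use.
ds = load_dataset("EssentialAI/essential-web-v1.0", split="train", streaming=True)

for i, row in enumerate(ds):
    # Inspect the document text plus the rich metadata labels the release describes.
    print(sorted(row.keys()))
    if i == 2:
        break
```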
congrats @rm_rafailov on your hard-earned acceptance to the USofA as alien of officially extraordinary ability. The alien piece comes as no surprise to your mates of course, but at least the general public now has fair warning and a fighting chance. To celebrate with a fitting
3
2
40
When we first published our work on this 9 months ago, it was rejected for being impractical in realistic cases. Six months later it was rejected for lack of novelty. It’s the way academic publishing goes.
Another generative / inference-time scaling reward modeling paper. It's the direction things are going.
4
16
152
Generative Reward Models' impact compounds daily. way stronger interest now than when we published last fall 👇 many excellent recent extensions. cool seeing where researchers take GenRM
we bootstrapped our way to generalized meta-reasoning capabilities with generative reward models. classical reward models can be worse than random on new reasoning tasks 🎲 we see improvements in robustness, generalization, and interpretability, and an opportunity to unify RLHF/RLAIF
1
3
19
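A rough sketch of the generative-reward-model pattern being discussed: instead of a scalar head, a judge LLM writes out its reasoning and then a verdict, which gets parsed into a preference signal. The prompt template and the `judge` callable are placeholders, not the paper's implementation.

```python
import re
from typing import Callable

JUDGE_TEMPLATE = """You are grading two candidate answers.
Question: {question}
Answer A: {a}
Answer B: {b}
Think step by step about which answer is better, then end with
"Verdict: A" or "Verdict: B"."""

def generative_reward(question: str, a: str, b: str, judge: Callable[[str], str]) -> int:
    """Return +1 if the judge prefers A, -1 if it prefers B, 0 if unparsable."""
    critique = judge(JUDGE_TEMPLATE.format(question=question, a=a, b=b))
    match = re.search(r"Verdict:\s*([AB])", critique)
    if not match:
        return 0
    return 1 if match.group(1) == "A" else -1

# Stub judge for demonstration; in practice this would call an LLM.
fake_judge = lambda prompt: "B drops the units. A keeps them. Verdict: A"
print(generative_reward("What is 120 km / 1.5 h?", "80 km/h", "80", fake_judge))
```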
I was going to call this dumb, but former NTSB board member John Goglia just texted me and told me to reply with this instead: The issue raised in The Rehearsal is whether the authority gradient affects copilots' willingness to assert themselves at critical junctures and
197
2K
33K