tokenbender

@tokenbender

Followers: 11K · Following: 45K · Media: 2K · Statuses: 13K

making models learn • eXperiments lab

Joined July 2014
@tokenbender
tokenbender
4 months
is it possible to pretrain a language model using pure reinforcement learning from scratch? random weights, no cross-entropy loss pretraining. you may have many questions in your head.
63
160
2K
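A minimal sketch of what the "pure RL pretraining" question above could look like in code: REINFORCE on a tiny, randomly initialized LM, optimizing a sequence-level reward with no cross-entropy loss anywhere. The GRU model, bigram-overlap reward, and hyperparameters are illustrative assumptions, not tokenbender's actual setup.

```python
import torch
import torch.nn as nn

VOCAB, SEQ_LEN, EMB = 64, 16, 32

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.rnn = nn.GRU(EMB, EMB, batch_first=True)
        self.head = nn.Linear(EMB, VOCAB)

    def forward(self, x, h=None):
        out, h = self.rnn(self.emb(x), h)
        return self.head(out), h

def reward(seq, corpus_bigrams):
    # sequence-level reward: fraction of sampled bigrams that occur in a reference corpus
    hits = sum((a.item(), b.item()) in corpus_bigrams for a, b in zip(seq[:-1], seq[1:]))
    return hits / (len(seq) - 1)

def train(corpus_bigrams, steps=200, lr=1e-3):
    model = TinyLM()                                   # random weights, no CE pretraining
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    baseline = 0.0
    for _ in range(steps):
        tok, h, logps, toks = torch.zeros(1, 1, dtype=torch.long), None, [], []
        for _ in range(SEQ_LEN):
            logits, h = model(tok, h)
            dist = torch.distributions.Categorical(logits=logits[:, -1])
            tok = dist.sample().unsqueeze(1)
            logps.append(dist.log_prob(tok.squeeze(1)))
            toks.append(tok.squeeze())
        r = reward(torch.stack(toks), corpus_bigrams)
        baseline = 0.9 * baseline + 0.1 * r            # moving-average baseline
        loss = -(r - baseline) * torch.stack(logps).sum()   # REINFORCE
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model

# toy usage: the "corpus" is a random token stream here; real text would replace it
stream = torch.randint(0, VOCAB, (1000,))
bigrams = set(zip(stream[:-1].tolist(), stream[1:].tolist()))
model = train(bigrams)
```

Whether anything like this scales beyond a toy is exactly the open question the tweet raises; the sketch only shows that the training loop itself needs no cross-entropy term.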
@tszzl
roon
2 days
K2 thinking is the most interesting open weights writing model so I hope people do fun things fine tuning it. games, interactive storytelling … I don’t see enough of this stuff
@thinkymachines
Thinking Machines
2 days
Tinker is now generally available. We also added support for advanced vision input models, Kimi K2 Thinking, and a simpler way to sample from models. https://t.co/nvaJHkGxc0
40
41
1K
@adithya_s_k
Adithya S K
2 days
Thank you for the wishes everyone. While these covered the highlights, there were also a bunch of low points; normalising failure through this post
> published 5 research papers but could not present any of them offline because of visa issues
> B1/B2 visa rejected twice while
@adithya_s_k
Adithya S K
3 days
Turned 22 today. AMA!! Quick look at the past year:
> landed an ML internship at @Apple, then joined @MSFTResearch to work on agentic memory
> secured a six figure USD research grant from @Meta to build SoTA AI models at @cognitivelab_ai
> crossed 10k GitHub stars across my
10
6
220
@tokenbender
tokenbender
2 days
notebooklm has gotten really useful now.
1
1
15
@tokenbender
tokenbender
2 days
my mind when i see a mutual consistently erode my sanity
> muting isn’t enough
> it’s over
@tszzl
roon
2 days
the thing about me is that i like having twitter mutuals more than i hate bad opinions
2
0
19
@tokenbender
tokenbender
3 days
arc-agi-1 is not the reference that it used to be, especially after contamination.
@arcprize
ARC Prize
3 days
A year ago, we verified a preview of an unreleased version of @OpenAI o3 (High) that scored 88% on ARC-AGI-1 at est. $4.5k/task. Today, we’ve verified a new GPT-5.2 Pro (X-High) SOTA score of 90.5% at $11.64/task. This represents a ~390X efficiency improvement in one year.
6
4
69
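A quick back-of-the-envelope check of the "~390X" figure above, assuming it refers purely to the ratio of per-task cost at a roughly comparable ARC-AGI-1 score (numbers taken from the quoted tweet):

```python
o3_cost = 4500.0     # est. $4.5k/task, o3 (High) preview, 88% on ARC-AGI-1
gpt52_cost = 11.64   # $11.64/task, GPT-5.2 Pro (X-High), 90.5%
print(o3_cost / gpt52_cost)  # ~386.6, i.e. roughly the quoted ~390x in one year
```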
@tokenbender
tokenbender
3 days
It is now fashionable to aura farm by releasing a really strong minor version update. GPT 5.2 looks really good though.
@OpenAI
OpenAI
3 days
GPT-5.2 Thinking evals
1
3
60
@kohjingyu
Jing Yu Koh
4 days
I've observed 3 ways that great AI researchers work: 1) Working on whatever they find interesting, even if it's "useless". Whether something will be publishable, fundable, or obviously impactful is irrelevant to what these people work on. They simply choose something
26
89
900
@rogershijin
Roger Jin
5 days
I was the lead for this project and wanna add some caveats:
1) this model has looping/gibberish and formatting problems - working on it!
2) the 87/120 = rank #2 last year thing is true but according to our annotators, this year's contest felt easier than last year's so we
@NousResearch
Nous Research
5 days
Today we open source Nomos 1. At just 30B parameters, it scores 87/120 on this year’s Putnam, one of the world’s most prestigious math competitions. This score would rank #2/3988 in 2024 and marks our first step with @hillclimbai towards creating a SOTA AI mathematician.
10
9
178
@BlackHC
Andreas Kirsch 🇺🇦
4 days
This framing by OP is actually an argument for automating science and a terrible argument against it. If you delay automating science and curing cancer because you want to provide folks with the ego trip of getting credit assignment or doing performative work, you are clearly
@togelius
Julian Togelius
6 days
I was at an event on AI for science yesterday, a panel discussion here at NeurIPS. The panelists discussed how they plan to replace humans at all levels in the scientific process. So I stood up and protested that what they are doing is evil. Look around you, I said. The room is
8
7
175
@NousResearch
Nous Research
5 days
Today we open source Nomos 1. At just 30B parameters, it scores 87/120 on this year’s Putnam, one of the world’s most prestigious math competitions. This score would rank #2/3988 in 2024 and marks our first step with @hillclimbai towards creating a SOTA AI mathematician.
80
229
2K
@tokenbender
tokenbender
4 days
this is an interesting paper and i recommend reading it. though most of the information here is something people with lots of RL-hours would know from experience. but it is nicely written and i enjoyed reading it.
0
0
6
@tokenbender
tokenbender
4 days
they keep saying do not anthropomorphise the models, but all the good techniques seem to be analogues of what we have studied about humans, behaviour and learning all this time. "RL works best when tasks are at the edge of competence"
@xiangyue96
Xiang Yue
5 days
There are competing views on whether RL can genuinely improve a base model's performance (e.g., pass@128). The answer is both yes and no, largely depending on the interplay between pre-training, mid-training, and RL. We trained a few hundred GPT-2-scale LMs on synthetic
3
2
31
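A minimal sketch of the "edge of competence" heuristic quoted above: pick RL training tasks whose current estimated pass rate sits near 50%, since tasks the policy always or never solves contribute almost no gradient signal. The task pool, pass-rate estimates, and the 0.5 target are illustrative assumptions, not the paper's actual curriculum.

```python
import random

def pick_edge_tasks(pass_rate, k=8, target=0.5):
    """pass_rate: dict of task_id -> estimated success rate under the current policy."""
    ranked = sorted(pass_rate, key=lambda t: abs(pass_rate[t] - target))
    return ranked[:k]

# toy usage: tasks at 0.0 or 1.0 pass rate carry ~zero learning signal and are skipped,
# while tasks near 0.5 dominate the selected batch.
pool = {f"task_{i}": random.random() for i in range(100)}
print(pick_edge_tasks(pool))
```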
@himanshustwts
himanshu
4 days
Tiny Models will Run the World ⚡️ Coming up: an amazing podcast with @vikhyatk, co-founder of moondream. vik is building open frontier vision ai with incredible sota. Drop your questions for vik around vision, data, open source, architectures, moondream + everything ai lore.
21
13
356
@tokenbender
tokenbender
4 days
it is not that ideas are cheap. it is just that you're coming up with ideas that are reducible to the existing ones in a manner obvious to the experts.
3
0
19
@kfountou
Kimon Fountoulakis
5 days
I do this. I don't do projects unless I have something major to learn from them. I seek a large gap of novelty. If there are more than 2-3 papers on a particular problem, and I don't have a unique angle on it, I don't work on the project. This means: 1. There is little
@anshulkundaje
Anshul Kundaje
9 days
Repeat after me. FUCK citation counts. Just do meaningful work that you enjoy, do a great job making it accessible and present it honestly so folks can use it and build on it optimally. That's it.
3
8
115
@tokenbender
tokenbender
4 days
> GPT-5 was asked for a test that detects nonlinear theories. It provided a test that detects nonlocal ones.
this is why we would continue needing tight integration loops with experts for the time being. once again "inhuman" mistakes. a human capable of operating at this level
@postquantum
Jonathan Oppenheim
5 days
OpenAI leadership (@gdb, @markchen90) are promoting a paper in Physics Letters B where GPT-5 proposed the main idea — possibly the first peer-reviewed paper where an LLM generated the core contribution. One small problem: GPT-5's idea tests the wrong thing. 1/
2
2
15
@GuillaumeLample
Guillaume Lample @ NeurIPS 2024
5 days
Very excited to release two new open-weight models, Devstral 2 (123B) and Devstral Small 2 (24B), along with Mistral Vibe, a CLI built for Devstral that enables end-to-end code automation!
@MistralAI
Mistral AI
5 days
Introducing the Devstral 2 coding model family. Two sizes, both open source. Also, meet Mistral Vibe, a native CLI, enabling end-to-end automation. 🧵
32
75
955