tokenbender

@tokenbender

Followers: 11K · Following: 45K · Media: 2K · Statuses: 13K

making models learn • eXperiments lab

Joined July 2014
@tokenbender
tokenbender
4 months
is it possible to pretrain a language model using pure reinforcement learning from scratch? random weights, no cross-entropy loss pretraining. you may have many questions in your head.
63
160
2K
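A minimal sketch of what the "pure RL pretraining" question above could look like in code: REINFORCE on a tiny, randomly initialized LM, optimizing a sequence-level reward with no cross-entropy loss anywhere. The GRU model, bigram-overlap reward, and hyperparameters are illustrative assumptions, not tokenbender's actual setup.

```python
import torch
import torch.nn as nn

VOCAB, SEQ_LEN, EMB = 64, 16, 32

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.rnn = nn.GRU(EMB, EMB, batch_first=True)
        self.head = nn.Linear(EMB, VOCAB)

    def forward(self, x, h=None):
        out, h = self.rnn(self.emb(x), h)
        return self.head(out), h

def reward(seq, corpus_bigrams):
    # sequence-level reward: fraction of sampled bigrams that occur in a reference corpus
    hits = sum((a.item(), b.item()) in corpus_bigrams for a, b in zip(seq[:-1], seq[1:]))
    return hits / (len(seq) - 1)

def train(corpus_bigrams, steps=200, lr=1e-3):
    model = TinyLM()                                   # random weights, no CE pretraining
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    baseline = 0.0
    for _ in range(steps):
        tok, h, logps, toks = torch.zeros(1, 1, dtype=torch.long), None, [], []
        for _ in range(SEQ_LEN):
            logits, h = model(tok, h)
            dist = torch.distributions.Categorical(logits=logits[:, -1])
            tok = dist.sample().unsqueeze(1)
            logps.append(dist.log_prob(tok.squeeze(1)))
            toks.append(tok.squeeze())
        r = reward(torch.stack(toks), corpus_bigrams)
        baseline = 0.9 * baseline + 0.1 * r            # moving-average baseline
        loss = -(r - baseline) * torch.stack(logps).sum()   # REINFORCE
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model

# toy usage: the "corpus" is a random token stream here; real text would replace it
stream = torch.randint(0, VOCAB, (1000,))
bigrams = set(zip(stream[:-1].tolist(), stream[1:].tolist()))
model = train(bigrams)
```

Whether anything like this scales beyond a toy is exactly the open question the tweet raises; the sketch only shows that the training loop itself needs no cross-entropy term.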
@tszzl
roon
2 days
K2 thinking is the most interesting open weights writing model so I hope people do fun things fine tuning it. games, interactive storytelling … I don’t see enough of this stuff
@thinkymachines
Thinking Machines
2 days
Tinker is now generally available. We also added support for advanced vision input models, Kimi K2 Thinking, and a simpler way to sample from models. https://t.co/nvaJHkGxc0
40
41
1K
@adithya_s_k
Adithya S K
2 days
Thank you for the wishes everyone. While these covered the highlights, there were also a bunch of low points; normalising failure through this post
> published 5 research papers but could not present any of them offline because of visa issues
> B1/B2 visa rejected twice while
@adithya_s_k
Adithya S K
3 days
Turned 22 today. AMA!! Quick look at the past year:
> landed an ML internship at @Apple, then joined @MSFTResearch to work on agentic memory
> secured a six figure USD research grant from @Meta to build SoTA AI models at @cognitivelab_ai
> crossed 10k GitHub stars across my
10
6
220
@tokenbender
tokenbender
2 days
notebooklm has gotten really useful now.
1
1
15
@tokenbender
tokenbender
2 days
my mind when i see a mutual consistently erode my sanity
> muting isn’t enough
> it’s over
@tszzl
roon
2 days
the thing about me is that i like having twitter mutuals more than i hate bad opinions
2
0
19
@tokenbender
tokenbender
3 days
arc-agi-1 is not the reference that it used to be, especially after contamination.
@arcprize
ARC Prize
3 days
A year ago, we verified a preview of an unreleased version of @OpenAI o3 (High) that scored 88% on ARC-AGI-1 at est. $4.5k/task. Today, we’ve verified a new GPT-5.2 Pro (X-High) SOTA score of 90.5% at $11.64/task. This represents a ~390X efficiency improvement in one year.
6
4
69
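A quick back-of-the-envelope check of the "~390X" figure above, assuming it refers purely to the ratio of per-task cost at a roughly comparable ARC-AGI-1 score (numbers taken from the quoted tweet):

```python
o3_cost = 4500.0     # est. $4.5k/task, o3 (High) preview, 88% on ARC-AGI-1
gpt52_cost = 11.64   # $11.64/task, GPT-5.2 Pro (X-High), 90.5%
print(o3_cost / gpt52_cost)  # ~386.6, i.e. roughly the quoted ~390x in one year
```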
@tokenbender
tokenbender
3 days
It is now fashionable to aura farm by releasing a really strong minor version update. GPT 5.2 looks really good though.
@OpenAI
OpenAI
3 days
GPT-5.2 Thinking evals
1
3
60
@kohjingyu
Jing Yu Koh
4 days
I've observed 3 ways that great AI researchers work: 1) Working on whatever they find interesting, even if it's "useless". Whether something will be publishable, fundable, or obviously impactful is irrelevant to what these people work on. They simply choose something
26
89
900
@rogershijin
Roger Jin
5 days
I was the lead for this project and wanna add some caveats:
1) this model has looping/gibberish and formatting problems - working on it!
2) the 87/120 = rank #2 last year thing is true but according to our annotators, this year's contest felt easier than last year's so we
@NousResearch
Nous Research
5 days
Today we open source Nomos 1. At just 30B parameters, it scores 87/120 on this year’s Putnam, one of the world’s most prestigious math competitions. This score would rank #2/3988 in 2024 and marks our first step with @hillclimbai towards creating a SOTA AI mathematician.
10
9
178
@BlackHC
Andreas Kirsch 🇺🇦
4 days
This framing by OP is actually an argument for automating science and a terrible argument against it. If you delay automating science and curing cancer because you want to provide folks with the ego trip of getting credit assignment or doing performative work, you are clearly
@togelius
Julian Togelius
6 days
I was at an event on AI for science yesterday, a panel discussion here at NeurIPS. The panelists discussed how they plan to replace humans at all levels in the scientific process. So I stood up and protested that what they are doing is evil. Look around you, I said. The room is
8
7
175
@NousResearch
Nous Research
5 days
Today we open source Nomos 1. At just 30B parameters, it scores 87/120 on this year’s Putnam, one of the world’s most prestigious math competitions. This score would rank #2/3988 in 2024 and marks our first step with @hillclimbai towards creating a SOTA AI mathematician.
80
229
2K
@tokenbender
tokenbender
4 days
this is an interesting paper and i recommend reading it. though most of the information here is something people with lots of RL-hours would know from experience. but it is nicely written and i enjoyed reading it.
0
0
6
@tokenbender
tokenbender
4 days
they keep saying do not anthropomorphise the models, but all the good techniques seem to be analogues of what we have studied about humans, behaviour and learning all this time. "RL works best when tasks are at the edge of competence"
@xiangyue96
Xiang Yue
5 days
There are competing views on whether RL can genuinely improve a base model's performance (e.g., pass@128). The answer is both yes and no, largely depending on the interplay between pre-training, mid-training, and RL. We trained a few hundred GPT-2-scale LMs on synthetic
3
2
31
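A minimal sketch of the "edge of competence" heuristic quoted above: pick RL training tasks whose current estimated pass rate sits near 50%, since tasks the policy always or never solves contribute almost no gradient signal. The task pool, pass-rate estimates, and the 0.5 target are illustrative assumptions, not the paper's actual curriculum.

```python
import random

def pick_edge_tasks(pass_rate, k=8, target=0.5):
    """pass_rate: dict of task_id -> estimated success rate under the current policy."""
    ranked = sorted(pass_rate, key=lambda t: abs(pass_rate[t] - target))
    return ranked[:k]

# toy usage: tasks at 0.0 or 1.0 pass rate carry ~zero learning signal and are skipped,
# while tasks near 0.5 dominate the selected batch.
pool = {f"task_{i}": random.random() for i in range(100)}
print(pick_edge_tasks(pool))
```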
@himanshustwts
himanshu
4 days
Tiny Models will Run the World ⚡️ Coming up: an amazing podcast with @vikhyatk, co-founder of moondream. vik is building open frontier vision ai with incredible sota. Drop your questions for vik around vision, data, open source, architectures, moondream + everything ai lore.
21
13
356
@tokenbender
tokenbender
4 days
it is not that ideas are cheap. it is just that you're coming up with ideas that are reducible to the existing ones in a manner obvious to the experts.
3
0
19
@kfountou
Kimon Fountoulakis
5 days
I do this. I don't do projects unless I have something major to learn from them. I seek a large gap of novelty. If there are more than 2-3 papers on a particular problem, and I don't have a unique angle on it, I don't work on the project. This means: 1. There is little
@anshulkundaje
Anshul Kundaje
9 days
Repeat after me. FUCK citation counts. Just do meaningful work that you enjoy, do a great job making it accessible and present it honestly so folks can use it and build on it optimally. That's it.
3
8
115
@tokenbender
tokenbender
4 days
> GPT-5 was asked for a test that detects nonlinear theories. It provided a test that detects nonlocal ones.
this is why we would continue needing tight integration loops with experts for the time being. once again "inhuman" mistakes. a human capable of operating at this level
@postquantum
Jonathan Oppenheim
5 days
OpenAI leadership (@gdb, @markchen90) are promoting a paper in Physics Letters B where GPT-5 proposed the main idea — possibly the first peer-reviewed paper where an LLM generated the core contribution. One small problem: GPT-5's idea tests the wrong thing. 1/
2
2
15
@GuillaumeLample
Guillaume Lample @ NeurIPS 2024
5 days
Very excited to release two new open-weight models, Devstral 2 (123B) and Devstral Small 2 (24B), along with Mistral Vibe, a CLI built for Devstral that enables end-to-end code automation!
@MistralAI
Mistral AI
5 days
Introducing the Devstral 2 coding model family. Two sizes, both open source. Also, meet Mistral Vibe, a native CLI, enabling end-to-end automation. 🧵
32
75
955