Ross Taylor

@rosstaylor90

Followers: 9K · Following: 12K · Media: 54 · Statuses: 2K

Universal intelligence at @GenReasoning. Previously lots of other things like: Llama 3/2, Galactica, Papers with Code.

∇²f = 0
Joined March 2012
@rosstaylor90
Ross Taylor
5 hours
What seems like an exponential in AI is just a series of S curves. Each era rides on a wave of increasing compute but finds a new way to utilise it - overcoming limitations of the previous stage. Eg pre-training was the dominant way to utilise compute, but the limitations of…
0
2
13
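A minimal numerical sketch of that claim (all parameters hypothetical): summing logistic S curves with successively higher ceilings gives a trajectory that is roughly straight on a log scale - i.e. looks exponential - even though each component saturates.

import numpy as np

def s_curve(t, midpoint, ceiling, rate=1.0):
    # One "era": slow start, rapid growth, saturation at its ceiling.
    return ceiling / (1.0 + np.exp(-rate * (t - midpoint)))

t = np.linspace(0, 30, 301)
# Three hypothetical eras, each finding a new way to use more compute.
total = (s_curve(t, midpoint=5, ceiling=1.0)
         + s_curve(t, midpoint=15, ceiling=10.0)
         + s_curve(t, midpoint=25, ceiling=100.0))

# On a log scale the sum is close to a straight line over the overlap window,
# even though every individual curve flattens out.
slope, _ = np.polyfit(t, np.log(total), 1)
print(f"average log-slope ≈ {slope:.2f} per unit time")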
@rosstaylor90
Ross Taylor
4 days
It’s funny that people on this site think major LLM efforts are talent-bound rather than org-bound. The talent differential has never been big between major orgs. Most of the difference in outcomes is due to organisational factors - like allocating compute to the right bets, and…
25
26
467
@rosstaylor90
Ross Taylor
9 days
Nice work on prediction vs understanding.
@keyonV
Keyon Vafa
9 days
Can an AI model predict perfectly and still have a terrible world model? What would that even mean? Our new ICML paper formalizes these questions. One result tells the story: A transformer trained on 10M solar systems nails planetary orbits. But it botches gravitational laws 🧵
0
1
13
@rosstaylor90
Ross Taylor
25 days
Congrats @n_latysheva!
@GoogleDeepMind
Google DeepMind
25 days
Introducing AlphaGenome: an AI model to help scientists better understand our DNA – the instruction manual for life 🧬. Researchers can now quickly predict what impact genetic changes could have - helping to generate new hypotheses and drive biological discoveries. ↓
0
0
6
@rosstaylor90
Ross Taylor
1 month
If you take ASI seriously, then you care about where you want to build it and who you want to build it for.
2
2
25
@rosstaylor90
Ross Taylor
1 month
RT @natolambert: Too many are being sanctimonious about human intelligence in face of the first real thinking machines. They'll be left beh…
0
25
0
@rosstaylor90
Ross Taylor
2 months
Finally proof that a British accent makes you smarter.
@maxxrubin_
Max
2 months
Definitely weird stuff with "o1 pro":
- Says it's o3.
- It has access to memory tool.
- Can search the web.
- It says "optimised" not "optimized" (Only o3 slips into British English).
Feels like o3 pro. @apples_jimmy @chatgpt21 @btibor91 @iruletheworldmo @scaling01 @kimmonismus @chetaslua
[4 images attached]
1
0
14
@rosstaylor90
Ross Taylor
2 months
Reinforcement learning is everywhere 🍒
[image attached]
0
0
14
@rosstaylor90
Ross Taylor
2 months
The best way to judge new results in ML is how much complexity they introduce for their stated performance gain. Most new things get small improvements for large complexity gains. They trade on novelty bias in the short term, and nerd-snipe people into thinking their approach is…
3
7
91
@rosstaylor90
Ross Taylor
2 months
This is a nice thread by @MinqiJiang.
@MinqiJiang
Minqi Jiang
2 months
It's so fun to see RL finally work on complex real-world tasks with LLM policies, but it's increasingly clear that we lack an understanding of how RL fine-tuning leads to generalization. In the same week, we got two (awesome) papers:
Absolute Zero Reasoner: Improvements on code…
[image attached]
0
0
14
@rosstaylor90
Ross Taylor
2 months
These early experiments really influenced my views on how far neural networks can go. As we pack more and more types of sequences into these models, we are going to find bizarre, wonderful connections between different things in nature. In fact, these novel connections likely…
1
0
15
@rosstaylor90
Ross Taylor
2 months
Another was going from sequences like amino acids to ground truth descriptions. Again this seemed to work really well. This was purely from just unsupervised learning on different scientific sequences. Supports the idea that in compressing the data, the model found a common
[image attached]
1
0
8
@rosstaylor90
Ross Taylor
2 months
One cool thing we found at the time was that the model learned a “Platonic representation” of chemical compounds. If you did a task like SMILES to IUPAC and looked at attention, it would attend to the correct part of the chemical graph when eg generating something like “amino”.
[image attached]
1
1
15
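For anyone who wants to poke at this themselves, a rough sketch of that kind of probe using the open Galactica weights on Hugging Face - the prompt and the averaging here are illustrative, not the original analysis:

import torch
from transformers import AutoTokenizer, OPTForCausalLM

tok = AutoTokenizer.from_pretrained("facebook/galactica-125m")
model = OPTForCausalLM.from_pretrained("facebook/galactica-125m")

# Aspirin as SMILES, then ask for the name; generation returns attention maps.
prompt = "[START_SMILES]CC(=O)OC1=CC=CC=C1C(=O)O[END_SMILES] The IUPAC name of this molecule is"
inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20,
                     output_attentions=True, return_dict_in_generate=True)

# out.attentions has one entry per generated token; each entry is a tuple over
# layers of tensors shaped (batch, heads, query_len, key_len). Averaging over
# layers and heads gives, per generated token, a distribution over prompt
# positions - i.e. which part of the molecule the model was looking at.
first_step = torch.stack([layer.mean(dim=1) for layer in out.attentions[0]]).mean(dim=0)
print(first_step.shape)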
@rosstaylor90
Ross Taylor
2 months
With Galactica, I was quite influenced by Solomonoff’s idea of induction as sequence prediction - “everything is a sequence”. So we built this special tokenizer logic to get all these different types of scientific sequence into the same model. Modelling all sequences with
[image attached]
4
7
84
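The mechanics behind “everything is a sequence” fit in a few lines. A minimal sketch, assuming the start/end marker convention from the Galactica paper (the wrapper function and example document are illustrative): each scientific modality is wrapped in its own special tokens so a single decoder-only model sees prose, molecules and proteins as one token stream.

# Sketch of the special-token wrapping; marker names follow the Galactica
# paper, the helper itself is hypothetical.
MARKERS = {
    "smiles": ("[START_SMILES]", "[END_SMILES]"),
    "amino":  ("[START_AMINO]", "[END_AMINO]"),
}

def wrap_sequence(kind: str, sequence: str) -> str:
    start, end = MARKERS[kind]
    return f"{start}{sequence}{end}"

doc = (
    "Aspirin is "
    + wrap_sequence("smiles", "CC(=O)OC1=CC=CC=C1C(=O)O")
    + ", a common analgesic."
)
print(doc)  # prose and molecular structure interleaved in one training sequence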
@rosstaylor90
Ross Taylor
2 months
RL is very expensive compared to SFT, which makes it impractical to scale for most folks outside of big labs. And yet, RL is perfect for businesses because you can optimise the metric you actually care about. Not the next token, but the next sale or the next customer. Already…
11
9
269
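As a toy illustration of that last point (all names and numbers hypothetical), a REINFORCE-style loss weights the log-probability of each sampled response by an observed business outcome, rather than by next-token likelihood on reference text:

import torch

def rl_loss(log_probs, rewards):
    # log_probs: summed log-prob of each sampled response, shape (batch,)
    # rewards:   e.g. 1.0 if the reply led to a sale, 0.0 otherwise
    advantage = rewards - rewards.mean()   # simple mean baseline
    return -(advantage * log_probs).mean()

log_probs = torch.tensor([-12.3, -8.7, -15.1], requires_grad=True)
rewards = torch.tensor([1.0, 0.0, 1.0])    # outcomes we actually care about
loss = rl_loss(log_probs, rewards)
loss.backward()                            # pushes probability toward rewarded responses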
@rosstaylor90
Ross Taylor
3 months
When your model has emergent swearing in its internal monologue.
[image attached]
1
1
28
@rosstaylor90
Ross Taylor
3 months
Neural networks were once in the “graveyard of ideas” because the conditions weren’t right for them to shine (data, hardware). So maybe it’s a waiting room rather than a graveyard 🙂. I’m not sure a lot of the ideas below are dead actually - eg SSM-transformer hybrids look more…
@tokenbender
tokenbender
3 months
Making a list of graveyard of ideas, the ultimate nerd snipes where efforts go and die:
- DPO-*variant
- SSM-transformer hybrids
- SAEs
- MCTS
- Diffusion for large vision models
- Attention-less
- JEPA (lecun lovers)
What else?
3
11
178
@rosstaylor90
Ross Taylor
3 months
Happy Qwen day to all who celebrate.
@Alibaba_Qwen
Qwen
3 months
Introducing Qwen3! We release and open-weight Qwen3, our latest large language models, including 2 MoE models and 6 dense models, ranging from 0.6B to 235B. Our flagship model, Qwen3-235B-A22B, achieves competitive results in benchmark evaluations of coding, math, general…
[2 images attached]
2
2
50
@rosstaylor90
Ross Taylor
3 months
All that is old is new again.
[image attached]
@willccbb
will brown
3 months
how sure are we that one epoch is optimal for pretraining in the data-scarce regime.
2
1
65