Nathan Axcan

@AxcanNathan

Followers 130 · Following 1K · Media 53 · Statuses 839

Entropy-neur | Gradient descent is a photograph of the real world. | Formerly RL & sequence modelling @tudelft, now LLM research @IBMResearchZurich

Zurich, Switzerland
Joined August 2022
@AxcanNathan
Nathan Axcan
10 days
Look at that! I was able to find the GPT2 model I SFT'd on my trusty 3.5GB GTX 970 a few months before ChatGPT came out! I had downloaded translated Japanese light novels and used some regex trickery to extract conversations, which I turned into an SFT dataset to try to make the
1
0
0
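The extraction script itself isn't shown in the thread; below is a minimal sketch, assuming the translated light novels were plain-text files with speaker-attributed dialogue lines, of how the "regex trickery" could turn them into prompt/response pairs for SFT. The regex, file names, and JSONL schema are hypothetical, not details from the original post.

```python
import json
import re

# Hypothetical reconstruction: pull speaker-attributed dialogue lines out of
# translated light-novel text files and pair consecutive utterances as prompt/response.
DIALOGUE_RE = re.compile(r'^(?P<speaker>[A-Z][\w\- ]{0,30}):\s*["“](?P<utterance>.+?)["”]\s*$')

def extract_utterances(path):
    """Return the dialogue utterances found in one text file, in order."""
    utterances = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            m = DIALOGUE_RE.match(line.strip())
            if m:
                utterances.append(m.group("utterance"))
    return utterances

def build_sft_dataset(novel_paths, out_path="sft_dialogue.jsonl"):
    """Pair each utterance with the one that follows it and write JSONL records."""
    with open(out_path, "w", encoding="utf-8") as out:
        for path in novel_paths:
            utterances = extract_utterances(path)
            for prompt, response in zip(utterances, utterances[1:]):
                out.write(json.dumps({"prompt": prompt, "response": response}) + "\n")

if __name__ == "__main__":
    build_sft_dataset(["light_novel_vol1.txt"])  # placeholder input file
```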
@AxcanNathan
Nathan Axcan
9 days
Says! That’s the first cited claim in the announcement
0
0
0
@AxcanNathan
Nathan Axcan
10 days
Tiny, tiny.
0
0
0
@AxcanNathan
Nathan Axcan
10 days
The model was then served as a chatbot on a decently big Discord server (~1000 members), and I remember quite a few fun interactions. It's crazy to now realize that it was a model of similar size to the ones @Dorialexander is training at @pleiasfr.
1
0
1
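The serving setup isn't described in the thread either; this is a hedged sketch of how a fine-tuned GPT-2 checkpoint could be exposed as a Discord chatbot using discord.py and transformers. The checkpoint path, token placeholder, and sampling parameters are assumptions for illustration only.

```python
import discord
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

CHECKPOINT = "./gpt2-lightnovel-sft"  # assumed local path to the fine-tuned model

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForCausalLM.from_pretrained(CHECKPOINT)
model.eval()

intents = discord.Intents.default()
intents.message_content = True  # needed to read message text
client = discord.Client(intents=intents)

@client.event
async def on_message(message):
    # Ignore the bot's own messages and only respond when mentioned.
    if message.author == client.user or client.user not in message.mentions:
        return
    prompt = message.clean_content
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=60,
            do_sample=True,
            top_p=0.9,
            temperature=0.8,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Decode only the newly generated tokens, not the echoed prompt.
    reply = tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:],
                             skip_special_tokens=True)
    await message.channel.send(reply[:2000] or "...")  # Discord's 2000-char limit

client.run("YOUR_DISCORD_BOT_TOKEN")  # placeholder token
```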
@AxcanNathan
Nathan Axcan
12 days
Lower body is 2.4x more expensive than the upper body? Maybe we go back to wheels at some point (39.1k vs 16.1k)
@zephyr_z9
Zephyr
12 days
Optimus Gen2 costs $50k-$60k. This is a goldmine for me. How did MS get this info??
0
0
1
@AxcanNathan
Nathan Axcan
14 days
Don’t forget
0
0
0
@AxcanNathan
Nathan Axcan
14 days
@arcprize
ARC Prize
3 months
New ARC Prize 2025 High Score: 27.08% by Giotto.ai (@podesta_aldo)
1
0
0
@AxcanNathan
Nathan Axcan
15 days
...so what happened to @giotto_ai at @arcprize?
1
0
1
@AxcanNathan
Nathan Axcan
16 days
At what point do they make “inclusion in the next training run” a paid service? Funny, it’s like paying to include a payload in the next Starship launch.
@andrew_n_carr
Andrew Carr 🤸
17 days
Life hack. Closed models don't do what you want, but you don't want to fine tune your own? Make an eval, send it to these companies, tell them their competitors are dunking on them, watch the next generation completely saturate your task
0
0
0
@AxcanNathan
Nathan Axcan
1 month
Shall we start counting the inference costs as part of the losses?
@stevehou
Steve Hou
2 months
As I was saying. All the finance-illiterate techies jumped to congratulate DeepSeek for being trained into savant traders based on two days of live trading data. It turns out it was just trained to be a big degen trader, had its moment in the sun and then was promptly liquidated.
0
0
0
@AxcanNathan
Nathan Axcan
2 months
First website that comes to mind when you see this? For me @liliputingnews
0
0
0
@AxcanNathan
Nathan Axcan
2 months
Did you know that if you ask LLMs knowledge questions based on the Encyclopaedia Britannica, you find a linear relationship between model size (regardless of MoE) and accuracy?
0
0
1
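No data or methodology accompanies the claim; the sketch below shows one way to check for such a linear relationship, given per-model (parameter count, accuracy) pairs that the reader supplies from their own evaluations. The function name and the plain least-squares fit are assumptions, not the author's method.

```python
import numpy as np

def fit_size_accuracy(param_counts_b, accuracies):
    """Least-squares fit of accuracy against parameter count, with R².

    param_counts_b: parameter counts in billions (decide separately whether to
                    use total or active parameters for MoE models).
    accuracies:     fraction of Britannica-style questions answered correctly.
    """
    x = np.asarray(param_counts_b, dtype=float)
    y = np.asarray(accuracies, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    pred = slope * x + intercept
    ss_res = float(np.sum((y - pred) ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared

# Usage, with your own evaluation results:
# slope, intercept, r2 = fit_size_accuracy(param_counts, accuracies)
```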
@AxcanNathan
Nathan Axcan
2 months
GPT-4.5 was secretly one of humanity’s great projects, hopefully it will come back after a few more GPU generations.
@EpochAIResearch
Epoch AI
2 months
New data insight: How does OpenAI allocate its compute? OpenAI spent ~$7 billion on compute last year. Most of this went to R&D, meaning all research, experiments, and training. Only a minority of this R&D compute went to the final training runs of released models.
0
0
1
@AxcanNathan
Nathan Axcan
3 months
Finally the VR killer app: Teleop
0
0
1
@AxcanNathan
Nathan Axcan
3 months
Rap lyrics have probably been moving, and will keep moving, towards style and content an LLM can’t produce (like real-time cross-social-platform trend references, private conversation references, unrecorded concert references). Not sure what the visual arts equivalent is, by now most of
0
0
1
@AxcanNathan
Nathan Axcan
3 months
Starting to think the purpose of money is to parallelise
0
0
0
@AxcanNathan
Nathan Axcan
3 months
Look at the last row: it's not even about being Western, somehow it's about being Anglo. I'd wager English fluency and GDP anti-correlate with AI anxiety.
0
0
1
@AxcanNathan
Nathan Axcan
3 months
Also, why didn't the "AI Index" sort this figure by sentiment? https://t.co/MjCaDZpevC
0
0
0
@AxcanNathan
Nathan Axcan
3 months
"LongCat LLM shows even food delivery companies in China are AI-pilled" Probably caused by Chinese positive sentiment towards AI. If USA/EU were so excited by AI, would Uber, ASML, SAP, Snap be training their own OSS MoEs?
1
1
3
@AxcanNathan
Nathan Axcan
3 months
I guess the strategy was “instead of increasing pay per researcher, increase GPUs per researcher as the main metric.” Researchers broadly didn’t get into research for money.
@dustinvtran
Dustin Tran
3 months
I departed Google DeepMind after 8 years. So many fond memories—from early foundational papers in Google Brain (w/ @noamshazeer @ashvaswani @lukaszkaiser on Image Transformer, Tensor2Tensor, Mesh TensorFlow) to lead Gemini posttraining evals to catch up & launch in 100 days, then
0
0
0