Nathan Axcan

@AxcanNathan

Followers 130 · Following 1K · Media 53 · Statuses 839

Entropy-neur | Gradient descent is a photograph of the real world. | Formerly RL & sequence modelling @tudelft, now LLM research @IBMResearchZurich

Zurich, Switzerland
Joined August 2022
@AxcanNathan
Nathan Axcan
10 days
Look at that! I was able to find the GPT2 model I SFT'd on my trusty 3.5GB GTX 970 a few months before ChatGPT came out! I had downloaded translated Japanese light novels and used some regex trickery to extract conversations, which I turned into an SFT dataset to try to make the
1
0
0
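The extraction script itself isn't shown in the thread; below is a minimal sketch, assuming the translated light novels were plain-text files with speaker-attributed dialogue lines, of how the "regex trickery" could turn them into prompt/response pairs for SFT. The regex, file names, and JSONL schema are hypothetical, not details from the original post.

```python
import json
import re

# Hypothetical reconstruction: pull speaker-attributed dialogue lines out of
# translated light-novel text files and pair consecutive utterances as prompt/response.
DIALOGUE_RE = re.compile(r'^(?P<speaker>[A-Z][\w\- ]{0,30}):\s*["“](?P<utterance>.+?)["”]\s*$')

def extract_utterances(path):
    """Return the dialogue utterances found in one text file, in order."""
    utterances = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            m = DIALOGUE_RE.match(line.strip())
            if m:
                utterances.append(m.group("utterance"))
    return utterances

def build_sft_dataset(novel_paths, out_path="sft_dialogue.jsonl"):
    """Pair each utterance with the one that follows it and write JSONL records."""
    with open(out_path, "w", encoding="utf-8") as out:
        for path in novel_paths:
            utterances = extract_utterances(path)
            for prompt, response in zip(utterances, utterances[1:]):
                out.write(json.dumps({"prompt": prompt, "response": response}) + "\n")

if __name__ == "__main__":
    build_sft_dataset(["light_novel_vol1.txt"])  # placeholder input file
```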
@AxcanNathan
Nathan Axcan
9 days
Says! That’s the first cited claim in the announcement
0
0
0
@AxcanNathan
Nathan Axcan
10 days
Tiny, tiny.
0
0
0
@AxcanNathan
Nathan Axcan
10 days
The model was then served as a chatbot on a decently big Discord server (~1000 members), and I remember quite a few fun interactions. It's crazy to now realize that it was a model of similar size to the ones @Dorialexander is training at @pleiasfr.
1
0
1
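The serving setup isn't described in the thread either; this is a hedged sketch of how a fine-tuned GPT-2 checkpoint could be exposed as a Discord chatbot using discord.py and transformers. The checkpoint path, token placeholder, and sampling parameters are assumptions for illustration only.

```python
import discord
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

CHECKPOINT = "./gpt2-lightnovel-sft"  # assumed local path to the fine-tuned model

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForCausalLM.from_pretrained(CHECKPOINT)
model.eval()

intents = discord.Intents.default()
intents.message_content = True  # needed to read message text
client = discord.Client(intents=intents)

@client.event
async def on_message(message):
    # Ignore the bot's own messages and only respond when mentioned.
    if message.author == client.user or client.user not in message.mentions:
        return
    prompt = message.clean_content
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=60,
            do_sample=True,
            top_p=0.9,
            temperature=0.8,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Decode only the newly generated tokens, not the echoed prompt.
    reply = tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:],
                             skip_special_tokens=True)
    await message.channel.send(reply[:2000] or "...")  # Discord's 2000-char limit

client.run("YOUR_DISCORD_BOT_TOKEN")  # placeholder token
```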
@AxcanNathan
Nathan Axcan
12 days
Lower body is 2.4x more expensive than the upper body? Maybe we go back to wheels at some point (39.1k vs 16.1k)
@zephyr_z9
Zephyr
12 days
Optimus Gen2 costs $50k-$60k. This is a goldmine for me. How did MS get this info??
0
0
1
@AxcanNathan
Nathan Axcan
14 days
Don’t forget
0
0
0
@AxcanNathan
Nathan Axcan
14 days
@arcprize
ARC Prize
3 months
New ARC Prize 2025 High Score: 27.08% by Giotto.ai (@podesta_aldo)
1
0
0
@AxcanNathan
Nathan Axcan
15 days
...so what happened to @giotto_ai at @arcprize?
1
0
1
@AxcanNathan
Nathan Axcan
16 days
At what point do they make “inclusion in the next training run” a paid service? Funny, it’s like paying to include a payload in the next Starship launch.
@andrew_n_carr
Andrew Carr 🤸
17 days
Life hack. Closed models don't do what you want, but you don't want to fine tune your own? Make an eval, send it to these companies, tell them their competitors are dunking on them, watch the next generation completely saturate your task
0
0
0
@AxcanNathan
Nathan Axcan
1 month
Shall we start counting the inference costs as part of the losses?
@stevehou
Steve Hou
2 months
As I was saying. All the finance-illiterate techies jumped to congratulate DeepSeek for being trained into savant traders based on two days of live trading data. It turns out it was just trained to be a big degen trader, had its moment in the sun and then was promptly liquidated.
0
0
0
@AxcanNathan
Nathan Axcan
2 months
First website that comes to mind when you see this? For me @liliputingnews
0
0
0
@AxcanNathan
Nathan Axcan
2 months
Did you know that if you ask LLMs knowledge questions based on the Encyclopaedia Britannica, you find a linear relationship between model size (regardless of MoE) and accuracy?
0
0
1
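No data or methodology accompanies the claim; the sketch below shows one way to check for such a linear relationship, given per-model (parameter count, accuracy) pairs that the reader supplies from their own evaluations. The function name and the plain least-squares fit are assumptions, not the author's method.

```python
import numpy as np

def fit_size_accuracy(param_counts_b, accuracies):
    """Least-squares fit of accuracy against parameter count, with R².

    param_counts_b: parameter counts in billions (decide separately whether to
                    use total or active parameters for MoE models).
    accuracies:     fraction of Britannica-style questions answered correctly.
    """
    x = np.asarray(param_counts_b, dtype=float)
    y = np.asarray(accuracies, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    pred = slope * x + intercept
    ss_res = float(np.sum((y - pred) ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared

# Usage, with your own evaluation results:
# slope, intercept, r2 = fit_size_accuracy(param_counts, accuracies)
```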
@AxcanNathan
Nathan Axcan
2 months
GPT-4.5 was secretly one of humanity’s great projects, hopefully it will come back after a few more GPU generations.
@EpochAIResearch
Epoch AI
2 months
New data insight: How does OpenAI allocate its compute? OpenAI spent ~$7 billion on compute last year. Most of this went to R&D, meaning all research, experiments, and training. Only a minority of this R&D compute went to the final training runs of released models.
0
0
1
@AxcanNathan
Nathan Axcan
3 months
Finally the VR killer app: Teleop
0
0
1
@AxcanNathan
Nathan Axcan
3 months
Rap lyrics have probably been moving, and will keep moving, towards style and content an LLM can’t produce (like real-time cross-social-platform trend references, private conversation references, unrecorded concert references). Not sure what the visual arts equivalent is, by now most of
0
0
1
@AxcanNathan
Nathan Axcan
3 months
Starting to think the purpose of money is to parallelise
0
0
0
@AxcanNathan
Nathan Axcan
3 months
Look at the last row: it's not even about being Western, somehow it's about being Anglo. I'd wager English fluency and GDP anti-correlate with AI anxiety.
0
0
1
@AxcanNathan
Nathan Axcan
3 months
Also, why didn't the "AI Index" sort this figure by sentiment? https://t.co/MjCaDZpevC
0
0
0
@AxcanNathan
Nathan Axcan
3 months
"LongCat LLM shows even food delivery companies in China are AI-pilled" Probably caused by Chinese positive sentiment towards AI. If USA/EU were so excited by AI, would Uber, ASML, SAP, Snap be training their own OSS MoEs?
1
1
3
@AxcanNathan
Nathan Axcan
3 months
I guess the strategy was “instead of increasing pay per researcher, increase GPUs per researcher as the main metric.” Researchers broadly didn’t get into research for money.
@dustinvtran
Dustin Tran
3 months
I departed Google DeepMind after 8 years. So many fond memories—from early foundational papers in Google Brain (w/ @noamshazeer @ashvaswani @lukaszkaiser on Image Transformer, Tensor2Tensor, Mesh TensorFlow) to lead Gemini posttraining evals to catch up & launch in 100 days, then
0
0
0