Nathan Axcan
@AxcanNathan
130 Followers · 1K Following · 53 Media · 839 Statuses
Entropy-neur | Gradient descent is a photograph of the real world. | Formerly RL & sequence modelling @tudelft, now LLM research @IBMResearchZurich
Zurich, Switzerland
Joined August 2022
Look at that! I was able to find the GPT2 model I SFT'd on my trusty 3.5GB GTX 970 a few months before ChatGPT came out! I had downloaded translated Japanese light novels and used some regex trickery to extract conversations, which I turned into an SFT dataset to try to make the
The model was then served as a chatbot on a decently big Discord server (~1000 members), and I remember quite a few fun interactions. It's crazy to now realize that it was a model of similar size to the ones @Dorialexander is training at @pleiasfr.
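For the serving side, a rough sketch of how an SFT'd GPT-2 checkpoint could be wired into a Discord bot with transformers and discord.py; the checkpoint path, bot token, and generation settings are placeholders, not the setup actually used on that server.

```python
import discord
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path to the fine-tuned GPT-2 checkpoint.
tokenizer = AutoTokenizer.from_pretrained("./gpt2-lightnovel-sft")
model = AutoModelForCausalLM.from_pretrained("./gpt2-lightnovel-sft")

intents = discord.Intents.default()
intents.message_content = True
client = discord.Client(intents=intents)

@client.event
async def on_message(message: discord.Message):
    # Ignore the bot's own messages and only respond when it is mentioned.
    if message.author == client.user or client.user not in message.mentions:
        return
    inputs = tokenizer(message.clean_content, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_new_tokens=80,
        do_sample=True,
        top_p=0.9,
        temperature=0.8,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Decode only the newly generated tokens, not the echoed prompt.
    reply = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    await message.channel.send(reply[:2000] or "...")  # respect Discord's message limit

client.run("DISCORD_BOT_TOKEN")  # placeholder token
```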
At what point do they make “inclusion in the next training run” a paid service? Funny, it’s like paying to include a payload in the next Starship launch.
Life hack. Closed models don't do what you want, but you don't want to fine-tune your own? Make an eval, send it to these companies, tell them their competitors are dunking on them, and watch the next generation completely saturate your task.
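A hedged sketch of what "make an eval" can mean in practice: a small JSONL file of prompts and expected answers, scored by exact match against whichever model you want to pressure. The file format, field names, and the stand-in `query_model` callable are all hypothetical; swap in whatever wraps the closed model's API.

```python
import json
from pathlib import Path
from typing import Callable

def run_eval(eval_path: str, query_model: Callable[[str], str]) -> float:
    """Score a model on a tiny exact-match eval.

    `eval_path` points to a JSONL file of {"prompt": ..., "answer": ...}
    records (hypothetical format); `query_model` takes a prompt string and
    returns the model's answer as a plain string.
    """
    lines = Path(eval_path).read_text(encoding="utf-8").splitlines()
    examples = [json.loads(line) for line in lines if line.strip()]
    correct = 0
    for ex in examples:
        prediction = query_model(ex["prompt"]).strip().lower()
        correct += prediction == ex["answer"].strip().lower()
    return correct / max(len(examples), 1)

if __name__ == "__main__":
    # Dummy "model" so the script runs end to end; replace with a real API call.
    score = run_eval("my_task.jsonl", query_model=lambda prompt: "placeholder")
    print(f"accuracy: {score:.1%}")
```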
Shall we start counting the inference costs as part of the losses?
As I was saying. All the finance-illiterate techies jumped to congratulate DeepSeek for being trained into a savant trader based on two days of live trading data. It turns out it was just trained to be a big degen trader; it had its moment in the sun and then was promptly liquidated.
Did you know that if you ask LLMs knowledge questions based on the Encyclopaedia Britannica, you find a linear relationship between model size (MoE or not) and accuracy?
GPT-4.5 was secretly one of humanity’s great projects; hopefully it will come back after a few more GPU generations.
New data insight: How does OpenAI allocate its compute? OpenAI spent ~$7 billion on compute last year. Most of this went to R&D, meaning all research, experiments, and training. Only a minority of this R&D compute went to the final training runs of released models.
Rap lyrics have probably been moving, and will keep moving, towards style and content an LLM can’t produce (like real-time cross-social-platform trend references, private conversation references, unrecorded concert references). Not sure what the visual arts equivalent is; by now most of
Look at the last row: it's not even about being Western, somehow it's about being Anglo. I'd wager English fluency and GDP anti-correlate with AI anxiety.
Also, why didn't the "AI Index" sort this figure by sentiment? https://t.co/MjCaDZpevC
"LongCat LLM shows even food delivery companies in China are AI-pilled" Probably caused by Chinese positive sentiment towards AI. If USA/EU were so excited by AI, would Uber, ASML, SAP, Snap be training their own OSS MoEs?
I guess the strategy was “instead of increasing pay per researcher, increase GPUs per researcher as the main metric.” Researchers broadly didn’t get into research for money.
I departed Google DeepMind after 8 years. So many fond memories, from early foundational papers in Google Brain (w/ @noamshazeer @ashvaswani @lukaszkaiser on Image Transformer, Tensor2Tensor, Mesh TensorFlow) to leading Gemini post-training evals to catch up & launch in 100 days, then