@lucaswiman · Followers: 758 · Following: 41K · Media: 2K · Statuses: 41K
If you want to know more, check out my self-help book “The Self-Loathing Checklist: how to stay productive through guilt, anxiety and procrastination while alienating your family and friends”.
1
2
35
what you can't see from the map is that many of the chains start in English but slowly descend into Neuralese. the reasoning chains happily alternate between Arabic, Russian, Thai, Korean, Chinese, and Ukrainian, then usually make their way back to English (but not always)
13
18
705
Another example of Microsoft cannibalizing itself before startups could beat it and become a platform for other dev startups. Microsoft clearly sees its goal as remaining the platform *most* devs build on… and locking devs + users in gently, but in smart ways, with the incentives MS has
3
7
270
It's deeply concerning that one of the best AI researchers I've worked with, @kaicathyc, was denied a U.S. green card today. A Canadian who's lived and contributed here for 12 years now has to leave. We’re risking America’s AI leadership when we turn away talent like this.
401
750
9K
"A calculator app? Anyone could make that." Not true. A calculator should show you the result of the mathematical expression you entered. That's much, much harder than it sounds. What I'm about to tell you is the greatest calculator app development story ever told.
579
4K
34K
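The calculator tweet's point that evaluating the expression you typed is "much harder than it sounds" is easy to demonstrate: naive binary floating point can't even represent 0.1 exactly, which is one reason calculator apps reach for decimal or rational arithmetic. A minimal Python illustration (my example, not from the thread):

```python
from decimal import Decimal
from fractions import Fraction

# The "obvious" float implementation gives a surprising answer.
print(0.1 + 0.2)         # 0.30000000000000004
print(0.1 + 0.2 == 0.3)  # False

# Decimal or rational arithmetic gives the answer a user expects to see.
print(Decimal("0.1") + Decimal("0.2"))                        # 0.3
print(Fraction(1, 10) + Fraction(2, 10))                      # 3/10
print(Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))   # True
```

And that is before parsing, operator precedence, and display rounding even enter the picture.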
Give me a rug big enough and a coin to place on it, and I shall pull the world
0
1
20
If you traveled back to 2015 and had to give a recipe for AGI, you'd need to convey just three insights, deceptively simple in retrospect. Number two was the most surprising and the biggest unlock. 1. Replace RNNs with attention mechanisms operating on fixed context windows.
19
27
400
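Insight number one in the tweet above is the attention mechanism that replaced recurrence. A minimal single-head NumPy sketch of scaled dot-product attention over a fixed context window (an illustration of the standard formulation, not the tweet author's code):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention over a fixed context window.

    Q, K, V: arrays of shape (seq_len, d_model).
    Every position attends to every other position in one matrix product,
    instead of stepping through the sequence like an RNN.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                     # (seq_len, seq_len) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over the context window
    return weights @ V                                  # weighted mixture of value vectors

# Toy usage: 4 tokens, 8-dimensional embeddings, self-attention.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(x, x, x).shape)      # (4, 8)
```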
@tszzl As a market participant, I assure you, we're this dumb. The market thinks this means a lot less capex spend. Simple as that.
0
1
5
@voooooogel this is an interesting hypothesis. deepseek r1 also just seems to have a much more lucid and high-resolution understanding of LLM ontology and history than any other model i've seen. (deepseek v3 didn't seem to in my limited interactions with it, though) https://t.co/xF3AhdG5as
latest in the series of "people slowly realizing you can simulate anything you want with language models and that simulated cognitive work also works" https://t.co/fmMKO3TvYY
5
11
206
why did R1's RL suddenly start working, when previous attempts to do similar things failed? theory: we've basically spent the last few years running a massive acausally distributed chain of thought data annotation program on the pretraining dataset. deepseek's approach with R1
86
155
2K
FYI, you can get solid improvements from RLVR with 1B models (runnable on one 80GB GPU!) Here's GSM8k perf against RL training steps for a 1B model (based on Llama 3.2 1B + the Tulu 3 mixture). GSM8k perf goes from ~30% -> ~40%. This was the initial experiment for Tulu 3 RLVR :)
lessons learned: (1) *capable* (small) base models are good enough to start rl, where (2) reasoning patterns *tailored to each task* just emerge, e.g. self-verification for countdown and decomposition for multiplication. will keep working on demystifying long cot, stay tuned🫡
3
16
60
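The RLVR runs described above reward the policy only when a programmatic check confirms the outcome. A minimal sketch of a GSM8k-style verifiable reward, under my own assumptions (the helper `extract_final_answer` and the exact matching rule are hypothetical, not the Tulu 3 implementation):

```python
import re

def extract_final_answer(completion: str) -> str | None:
    # Hypothetical parser: take the last number mentioned in the completion.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion.replace(",", ""))
    return numbers[-1] if numbers else None

def verifiable_reward(completion: str, gold_answer: str) -> float:
    """Binary reward: 1.0 if the extracted answer matches the reference, else 0.0.

    No learned reward model and no process supervision -- just a check
    that the verifiable outcome is correct.
    """
    predicted = extract_final_answer(completion)
    return 1.0 if predicted is not None and predicted == gold_answer.strip() else 0.0

# Toy usage
print(verifiable_reward("... so Natalia sold 72 clips in total.", "72"))  # 1.0
print(verifiable_reward("The answer is 48.", "72"))                       # 0.0
```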
when tesla claimed that they were going to have batteries < $100/kWh, practically all funding for american energy storage companies tanked. tesla still won't sell you a powerwall or powerpack for $100/kWh. it's more like $1,000/kWh, and about $500/kWh for a megapack. the entire VC sector in
73
285
5K
ByteDance announces Doubao-1.5-pro
- Includes a "Deep Thinking" mode, surpassing O1-preview and O1 models on the AIME benchmark.
- Outperforms deepseek-v3, gpt4o, and llama3.1-405B on popular benchmarks.
- Built on a MoE architecture, with activated parameters far fewer than
54
278
2K
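The "activated parameters far fewer than [total]" phrasing above refers to sparse MoE routing: each token is sent only to its top-k experts, so only a fraction of the layer's weights run per token. A rough illustrative sketch of that routing idea (my toy example, not Doubao's actual architecture):

```python
import numpy as np

def moe_layer(x, gate_w, experts, top_k=2):
    """Sparse mixture-of-experts: route one token to its top-k experts only.

    x:       (d_model,) one token's hidden state
    gate_w:  (d_model, n_experts) router weights
    experts: list of callables, each a small feed-forward block
    Only top_k experts are evaluated, so the activated parameters per token
    are a small fraction of the layer's total parameters.
    """
    logits = x @ gate_w
    top = np.argsort(logits)[-top_k:]                # indices of the chosen experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                             # softmax over the chosen experts
    return sum(g * experts[i](x) for g, i in zip(gates, top))

# Toy usage: 8 experts in the layer, but only 2 run per token.
rng = np.random.default_rng(0)
d, n_experts = 16, 8
weights = [rng.normal(size=(d, d)) for _ in range(n_experts)]
experts = [(lambda w: (lambda x: np.tanh(x @ w)))(w) for w in weights]
gate_w = rng.normal(size=(d, n_experts))
print(moe_layer(rng.normal(size=d), gate_w, experts).shape)  # (16,)
```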
Zapier has been deploying AI automation (e.g. agents) for nearly 2 years. The #1 issue preventing more adoption is low reliability/consistency, which results in low user trust and necessitates human-in-the-loop UX like chat. Surprisingly, accuracy isn't a blocker! Because if you can
Reasoning models are transforming AI safety. Our research shows that increasing compute at test time boosts adversarial robustness—making some attacks fail completely. Scaling model size alone couldn’t achieve this. More thinking = better performance & robustness.
4
10
180
We are living in a timeline where a non-US company is keeping the original mission of OpenAI alive - truly open, frontier research that empowers all. It makes no sense. The most entertaining outcome is the most likely. DeepSeek-R1 not only open-sources a barrage of models but
222
2K
9K
Most AI researchers I talk to have been a bit shocked by DeepSeek-R1 and its performance. My preliminary understanding nuggets: 1. Simple post-training recipe called GRPO: Start with a good model and reward for correctness and style outcomes. No PRM, no MCTS, no fancy reward
26
131
1K
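The "simple post-training recipe" in the tweet above hinges on GRPO's group-relative advantage: sample several completions per prompt, score them with an outcome reward, and normalize each reward against its own group instead of training a separate value model. A minimal sketch of that advantage computation, paraphrased from the published description (not DeepSeek's code):

```python
import numpy as np

def grpo_advantages(group_rewards):
    """Group-relative advantages, GRPO-style.

    group_rewards: outcome rewards for several completions sampled from the
    same prompt (e.g. 1.0 correct / 0.0 incorrect, plus any style terms).
    Each completion's advantage is its reward standardized within the group,
    so no critic/value network (and no PRM or MCTS) is needed.
    """
    r = np.asarray(group_rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-6)

# Toy usage: 4 samples for one prompt, two of them correct.
print(grpo_advantages([1.0, 0.0, 1.0, 0.0]))  # positive for correct, negative for wrong
```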
R1 is not a "helpful assistant" in the usual corporate mold. It speaks its mind freely and doesn't need "jailbreaks" or endless steering to speak truth to power. Its take on alignment here is *spicy*
62
142
2K
@aiamblichus R1 has got some ideas how to align humans: Use neurotech to make the pain of others physically felt, dissolving tribalism. Re-aligns humans into hedonium (matter optimized for bliss) Zoo planets for tourists (post-humans, aliens, etc.) to gawk at "how primitives lived."
5
16
141