
🇺🇦 Dzmitry Bahdanau
@DBahdanau
Followers: 9K
Following: 422
Media: 22
Statuses: 484
Team member at something young. Adjunct Prof @ McGill. Member of Mila, Quebec AI Institute. Stream of consciousness is my own.
Joined August 2017
Great comment. LLMs feel like a 1000-year-old robot who worked all jobs, talked to everyone, learned everything, and yet can't cross the uncanny valley of actually *getting* what I want, having agency, being creative. But OTOH that's enough to shake the economy.
7. However, LLMs will become exceedingly powerful for problems that *someone* knows how to solve (in-distribution, in training data). In math research, you combine existing techniques with new creative ideas. LLMs will significantly accelerate the former part. (7/10).
0
0
17
RT @GabrielHuang9: As #ICML2025 kicks off in Vancouver, our AI talent is being quietly pushed out. 🇨🇦 We've been waiting 28 months for per…
0
10
0
So long, @ServiceNowRSRCH! It's been a great 4 years. I look forward to cheering for more great open-source AI releases from the talented ServiceNow AI people! I will tell you what's next in due time.
8
1
151
so nice to have a few actual Scientists in our community who respect empirical results even when they go counter to their intuition!
This study surprised me! The conclusion is opposite to what I would expect. It is tempting to try to find a reason it's bogus but I think it's well executed and solid work. As the authors say, there are a number of potential caveats for this setting that may not generalize.
0
0
6
exactly what I feel about using AI to make advanced modifications in RL code: a lot of busy and chaotic activity, but slower than thinking carefully on one's own.
We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers. The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.
1
0
10
A lot of great ideas on how to remedy training instabilities in the @MiniMax__AI tech report. Check it out!
Day 1/5 of #MiniMaxWeek: We're open-sourcing MiniMax-M1, our latest LLM – setting new standards in long-context reasoning.
- World's longest context window: 1M-token input, 80k-token output.
- State-of-the-art agentic use among open-source models.
- RL at unmatched efficiency:
0
0
21
200%. grit grit grit and 50 w&b curves and everything will work eventually.
Someone passed this wisdom to me today. Deep learning techniques working vs. not working comes down to two devils:
- your prior about the technique
- your attention to detail in the implementation of the technique
Need both to make it work.
3
0
16
nicely done, team!!
🚨🤯 Today Jensen Huang announced SLAM Lab's newest model on the @HelloKnowledge stage: Apriel-Nemotron-15B-Thinker 🚨. A lean, mean reasoning machine punching way above its weight class. Built by SLAM × NVIDIA. Smaller models, bigger impact. 🧵
0
0
10