mehdi cherti Profile
mehdi cherti

@mehdidc

Followers: 344 · Following: 1K · Media: 38 · Statuses: 732

PostDoc at Jülich Supercomputing Center (JSC), Germany / LAION.

Cologne, Germany
Joined January 2010
@LoubnaBenAllal1
Loubna Ben Allal
11 days
After ~4 years building SOTA models & datasets, we're sharing everything we learned in ⚡The Smol Training Playbook We cover the full LLM cycle: designing ablations, choosing an architecture, curating data, post-training, and building solid infrastructure. We'll help you
35
158
1K
@wightmanr
Ross Wightman
5 months
timm's got a new vision transformer (NaFlexVit), and it's flexible! I've been plugging away at this for a bit, integrating ideas from FlexiViT, NaViT, and NaFlex and finally ready to merge for initial exploration. The model supports: * variable aspect/size images of NaFlex (see
5
38
232
@JJitsev
Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱
5 months
With great help by @gpuccetti92, @tommiekerssies and @rom1504 . More to come.
0
1
2
@JJitsev
Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱
5 months
When all of a sudden puzzle pieces fall into the right places and predicting the unknown starts to work, those are rare beautiful moments I am grateful for in science. Made by rare minds of @marnezhurina @tomerporian @mehdidc https://t.co/DiuR6g8pil https://t.co/snYXA7AcRg
arxiv.org
In studies of transferable learning, scaling laws are obtained for various important foundation models to predict their properties and performance at larger scales. We show here how scaling law...
1
10
25
@lschmidt3
Ludwig Schmidt
5 months
Very excited to finally release our paper for OpenThoughts! After DataComp and DCLM, this is the third large open dataset my group has been building in collaboration with the DataComp community. This time, the focus is on post-training, specifically reasoning data.
22
209
1K
@chelseabfinn
Chelsea Finn
7 months
Introducing π-0.5! The model works out of the box in completely new environments. Here the robot cleans new kitchens & bedrooms. 🤖 Detailed paper + videos in more than 10 unseen rooms: https://t.co/64ze7ToQup A short thread 🧵
15
101
703
@vishaal_urao
Vishaal Udandarao
7 months
🚀New Paper! https://t.co/cZZGbeVrgR Everyone’s celebrating rapid progress in math reasoning with RL/SFT. But how real is this progress? We re-evaluated recently released popular reasoning models—and found reported gains often vanish under rigorous testing!! 👀 🧵👇
4
55
265
@Thom_Wolf
Thomas Wolf
9 months
After 6+ months in the making and burning over a year of GPU compute time, we're super excited to finally release the "Ultra-Scale Playbook" Check it out here: https://t.co/mnC0UzZYsJ A free, open-source, book to learn everything about 5D parallelism, ZeRO, fast CUDA kernels,
110
706
4K
@Thom_Wolf
Thomas Wolf
9 months
Finally took time to go over Dario's essay on DeepSeek and export control and to be honest it was quite painful to read. And I say this as a great admirer of Anthropic and a big user of Claude* The first half of the essay reads like a lengthy attempt to justify that closed-source
109
495
3K
@vishalmisra
Vishal Misra
10 months
My thoughts on DeepSeek sent to a reporter asking for comments
19
95
514
@retropolisart
Retropolis
10 months
Directed by David Lynch.
154
13K
58K
@Variety
Variety
10 months
Director-writer David Lynch, who radicalized American film with a dark, surrealistic artistic vision in films like “Blue Velvet” and “Mulholland Drive” and network television with “Twin Peaks,” has died. He was 78. https://t.co/T2GOao28ux
864
16K
54K
@FrankRHutter
Frank Hutter
10 months
The data science revolution is getting closer. TabPFN v2 is published in Nature: https://t.co/Ybb15pnZ5P On tabular classification with up to 10k data points & 500 features, in 2.8s TabPFN on average outperforms all other methods, even when tuning them for up to 4 hours🧵1/19
36
251
1K
@SGRodriques
Sam Rodriques
1 year
Introducing PaperQA2, the first AI agent that conducts entire scientific literature reviews on its own. PaperQA2 is also the first agent to beat PhD and Postdoc-level biology researchers on multiple literature research tasks, as measured both by accuracy on objective benchmarks
79
767
3K
@vishaal_urao
Vishaal Udandarao
1 year
Ever feel frustrated when you vaguely know what paper you want to cite but can't find it on Google? Can LM-based agents automatically find paper citations for you? Our new paper presents a tough new benchmark for this task along with an LM-based agent for finding citations.
@ori_press
Ori Press
1 year
Can AI help you cite papers? We built the CiteME benchmark to answer that. Given the text: "We evaluate our model on [CITATION], a dataset consisting of black and white handwritten digits" The answer is: MNIST CiteME has 130 questions; our best agent gets just 35.3% acc (1/5)🧵
1
3
27
@tomerporian
Tomer Porian
1 year
🧵1/8 We resolve the discrepancy between the compute optimal scaling laws of Kaplan et al. (exponent 0.88, Figure 14, left) and Hoffmann et al. (“Chinchilla”, exponent 0.5). Paper: https://t.co/QKFbNl4J9t Data + Code: https://t.co/N3p0Xg0THH
6
33
171
@ShyamgopalKart1
Shyamgopal Karthik
1 year
Do you want to improve the performance of your text-to-image model without any training? That too by just looking for a better initialization noise? Sounds too good to be true? 🧵👇 https://t.co/b8jbASFiXE
@LucaEyring
Luca Eyring
1 year
Can we enhance the performance of T2I models without any fine-tuning? We show that with our ReNO, Reward-based Noise Optimization, one-step models consistently surpass the performance of all current open-source Text-to-Image models within the computational budget of 20-50 sec!
2
9
72
@_tim_brooks
Tim Brooks
2 years
"fly through tour of a museum with many paintings and sculptures and beautiful works of art in all styles" Video generated by #Sora
298
553
3K
@Rainmaker1973
Massimo
2 years
Beluga whales love to play, scare, joke and generally interact with humans. This compilation is a good example. [📹 aquariumadvicesa] https://t.co/P2ifmVPKqm
342
9K
102K
@agramfort
Alexandre Gramfort
2 years
For those of you who were wondering what I’ve been doing since I joined @Meta Reality Labs late 2022. Here is the first detailed scientific communication about our work. You can read the paper at:
biorxiv.org
Since the advent of computing, humans have sought computer input technologies that are expressive, intuitive, and universal. While diverse modalities have been developed, including keyboards, mice,...
@SussilloDavid
David Sussillo
2 years
1/7 For the past decade, our team at Meta Reality Labs (previously CTRL-labs) has been dedicated to developing a neuromotor interface. Our goal is to address the Human Computer Interaction challenge of providing effortless, intuitive, and efficient input to computers.
5
24
168