Benjamin Laufer

@BenDLaufer

Followers
585
Following
3K
Media
22
Statuses
621

PhD student @cornellCIS based in NYC @Cornell_Tech. Formerly ORFE @Princeton.

Joined April 2022
@BenDLaufer
Benjamin Laufer
2 days
1/10. In a new paper with @didaoh and Jon Kleinberg, we mapped the family trees of 1.86 million AI models on Hugging Face — the largest open-model ecosystem in the world. AI evolution looks kind of like biology, but with some strange twists. 🧬🤖
4
9
50
@BenDLaufer
Benjamin Laufer
17 hours
Then the CEO of Hugging Face commented on his post, complimenting the model. The internet can be amazing…
0
0
0
@BenDLaufer
Benjamin Laufer
17 hours
I love this. This guy read our recent paper and found out that a model he trained 4 years ago is in the top 10 most fine-tuned (/remixed) open-source AI models on @huggingface. The model is “AraBERT,” a fine-tune of BERT for the Arabic language. He spent $10 to train the model.
@wissam_antoun
Wissam Antoun
2 days
Surprised to see our (@fadybaly) Arabic BERT model from 4 years ago as the TOP 10 most finetuned model on the @huggingface hub. It now has ~9M total downloads, with ~600K monthly. Thread/Paper:
1
0
4
@BenDLaufer
Benjamin Laufer
19 hours
RT @wissam_antoun: Surprised to see our (@fadybaly) Arabic BERT model from 4 years ago as the TOP 10 most finetuned model on the @huggingfa…
0
6
0
@BenDLaufer
Benjamin Laufer
2 days
RT @hbouammar: My god 🙌🏻🙌🏻 amazing!
0
1
0
@BenDLaufer
Benjamin Laufer
2 days
RT @ShayneRedford: Really incredible work by @BenDLaufer and @didaoh, understanding ecosystem-wide shifts in AI! These model relationship…
0
3
0
@BenDLaufer
Benjamin Laufer
2 days
RT @didaoh: Excited to see this out! Great thread from Ben on the dataset and the ecological analogies he developed for this project :)
0
1
0
@BenDLaufer
Benjamin Laufer
2 days
Eliahu’s work is very cool and inspired ours! I recommend taking a look. Also his visualization game is top notch 👌🏻
@EliahuHorwitz
Eliahu Horwitz
2 days
Awesome followup to our work on the model atlas (. Seeing the community adapt this idea of model populations is exciting 🤩.
1
0
3
@BenDLaufer
Benjamin Laufer
2 days
Check out Hamidah's (@didaoh's) post, which highlights some other aspects of the paper:
@didaoh
Hamidah Oderinwale
3 days
1/ With @BenDLaufer and Jon Kleinberg, we constructed the largest dataset of its kind to date: 1.86M Hugging Face models. In a new paper, we mapped how the open-source AI ecosystem evolves by tracing fine-tunes, merges, and more. Here's what we found 🧵
1
0
2
@BenDLaufer
Benjamin Laufer
2 days
Thanks to Hamidah, Jon and everyone who helped make this project a reality! It was a lot of fun.
1
0
1
@BenDLaufer
Benjamin Laufer
2 days
11. 🗄️Dataset:
huggingface.co
1
0
2
@BenDLaufer
Benjamin Laufer
2 days
10. This is just the start. Our dataset & methods open the door to a science of AI ecosystems. If you care about open-source AI, governance, or the weird ways technology evolves, give it a read. 📄Paper:
1
0
1
@BenDLaufer
Benjamin Laufer
2 days
9. Big picture: By treating ML models like organisms in an ecosystem, we can:
🌱 Understand the pressures shaping AI development
🔍 Spot patterns before they become industry norms
🛠 Inform governance & safety strategies grounded in real data
1
0
2
@BenDLaufer
Benjamin Laufer
2 days
8. 🔹 Certain license types precede others (e.g., llama3 → apache-2.0). Here we show the top-20 license transitions over fine-tunes in the dataset.
[Image: top-20 license transitions over fine-tunes]
1
0
2
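A license transition of the kind described in tweet 8 can be tallied from parent/child license pairs. The sketch below is only an illustration under stated assumptions: the function name `top_transitions`, the toy pairs, and the choice to skip fine-tunes that keep the parent's license are all hypothetical and are not taken from the paper or the dataset.

```python
from collections import Counter

def top_transitions(pairs, k=20):
    """Count (parent_license, child_license) pairs and return the k most common.

    Assumption: pairs where the license is unchanged are skipped, so only
    actual transitions are ranked. The paper may count them differently.
    """
    counts = Counter((parent, child) for parent, child in pairs if parent != child)
    return counts.most_common(k)

# Hypothetical toy data: one (parent_license, child_license) pair per fine-tune.
pairs = [
    ("llama3", "apache-2.0"),
    ("llama3", "apache-2.0"),
    ("llama3", "llama3"),       # unchanged license, ignored by the filter above
    ("apache-2.0", "mit"),
]

for (parent, child), n in top_transitions(pairs, k=20):
    print(f"{parent} -> {child}: {n}")
```

With real data, each pair would come from a fine-tune edge whose parent and child both declare a license in their metadata.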
@BenDLaufer
Benjamin Laufer
2 days
7. We found optimal evolutionary orderings over traits:
🔹 Feature extraction tends to be upstream from text generation. Text generation is upstream from text classification.
1
0
3
@BenDLaufer
Benjamin Laufer
2 days
6. The license drift to permissiveness suggests open-source preferences outweigh regulatory pressures to comply with licenses. The English drift suggests a massive market for English products. The docs drift could be explained as a preference for efficiency — or laziness.
1
0
2
@BenDLaufer
Benjamin Laufer
2 days
5. Three major drifts:
1️⃣ Licenses: from corporate to other types. We often see use restrictions mutate to permissive or copyleft (even when counter to upstream license terms).
2️⃣ Languages: from multilingual → English-only.
3️⃣ Docs: from long & detailed → short & templated.
1
0
3
@BenDLaufer
Benjamin Laufer
2 days
4. In biology, traits get passed from parent to child — mutations are slow & often modeled as random. In AI model families, mutations are fast and directed. Two sibling models tend to resemble each other more than they resemble their shared parent.
1
0
3
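The sibling-versus-parent comparison in tweet 4 can be illustrated with a toy similarity measure. This is a minimal sketch, assuming Jaccard overlap on sets of metadata tags; the thread does not say which similarity measure the paper uses, and the tag sets below are invented for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of tags (1.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical metadata tags for one parent model and two of its fine-tunes.
parent = {"license:llama3", "lang:multilingual", "docs:long"}
sibling_1 = {"license:apache-2.0", "lang:en", "docs:short"}
sibling_2 = {"license:apache-2.0", "lang:en", "docs:long"}

sibling_sim = jaccard(sibling_1, sibling_2)
parent_sim = (jaccard(sibling_1, parent) + jaccard(sibling_2, parent)) / 2

print(f"sibling vs sibling: {sibling_sim:.2f}")
print(f"sibling vs parent (mean): {parent_sim:.2f}")
```

In this toy case both siblings drifted the same way on license and language, so they overlap with each other more than with their shared parent, which is the pattern the tweet describes.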
@BenDLaufer
Benjamin Laufer
2 days
3. We measured “genetic similarity” between models from snippets of text - the metadata and model cards. Models in the same finetuning family do resemble each other… but the evolution is weird. For example, traits drift in the same directions again and again.
1
0
1
@BenDLaufer
Benjamin Laufer
2 days
2. We reconstructed model family trees by tracing fine-tunes, adaptations, quantizations and merges. Some trees are small: one parent, a few children. Others sprawl into thousands of descendants across ten+ generations.
1
0
1
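The family-tree reconstruction in tweet 2 can be pictured as grouping parent/child links and walking down from a root model. The sketch below assumes a hypothetical list of `(child, parent)` edges with a single parent per model; merges, which have several parents, would turn the tree into a graph and need extra handling. The model names are made up.

```python
from collections import defaultdict

def build_children(edges):
    """Map each parent model to the list of its direct children."""
    children = defaultdict(list)
    for child, parent in edges:
        children[parent].append(child)
    return children

def tree_stats(root, children):
    """Return (number of descendants, number of generations below root).

    Assumes the links form a tree: no cycles and one parent per model.
    """
    descendants = 0
    depth = 0
    stack = [(root, 0)]
    while stack:
        node, level = stack.pop()
        depth = max(depth, level)
        for kid in children.get(node, []):
            descendants += 1
            stack.append((kid, level + 1))
    return descendants, depth

# Hypothetical toy edges (child, parent); not taken from the dataset.
edges = [
    ("base-ft-a", "base"),
    ("base-ft-b", "base"),
    ("base-ft-a-q4", "base-ft-a"),
    ("base-ft-a-q4-merge", "base-ft-a-q4"),
]
children = build_children(edges)
print(tree_stats("base", children))
```

The same walk applied to every root would separate the small trees (one parent, a few children) from the sprawling ones the tweet mentions.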