Hamidah Oderinwale
@didaoh
Followers: 669
Following: 4K
Media: 20
Statuses: 220
occasionally fails captchas. @vana @ifp @reboot_hq.
montreal
Joined March 2023
1/ With @BenDLaufer and Jon Kleinberg, we constructed the largest dataset of its kind to date: 1.86M Hugging Face models. In a new paper, we mapped how the open-source AI ecosystem evolves by tracing fine-tunes, merges, and more. Here's what we found 🧵
8
32
221
We’ll be presenting this paper at the NeurIPS RegML workshop! Looking forward to meeting people in town this week
1/ With @BenDLaufer and Jon Kleinberg, we constructed the largest dataset of its kind to date: 1.86M Hugging Face models. In a new paper, we mapped how the open-source AI ecosystem evolves by tracing fine-tunes, merges, and more. Here's what we found 🧵
0
1
10
I built https://t.co/VJ7oHdvf7s, a new interface for arXiv. As we enter an era of accelerated scientific discovery, we need better tools that augment human cognition to help us keep up. Try it: visit papiers ai or swap arxiv -> papiers on any paper URL.
I often rant about how 99% of attention is about to be LLM attention instead of human attention. What does a research paper look like for an LLM instead of a human? It’s definitely not a pdf. There is huge space for an extremely valuable “research app” that figures this out.
105
215
2K
How can we incentivize AI for national priorities? @sebkrier and @zhengdongwang explain how commissioning the creation of benchmarks and evaluations can steer AI development toward important outcomes that will otherwise be neglected. Benchmarks can have a large impact: When
🚀The Launch Sequence book debut is in 11 days! Start the countdown: every day until then, I’ll post at least one short summary on each of the ideas in the book. Then we’ll start shipping the books to Congress. More details in-thread (1/3).
2
14
52
5/ Blog post: https://t.co/HtFlQ2uXvM Full paper: https://t.co/ICArYm71Bw We’re excited to tackle these questions from both engineering and theory. Reach out if you're exploring similar ideas!
arxiv.org
Despite data's central role in AI production, it remains the least understood input. As AI labs exhaust public data and turn to proprietary sources, with deals reaching hundreds of millions of...
0
0
5
4/ We can already see precursors to standardization such as model cards, dataset documentation, and provenance. Turning those into the basis for exchange will require shared metrics of value, reliability, and contribution.
1
0
4
3/ Across these units, we observe five mechanisms in today’s landscape: per-unit licensing, aggregate access deals, service-based pricing, commissioning, and open commons. Each structure values data differently, and all fall short of full value capture. Designing new mechanisms to
1
0
2
2/ Markets emerged once grading, verification, and pricing systems made assets tradable. We start by defining what’s actually being traded. From tokens to datasets to corpora, each represents a different level of composition, control, and pricing logic. This hierarchy gives
1
0
2
1/ OPT OBSERVATORY: I’ve spent the past year creating *the most in-depth public resource* on how the US retains international students after they graduate. Today, @IFP is releasing never-before-seen data we obtained from ICE via FOIA. Check it out: https://t.co/La9FD8zN2j
21
99
356
The first self-serve platform for user-owned data! Vana Playground is live. Explore structured datasets before running privacy-preserving jobs, accessing data that's usually locked behind walled gardens
Introducing Vana Playground. A self-serve way to explore Vana's datasets. From the beginning, we’ve been laser-focused on building valuable datasets and commercializing them through our networks. This is the evolution: allowing anyone to see and use the data on Vana.
21
10
80
Thank you to @cosmos_inst and @TheFIREorg for their support! Looking forward to working on research tooling and more in the months ahead, started at @southpkcommons :)
Announcing the first cohort of AI x Truth-Seeking grant recipients: Proud to build this partnership between @cosmos_inst and @TheFIREorg. From 300+ strong applications, we chose 27 builders to pilot approaches for AI to strengthen open inquiry and intellectual freedom. Our
2
2
31
Super excited to power this $1m truth seeking AI grant initiative by @cosmos_inst and @TheFIREorg with @primeintellect compute - apply now for the next cohort 🫡
Announcing the first cohort of AI x Truth-Seeking grant recipients: Proud to build this partnership between @cosmos_inst and @TheFIREorg. From 300+ strong applications, we chose 27 builders to pilot approaches for AI to strengthen open inquiry and intellectual freedom. Our
4
11
85
Surprised to see our (@fadybaly) Arabic BERT model from 4 years ago among the top 10 most fine-tuned models on the @huggingface hub. It now has ~9M total downloads, with ~600K monthly. Thread/Paper: https://t.co/5CNnj2fUWd
Fun to think about open-source models and their variants as families from an evolutionary biology standpoint and analyze "genetic similarity and mutation of traits over model families". These are the 2,500th, 250th, 50th and 25th largest families on @huggingface:
3
6
23
Really incredible work by @BenDLaufer and @didaoh, understanding ecosystem-wide shifts in AI! These model relationship graphs remind me of the social media network analysis field. There are so many evolving, branching uses of AI systems most ppl don't realize.
1/ With @BenDLaufer and Jon Kleinberg, we constructed the largest dataset of its kind to date: 1.86M Hugging Face models. In a new paper, we mapped how the open-source AI ecosystem evolves by tracing fine-tunes, merges, and more. Here's what we found 🧵
2
3
17
1/10. In a new paper with @didaoh and Jon Kleinberg, we mapped the family trees of 1.86 million AI models on Hugging Face — the largest open-model ecosystem in the world. AI evolution looks kind of like biology, but with some strange twists. 🧬🤖
4
9
51
Excited to see this out! Great thread from Ben on the dataset and the ecological analogies he developed for this project :)
1/10. In a new paper with @didaoh and Jon Kleinberg, we mapped the family trees of 1.86 million AI models on Hugging Face — the largest open-model ecosystem in the world. AI evolution looks kind of like biology, but with some strange twists. 🧬🤖
0
1
7
Fun to think about open-source models and their variants as families from an evolutionary biology standpoint and analyze "genetic similarity and mutation of traits over model families". These are the 2,500th, 250th, 50th and 25th largest families on @huggingface:
1/ With @BenDLaufer and Jon Kleinberg, we constructed the largest dataset of its kind to date: 1.86M Hugging Face models. In a new paper, we mapped how the open-source AI ecosystem evolves by tracing fine-tunes, merges, and more. Here's what we found 🧵
10
19
107
super interesting research by SPC member @didaoh and @BenDLaufer
1/ With @BenDLaufer and Jon Kleinberg, we constructed the largest dataset of its kind to date: 1.86M Hugging Face models. In a new paper, we mapped how the open-source AI ecosystem evolves by tracing fine-tunes, merges, and more. Here's what we found 🧵
0
2
4