Daniel Vila Suero
@dvilasuero
Followers: 4K | Following: 8K | Media: 386 | Statuses: 4K
Introducing @huggingface Sheets: Excel meets AI and unstructured data
📁 Run prompts and models over your data
🌐 Web search for accuracy and real-time information
🎯 Manual edits are used to improve generation
💯 Hundreds of open models and leading inference providers
1/2
2
16
82
Let me know if you try this out, looking for early feedback
🔥Direct dataset editing on the Hugging Face Hub🔥 No more downloading & reuploading a 500MB CSV just to fix 3 mislabeled rows. Edit cells, commit changes. Collaborative, fully versioned. Dataset curation finally works like code. https://t.co/HXfES1OCyy
1
0
2
Love this 🔥🙌🏽
1
2
12
We're launching the world's biggest AI virtual hackathon, with more than 6,000 registrations (and >$1 million in prizes and credits) Join us at the kickoff event in 30 minutes 🔥
Join us LIVE at MCP's first Birthday kickoff at 10 am PT today!🎂 Don't miss out on details about the celebration from the co-hosts, @Gradio and @AnthropicAI. 🔥 We've also got an exciting lineup of speakers from @Huggingface, @OpenAI, @GoogleDeepMind, @modal, @blaxelAI,
1
1
13
We’re launching SWE-fficiency to eval whether LMs can speed up real GitHub repos on real workloads! ⏱️ 498 optimization tasks across 9 data-science, ML, and HPC repos, each with a real workload to speed up. Existing agents struggle to match expert-level optimizations!
12
23
199
🔥 IT'S OUT 🔥 Struggling to find benchmarks? Explore our repository for THOUSANDS of easy-to-run, well-documented tasks 🤩 📏 Creators, add yours for more visibility 🔥 Users, find and run models effortlessly with lighteval
4
11
39
Open models are reaching the frontier at an impressive pace. More choices mean more decisions: which model? Local or managed? Which provider? Cheaper or faster? As the offer grows, so does the need for better testing. That's why I built a new integration with Inspect AI for
2
9
21
5/ You can also compare :fastest vs :cheapest providers, write custom scorers, and test models in agentic scenarios. All without setting up infrastructure. Drop a comment if you'd like me to run an eval you have in mind on the latest open models.
0
0
0
4/ Evaluate vision-language models with custom datasets
I even built a "chihuahua or muffin" style dataset to test VLMs.
Results:
Qwen3-VL-8B: 0.7 accuracy
Qwen3-VL-30B-Thinking: 0.9 accuracy
Still unsolved 😂
~30 lines of Python for custom evals.
1
0
1
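The 4/ post reports accuracy scores on a custom "chihuahua or muffin" dataset. As a rough illustration of what such a score measures, here is a minimal, stdlib-only sketch of an exact-match accuracy scorer. The labels and model outputs are invented for this example, and this is neither the author's eval code nor the Inspect AI scorer API.

```python
def accuracy(predictions, targets):
    """Fraction of predictions that match the target label.

    Matching ignores case and surrounding whitespace.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("cannot score an empty dataset")
    correct = sum(
        p.strip().lower() == t.strip().lower()
        for p, t in zip(predictions, targets)
    )
    return correct / len(targets)


# Hypothetical labels and model outputs, invented for this example only.
targets = ["chihuahua", "muffin", "muffin", "chihuahua", "muffin"]
predictions = ["Chihuahua", "muffin", "chihuahua", "chihuahua", "muffin "]

print(accuracy(predictions, targets))
```

A real eval would replace the hard-coded lists with model outputs collected per image.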
3/ Compare the same model across providers
I ran gpt-oss-120b across 10 providers on the same task. Scores ranged from 0.80 to 0.84. Hardware and inference implementations vary. Testing helps you find what actually works best for your use case.
1
0
0
2/ Benchmark multiple models on your task
Pick models from the Hub, run them in parallel. No GPUs, no downloads, just:
inspect eval-set https://t.co/e7703eAxCz --model \
"hf-inference-providers/MiniMaxAI/MiniMax-M2,\
hf-inference-providers/openai/gpt-oss-120b"
1
0
0
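The command in the 2/ post passes several models as one comma-separated `--model` value, each prefixed with `hf-inference-providers/`. A small sketch of how that argument could be assembled from plain Hub repo ids follows. The helper names and the `TASK_FILE` placeholder are hypothetical (the task reference in the post is a shortened link, so it is not filled in here), and the script only builds the command rather than running it.

```python
import shlex

# Prefix used for every model in the post's command.
PREFIX = "hf-inference-providers/"


def model_arg(repo_ids):
    """Join Hub repo ids into one comma-separated --model value."""
    if not repo_ids:
        raise ValueError("need at least one model")
    return ",".join(PREFIX + repo_id for repo_id in repo_ids)


def eval_set_command(task, repo_ids):
    """Return the argv list for `inspect eval-set <task> --model <models>`."""
    return ["inspect", "eval-set", task, "--model", model_arg(repo_ids)]


# TASK_FILE is a placeholder for whatever task the shortened link points to.
cmd = eval_set_command(
    "TASK_FILE", ["MiniMaxAI/MiniMax-M2", "openai/gpt-oss-120b"]
)
print(shlex.join(cmd))
```

Building the value in code avoids the backslash line continuations inside the quoted string, which are easy to break with stray whitespace.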
Unsurprisingly, Kimi K2 Thinking is already number one trending on HF. The AI frontier is open-source!
49
156
2K
Fastest and easiest way to download a repo from the HF Hub: $ uvx hf download mlx-community/granite-4.0-h-1b-8bit That's it. Same for uploads, no excuse not to upload your datasets and models. Under the hood: Xet, parallel downloads, huggingface_hub, and of course uv.
0
3
27
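For anyone who wants to script the `uvx hf download` one-liner from the post above, here is a minimal sketch that wraps it with `subprocess`. The function names are hypothetical, and the download itself is not triggered by the example; it assumes `uvx` (shipped with uv) is on PATH and that network access is available. From Python directly, `huggingface_hub.snapshot_download(repo_id=...)` is the library-level way to fetch a repo.

```python
import shutil
import subprocess


def download_command(repo_id):
    """Build the argv for `uvx hf download <repo_id>` as shown in the post."""
    if not repo_id or repo_id != repo_id.strip():
        raise ValueError("repo_id must be non-empty with no surrounding spaces")
    return ["uvx", "hf", "download", repo_id]


def download(repo_id):
    """Run the download; needs `uvx` on PATH and network access."""
    if shutil.which("uvx") is None:
        raise RuntimeError("uvx not found; install uv first")
    subprocess.run(download_command(repo_id), check=True)


# Only prints the command; call download(...) to actually fetch the repo.
print(download_command("mlx-community/granite-4.0-h-1b-8bit"))
```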
The AI frontier is open-source!
🚀 Hello, Kimi K2 Thinking! The Open-Source Thinking Agent Model is here.
🔹 SOTA on HLE (44.9%) and BrowseComp (60.2%)
🔹 Executes up to 200 – 300 sequential tool calls without human interference
🔹 Excels in reasoning, agentic search, and coding
🔹 256K context window
Built
19
59
841
you can now push, run, and pull agentic environments from the hugging face hub as spaces. the workflow looks like this:
- do `openenv init` to start a new environment from a template
- build/port your rl environment
- use `openenv push` to push the environment to hub.
- from
4
14
96
Has anyone played with Petri (by @AnthropicAI) to test open model behaviours? Would love to chat
0
1
1
Or ... you could just host them on https://t.co/aeVYPxcibJ
Even when new AI models bring clear improvements in capabilities, deprecating the older generations comes with downsides. An update on how we’re thinking about these costs, and some of the early steps we’re taking to mitigate them:
4
17
204
If you're interested in agentic RL training, don't miss this awesome guide 👇
New guide on RL for agentic environments. This guide integrates OpenEnv, textarena, and TRL for training language models on reasoning games like wordle. Instead of relying only on static reward functions, you can now hook up your model to interactive environments (browsers,
0
0
1