David Berenstein Profile
David Berenstein

@davidberenstei

Followers
1K
Following
1K
Media
156
Statuses
906

ML & DevRel @ Hugging Face || 👨🏽‍🍳 Cooking, 👨🏽‍💻 Coding, 🏆 Committing

Madrid
Joined June 2023
@davidberenstei
David Berenstein
7 months
Imagine creating custom datasets and training AI models WITHOUT writing a single line of code. We did and made it a reality. @huggingface Synthetic Data Generator. Blog: Space: GitHub:
7
90
440
@davidberenstei
David Berenstein
4 months
🔥 Bespoke curator: Synthetic Data Curation for Post-Training & Structured Data Extraction. Create synthetic data pipelines with ease! - Retries and caching included. - Inference via LiteLLM, vLLM, and popular batch APIs. - Asynchronous operations. 🔗 URL:
0
1
12
@davidberenstei
David Berenstein
4 months
🔥One > token > at > a > time < a < at < token < One🔥. token-explorer is a tool that lets you explore different possible paths that an LLM might sample! - Arrow keys to navigate, pop and append tokens. - View the token probabilities and entropies. GitHub:
0
0
2
@davidberenstei
David Berenstein
4 months
💥 SMASH and run models 5x faster, 5x cheaper. Pruna AI is the AI Optimization Engine for ML teams seeking to simplify scalable inference. Make sure to ⭐️ their GitHub: TechCrunch: Smashed Models on HF:
1
5
11
@davidberenstei
David Berenstein
5 months
Agents aren't fully there yet, so we still rely on humans to create courses at Hugging Face! And with humans come slight imperfections. The @LangChainAI LangGraph framework unit will be released one week later on the 18th of March! In the meantime, keep learning and creating!
1
1
12
@davidberenstei
David Berenstein
5 months
Hook up AutoTrain, and you'll be able to go prompt to model!
huggingface.co
0
0
1
@davidberenstei
David Berenstein
5 months
🍽️ Let’s dissect the Synthetic Dataset Generator. 💬 Natural language prompt to data. 🦙 Ollama ensures secure local LLM inference. ✍🏼 Argilla’s data curation capabilities complete the workflow. 🔗 GitHub:
1
3
15
@davidberenstei
David Berenstein
5 months
Private Synthetic Data Generation Made Easy: Out-of-the-Box with @Docker, @argilla_io & @ollama! Synthetic data? All you need is Docker! Thank you for the great contribution, @mcdaqc! 🔗 Blog: 📺 YouTube:
0
0
1
@davidberenstei
David Berenstein
5 months
🔗 Blog: 💪🏽 Great work @gretel_ai and team: Lipika R., Maarten Van Segbroeck, Ph.D., Dhruv Nathawani.
1
0
0
@davidberenstei
David Berenstein
5 months
🔥 Dataset drop for SAFER alignment! 🙋 Synthetic data containing safe/unsafe alignment pairs across various risk categories. 🔒 Helps mitigate Malicious Use, System Risks, Information Hazards, Discrimination, and Societal Risks. 🔗 Dataset:
1
0
2
@davidberenstei
David Berenstein
5 months
RT @mcdaqc: Privacy in AI is crucial, but how can we generate realistic training data without exposing sensitive info? We explored a solut…
huggingface.co
0
1
0
@davidberenstei
David Berenstein
5 months
RT @dvilasuero: Weekend project: Open-sourcing "Natural Science Reasoning", a smol reasoning dataset, the pipeline, and the code to build…
0
30
0
@davidberenstei
David Berenstein
5 months
RT @nathanhabib1011: Deepseek skyrocketed 17,694% in just 12 weeks lmao. That’s what happens when you drop open models that rival the best…
0
8
0
@davidberenstei
David Berenstein
5 months
RT @yifeiwang77: No More Tears with Matryoshka🪆 & Let's Embrace Sparsity 🚀🚀 With sparse autoencoders + sparse contrastive learning, we can…
0
100
0
@davidberenstei
David Berenstein
5 months
@calebfahlgren great work by you and the team :)
0
0
1
@davidberenstei
David Berenstein
5 months
🔥 Text2SQL, explore and share any data analysis! 🤗 Hugging Face - Dataset Studio is an amazing new feature. 🚀 Start yourself:
1
4
15
@davidberenstei
David Berenstein
5 months
The team has been working flat-out on this for a few weeks. Supported by @LoganMarkewich and @seldo from @llama_index! Who won? You decide!
0
0
1
@davidberenstei
David Berenstein
5 months
It is set to offer a great contrast with the smolagents unit by looking at: - What makes llama-index stand out. - How the LlamaHub is used for integrations. - Creating QueryEngine components. - Using agents and tools. - Agentic and multi-agent workflows.
huggingface.co
1
0
2
@davidberenstei
David Berenstein
5 months
🥊 Epic Agent Framework Showdown! Available today! 🔵 In the blue corner, as challenger, the experienced knowledge retriever: @llama_index. 🛑 In the red corner, the defender, weighing in with lightweight efficiency: @huggingface smolagents! 🔗 URL:
1
10
54
@davidberenstei
David Berenstein
5 months
🔥 Vicinity: SEVEN semantic search BACK-ENDS, ONE single INTERFACE! 🫸 New release to push vector search to the Hub and work with any serialisable objects. 🧑‍🏫 KNN, HNSW, USEARCH, ANNOY, PYNNDESCENT, FAISS, and VOYAGER. 🔗 Library:
0
1
3
@davidberenstei
David Berenstein
5 months
@Gradio is making AI accessible! ❀️
0
0
2