Argilla

@argilla_io

Followers 3,415 · Following 28 · Media 287 · Statuses 1,293

Making LLM data go brrrr

World
Joined August 2021
Pinned Tweet
@argilla_io
Argilla
1 month
💥After months of work, we're thrilled to introduce ⚗️distilabel 1.0.0! 🚀More flexible, robust, and powerful. 🙌 Let's empower the community to build the most impactful datasets for Open Source AI! Blogpost: Github:
4
19
100
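(Editor's sketch: the announcement above shows no code, so here is a minimal distilabel 1.x-style pipeline based on the project's public docs. The class names (Pipeline, LoadDataFromDicts, TextGeneration, OpenAILLM) follow the documented 1.x API; treat exact signatures as assumptions if your version differs.)

```python
# Minimal distilabel 1.x pipeline sketch: load a few instructions and
# generate responses with an OpenAI model (needs OPENAI_API_KEY set).
from distilabel.llms import OpenAILLM
from distilabel.pipeline import Pipeline
from distilabel.steps import LoadDataFromDicts
from distilabel.steps.tasks import TextGeneration

with Pipeline(name="demo-pipeline") as pipeline:
    load_data = LoadDataFromDicts(
        name="load_data",
        data=[{"instruction": "Explain DPO in one sentence."}],
    )
    generate = TextGeneration(
        name="generate",
        llm=OpenAILLM(model="gpt-3.5-turbo"),
    )
    load_data >> generate

if __name__ == "__main__":
    distiset = pipeline.run()  # returns a Distiset you can push to the Hub
```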
@argilla_io
Argilla
5 months
🚀 Open-source AI strikes again! Announcing Notux 8x7B, a fine-tune of Mixtral Instruct with high-quality chat data and DPO. Notux is now the top-ranked MoE on the Open LLM Leaderboard.
Tweet media one
8
84
436
@argilla_io
Argilla
3 months
🚀🧙🏼‍♂️Introducing OpenHermesPreferences: the largest open dataset for RLHF & DPO Built together with the @huggingface H4 team, it's a 1M preferences dataset on top of the amazing @Teknium1 's dataset. Let's dive in! 🧵
Tweet media one
6
58
288
@argilla_io
Argilla
3 years
Build a news classifier from scratch with weak supervision 1. Programmatically label 38,000 examples with rules and Snorkel. 2. Train a downstream classifier with scikit-learn to achieve a 0.81 macro-avg F1 score. Tutorial link below 👇 #nlproc #ml #datascience #opensource
Tweet media one
2
54
277
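(Editor's sketch of the two-step recipe in this tweet: label programmatically with Snorkel rules, then train a scikit-learn classifier on the weak labels. The keyword rules and the toy DataFrame are hypothetical stand-ins for the tutorial's news data.)

```python
import pandas as pd
from snorkel.labeling import LabelingFunction, PandasLFApplier
from snorkel.labeling.model import LabelModel
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

ABSTAIN, SPORTS, BUSINESS = -1, 0, 1

def keyword_lf(x, keyword, label):
    # Vote for `label` when the keyword appears, otherwise abstain
    return label if keyword in x.text.lower() else ABSTAIN

lfs = [
    LabelingFunction("lf_money", f=keyword_lf, resources=dict(keyword="money", label=BUSINESS)),
    LabelingFunction("lf_football", f=keyword_lf, resources=dict(keyword="football", label=SPORTS)),
]

train_df = pd.DataFrame({"text": [
    "football season kicks off",
    "money markets rally",
    "local football derby ends level",
    "investors move money abroad",
]})

# 1. Programmatically label with rules + Snorkel's label model
L_train = PandasLFApplier(lfs=lfs).apply(train_df)
label_model = LabelModel(cardinality=2, verbose=False)
label_model.fit(L_train)
preds = label_model.predict(L_train)

# 2. Train a downstream scikit-learn classifier on the non-abstain weak labels
mask = preds != ABSTAIN
X = TfidfVectorizer().fit_transform(train_df.text[mask])
clf = LogisticRegression().fit(X, preds[mask])
```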
@argilla_io
Argilla
6 months
🔥Open-source, open-science, and data curation for the win! Meet Notus 7B, a new LLM tuned with DPO on a new curated UltraFeedback dataset, surpassing Zephyr and Claude 2 on AlpacaEval. Built on the shoulders of giants: 🙌 @huggingface Alignment Handbook
1
64
252
@argilla_io
Argilla
5 months
🔥 More is less for DPO, high quality matters! 📢 Dropping our first open dataset and LLM of the year: 💾Meet distilabel Orca Pairs DPO, an improved version of the now famous dataset from @intel 🏛️And a new OpenHermes model outperforming baselines with 54% less DPO pairs 🧵
Tweet media one
5
46
231
@argilla_io
Argilla
4 months
🔥 Introducing a new open dataset for the Open Source AI community: OpenHermes2.5-dpo-binarized-alpha built atop the amazing dataset by @Teknium1 This time we use OSS models for everything, even for the preference step! 🧵
Tweet media one
39
51
181
@argilla_io
Argilla
2 years
Training a text classifier without labelled data using @PyTorch End-to-end weak supervision with Weasel & @huggingface transformers Guide and more details below 👇 #nlproc #datascience #python #opensource
Tweet media one
1
44
201
@argilla_io
Argilla
1 year
🚀Data labeling from the @huggingface Hub is here. No more excuses not to build great NLP datasets!
0
33
201
@argilla_io
Argilla
3 months
🤖Yesterday, we shared a large AI feedback dataset 👩‍💻Today, @argilla_io & @huggingface are thrilled to release a high-quality human feedback dataset, built with and for the community! 10K Prompts Ranked: +14K human ratings from +300 contributors!
Tweet media one
2
20
142
@argilla_io
Argilla
4 months
🚀 The OSS AI community needs more open datasets for improving LLMs: 🎁 Excited to ship a new open DPO dataset for boosting chat models: ⚗️ distilabel capybara-dpo, a multi-turn preference dataset built atop the awesome dataset by @ldjconfirmed 🧵
Tweet media one
4
24
120
@argilla_io
Argilla
3 months
We're building a high-quality prompt dataset together with the community. The result will be published under an open, commercial-friendly license that anyone can use to build eval, SFT, and DPO datasets. We need your help! This is how simple it is to contribute:
4
26
113
@argilla_io
Argilla
5 months
🔦 In this paper released by Apple, they introduce an efficient LLM inference method for devices with limited memory, showing inference speed-ups of 4-5x on CPU and 20-25x on GPU. Argilla's GitHub:  distilabel:  #nlproc   #llms
Tweet media one
5
17
104
@argilla_io
Argilla
2 years
A simple active learning loop with the amazing modAL library: active learning for YT spam classification inside a Jupyter notebook, a tutorial by @vid_algo. Rubrix: modAL: #nlproc #python #opensource
Tweet media one
0
25
108
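(Editor's sketch of the loop the tutorial describes: seed a learner, query the most uncertain pool example, teach it back. The toy YT comments stand in for the tutorial's dataset; modAL's ActiveLearner API is as documented.)

```python
import numpy as np
from modAL.models import ActiveLearner
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

comments = ["free money click here", "nice video!", "subscribe 4 free gifts", "great explanation"]
labels = np.array([1, 0, 1, 0])  # 1 = spam, 0 = ham

X = CountVectorizer().fit_transform(comments).toarray()

# Seed the learner with a couple of labelled examples
learner = ActiveLearner(estimator=MultinomialNB(), X_training=X[:2], y_training=labels[:2])

# Query the most uncertain example from the pool and "annotate" it
query_idx, _ = learner.query(X[2:])
learner.teach(X[2:][query_idx], labels[2:][query_idx])
```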
@argilla_io
Argilla
1 year
🚀 Want to train LLMs in your own language? For commercial use? Introducing the @databricks Dolly Multilingual Datasets. Currently includes translations into Spanish, French, and German. Want to see more languages? Join us!
1
15
105
@argilla_io
Argilla
4 months
🤖 Turing-Complete Transformers: Two Transformers are more Powerful than One. In this paper, currently under review as a conference paper at ICLR 2023, the researchers present Find+Replace, a family of transformer architectures that are Turing-complete.
Tweet media one
7
18
104
@argilla_io
Argilla
3 months
🌊Introducing DPO Mix 7K, a small DPO dataset that does wonders! Yesterday, @_philschmid & @_lewtun showcased its strength with Zephyr Gemma. If you're looking for a small, diverse, high-quality DPO dataset, check it out!
Tweet media one
2
25
103
@argilla_io
Argilla
3 months
🔥 Data is better together🔥 At @argilla_io & @huggingface, we believe in the collective intelligence of the OSS AI community, so we have partnered to let everyone contribute to AI datasets! You can start contributing now to the first initiative: 🧵
Tweet media one
2
18
102
@argilla_io
Argilla
3 months
Yesterday, we shared 10k_prompts_ranked: +14K human ratings by +300 amazing contributors. To understand the data, we collaborated with our friends at @graphext, the data analysis tool for visual thinkers. Now it's available to everyone👇 Let's deep dive! 🧵
4
20
102
@argilla_io
Argilla
6 months
⚗ How does distilabel work? 🚀 Yesterday, we announced our new open-source project, fully integrated with Argilla. First of all, if you don't yet know the project, go to 👉 Today we want to share more details about how it works 🧵
Tweet media one
15
30
99
@argilla_io
Argilla
2 years
Fine-tune a Hugging Face transformer for your own domain. Iteratively build a training set and fine-tune a sentiment classifier for the banking domain. Tutorial by @dvilasuero #NLProc #python #opensource
Tweet media one
1
30
94
@argilla_io
Argilla
1 year
🔥Thrilled to share our new tutorial: collecting human preference data and training a reward model with the awesome trl by @huggingface. The very first end-to-end example of the new RewardTrainer in trl. Colab with @younesbelkada & @lvwerra
Tweet media one
1
25
93
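(Editor's sketch of the RewardTrainer flow the tutorial covers, not the tutorial's own notebook. Column names like input_ids_chosen/input_ids_rejected follow the trl docs of that era; argument names have changed in later trl versions, so treat details as assumptions.)

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          TrainingArguments)
from trl import RewardTrainer

model_name = "distilroberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)

# Toy preference pairs; a real run would use a collected-feedback dataset
pairs = Dataset.from_dict({
    "chosen": ["A clear, helpful answer."],
    "rejected": ["An evasive non-answer."],
})

def tokenize(batch):
    chosen = tokenizer(batch["chosen"], truncation=True)
    rejected = tokenizer(batch["rejected"], truncation=True)
    return {
        "input_ids_chosen": chosen["input_ids"],
        "attention_mask_chosen": chosen["attention_mask"],
        "input_ids_rejected": rejected["input_ids"],
        "attention_mask_rejected": rejected["attention_mask"],
    }

trainer = RewardTrainer(
    model=model,
    args=TrainingArguments(output_dir="reward-model", remove_unused_columns=False),
    tokenizer=tokenizer,
    train_dataset=pairs.map(tokenize, batched=True),
)
trainer.train()
```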
@argilla_io
Argilla
2 years
Training a Hugging Face text classifier directly from search queries? Use Weasel & Rubrix 👉 Weasel: end-to-end weak supervision by @CachaySalva & @BenBoecking: Rubrix: #nlproc #python #opensource
Tweet media one
0
34
89
@argilla_io
Argilla
1 year
Zero-shot sentiment classification with human-readable explanations with @OpenAI GPT-3. Is it any good? 🤖 You can now run this yourself using Colab and 🤗 Spaces: 👀 Or see for yourself on this Argilla Space (argilla/1234):
Tweet media one
1
16
92
@argilla_io
Argilla
4 months
🔥 Open source, open datasets & open collaboration go a long way 🍿The story behind NeuralBeagle14, a top performing 7B model released by @maximelabonne
Tweet media one
4
14
89
@argilla_io
Argilla
2 years
FlairNLP comes with a zero-shot NER model for English. How good is it with a challenging dataset like WNUT17? Even more interesting: how good is it for your data? Can it help you with auto-labelling? A tutorial 👇🏽 #nlproc #python
Tweet media one
1
18
77
@argilla_io
Argilla
1 year
Data quality, fueled by human feedback, is the next frontier of LLMs We are stoked to introduce the first open-source, enterprise-grade solution for the scalable collection of human feedback to power the next wave of custom LLMs 1/7
5
21
78
@argilla_io
Argilla
3 months
🔥 We'll be doubling down on open DPO datasets. Why? A few months ago: UltraFeedback (62K examples) took Zephyr Beta SFT from 7.0 to 7.34 on MT-Bench. Last week: the @argilla_io mix (7K examples) took Zephyr Gemma SFT from 7.17 to 7.82 on MT-Bench. 2x the improvement with 9x less data
Tweet media one
1
21
74
@argilla_io
Argilla
2 months
🌏🌎🌍 Interested in evaluating LLMs for your language? We've selected 500 high-quality prompts. We'll be supporting the community in validating the translations and creating an open multilingual benchmark. Get involved & nominate yourself as language lead
1
21
73
@argilla_io
Argilla
27 days
🌎 Better AI needs better data, and for better data we need expertise! As part of the 'Data is Better Together' project in collaboration with Hugging Face, we're introducing Domain-Specific Datasets. You can read more in this post: 🤗
1
26
73
@argilla_io
Argilla
3 months
🦒 Improving Text Embeddings with LLMs, a new distilabel tutorial: we will be replicating the process described in "Improving Text Embeddings with Large Language Models" by Liang Wang et al. ().
Tweet media one
1
12
72
@argilla_io
Argilla
5 months
💬 Did you know you can fine-tune an LLM, Mistral 7B in particular, on a chat-style instruction dataset? This Argilla tutorial has a step-by-step guide! Argilla:  distilabel:  #nlproc   #llms   #python   #opensource
Tweet media one
2
5
70
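(Editor's sketch, not the tutorial's code: chat-style SFT with trl's SFTTrainer. The one-example dataset is illustrative, and a 7B model needs a GPU plus, in practice, PEFT/QLoRA; SFTTrainer argument names vary across trl versions.)

```python
from datasets import Dataset
from trl import SFTTrainer

# A single Mistral-format chat sample, purely illustrative
chat_data = Dataset.from_dict({
    "text": ["<s>[INST] What is Argilla? [/INST] An open-source data curation platform.</s>"]
})

trainer = SFTTrainer(
    model="mistralai/Mistral-7B-v0.1",  # SFTTrainer can load from a model id
    train_dataset=chat_data,
    dataset_text_field="text",
    max_seq_length=512,
)
trainer.train()
```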
@argilla_io
Argilla
3 months
🤖👩‍💻Do humans prefer synthetic or human-generated prompts? If you wanna know, participate in the prompt collective event and share with your friends & teammates! With +230 participants and ~4,300 ratings already, here are some interesting results 1/7🧵
Tweet media one
4
18
70
@argilla_io
Argilla
6 months
What a week! > We launched Notus, one of the most powerful commercially usable LLMs out there. > @weights_biases, the iconic AI company, showed how they use Argilla for LLM eval. > A DPO tune of OpenHermes by our admired @Teknium1 shows great gains using our curated preferences dataset.
Tweet media one
3
10
71
@argilla_io
Argilla
4 months
Data is one of the best ways to contribute to open source AI. Two weeks ago we shared the distilabel orca pairs dataset: 13k downloads, tens of models fine-tuned with it, still trending on @huggingface. But that's not the most important part 🧵
Tweet media one
1
15
67
@argilla_io
Argilla
1 year
🔥🦜🔗 Stoked to announce our integration with @LangChainAI. Monitor and collect human feedback from LangChain apps with a few lines of code. 🚀Great job by @alvarobartt. Thanks @hwchase17 and team!
@LangChainAI
LangChain
1 year
✍️LLM feedback with @argilla_io @argilla_io is a great tool for LLM data labeling, curation, and monitoring. Use the new Argilla callbacks to easily log and provide feedback for your LangChain LLMs! Integration docs: Argilla docs:
Tweet media one
Tweet media two
Tweet media three
2
4
40
1
14
69
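(Editor's sketch of the callback wiring described above, using LangChain's ArgillaCallbackHandler. The dataset name, Space URL, and API key are placeholders; check the integration docs linked in the tweet for exact parameters.)

```python
from langchain.callbacks import ArgillaCallbackHandler
from langchain.llms import OpenAI

# Log every completion to an Argilla dataset for labeling & monitoring
argilla_callback = ArgillaCallbackHandler(
    dataset_name="langchain-feedback",      # placeholder dataset
    api_url="https://my-argilla.hf.space",  # placeholder Argilla instance
    api_key="argilla.apikey",               # placeholder credentials
)

llm = OpenAI(callbacks=[argilla_callback])
llm("Summarize what Argilla does in one sentence.")
```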
@argilla_io
Argilla
4 months
🌸 Synthetic Haiku DPO 🌸 🙌A DPO dataset by @vanstriendaniel generated with OSS models ⚗️ Built with distilabel using the awesome OpenHermes by @Teknium1 Let's dive in! 🧵
Tweet media one
1
11
65
@argilla_io
Argilla
2 years
Finding and 𝗰𝗼𝗿𝗿𝗲𝗰𝘁𝗶𝗻𝗴 label errors 1⃣ Train a text classifier, predict over the test set 2⃣ Find label errors with the built-in cleanlab integration 3⃣ Correct errors with the UI Practical tutorial by @vid_algo #nlproc #opensource #ml
Tweet media one
0
24
66
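(Editor's sketch of steps 1⃣ and 2⃣ with cleanlab 2.x: get out-of-sample predicted probabilities via cross-validation, then flag likely label errors. The toy reviews and the injected error are illustrative; step 3⃣, correcting errors, happens in the Rubrix UI.)

```python
import numpy as np
from cleanlab.filter import find_label_issues
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

texts = ["great movie", "terrible film", "loved it", "awful plot"] * 5
labels = np.array([1, 0, 1, 0] * 5)
labels[0] = 0  # inject a label error on purpose

# 1. Train a classifier and get out-of-sample probabilities for every example
X = TfidfVectorizer().fit_transform(texts)
pred_probs = cross_val_predict(LogisticRegression(), X, labels, cv=3, method="predict_proba")

# 2. Find likely label errors, worst first
issues = find_label_issues(labels=labels, pred_probs=pred_probs,
                           return_indices_ranked_by="self_confidence")
print(issues)
```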
@argilla_io
Argilla
2 years
Building a text classifier with Flyingsquid has never been easier. Flyingsquid is a label model for fast and accurate weak supervision. Weak supervision guide: Flyingsquid: Rubrix: #nlproc #opensource #ml
Tweet media one
1
16
62
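(Editor's sketch following the Flyingsquid README: fit its triplet-based LabelModel on a matrix of labeling-function votes in {-1, 0, +1}, where 0 means abstain. The vote matrix is a toy placeholder; treat the API details as assumptions.)

```python
import numpy as np
from flyingsquid.label_model import LabelModel

# Toy vote matrix: 6 examples x 3 labeling functions, votes in {-1, 0, +1}
L_train = np.array([
    [ 1,  1,  0],
    [-1, -1, -1],
    [ 1,  0,  1],
    [-1,  0, -1],
    [ 1,  1,  1],
    [ 0, -1, -1],
])

m = L_train.shape[1]        # number of labeling functions
label_model = LabelModel(m)
label_model.fit(L_train)

preds = label_model.predict(L_train)  # denoised labels in {-1, +1}
print(preds)
```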
@argilla_io
Argilla
30 days
🙋❓ Curious about the differences between DPO, KTO, ORPO, and other preference alignment algorithms? Check out a comprehensive overview in the latest blog post of our series with MantisNLP.
2
13
61
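(For reference, the DPO objective the blog-post series above starts from, in the standard form of the Rafailov et al. paper; here y_w and y_l are the chosen and rejected responses and β controls how far the policy may drift from the reference model. KTO, ORPO, and IPO are variations on this preference objective.)

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
    \left[ \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right) \right]
```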
@argilla_io
Argilla
4 months
🧮 Meta's LIMA, a model trained by fine-tuning LLaMa with only 1000 curated prompts and responses, demonstrates strong performance when compared to similar models with a much bigger set of prompts used for alignment. #nlproc   #llms   #python   #opensource
Tweet media one
3
12
57
@argilla_io
Argilla
2 years
NLP datasets are dynamic and heterogeneous. Rubrix Metrics let you compute live metrics & dataset attributes (e.g., entity density, entity consistency). Metrics can be analysed for different dataset slices or subpopulations. Guide: #nlproc #python #ml
Tweet media one
1
12
53
@argilla_io
Argilla
5 months
In 2024 we're going to build some awesome open datasets for the AI community. Our focus will be on instructions and preference datasets for DPO/RLHF ❓What should we build?
8
9
52
@argilla_io
Argilla
2 months
🔥 Community and Data Quality Matter More For Alignment. A recipe to replicate SPIN with 30x less data: 🗣️ 50K samples vs 1.8K prompts curated by the 350+ amazing DIBT contributors. ⚗️ Distillation of @MistralAI Large instead of OpenAI 🙌 Open data & code with ⚗️distilabel 👇
Tweet media one
1
19
53
@argilla_io
Argilla
4 months
🦦We present CapybaraHermes-2.5-Mistral-7B, a model trained with the capybara-dpo dataset, built with ⚗️ distilabel. It's a preference-tuned OpenHermes-2.5-Mistral-7B. #nlproc   #llms   #python   #opensource
Tweet media one
2
5
52
@argilla_io
Argilla
6 months
⚗️ Preference data is the key ingredient for DPO/RLHF pipelines. distilabel implements generation and labeling, focusing on scalability and SoTA methods like UltraFeedback or JudgeLM. Get started and help us shape the project: #rlhf #rlai #opensource
Tweet media one
3
14
52
@argilla_io
Argilla
4 months
🎉⚗️ distilabel 0.5.0 is out! Packed with new features and enhancements to build and improve LLM fine-tuning datasets with the latest techniques. Let's see what's inside this release! 🔥 Deita components for evolution and automatic data selection of instruction tuning data 🧵
Tweet media one
2
13
52
@argilla_io
Argilla
2 years
What is a text2text model? In simple words: a model which, given a text, returns another text. Text summarization is one such task. This is how to use SciTLDR, the summarization dataset by @allen_ai 👇 More task examples: #nlproc #python #opensource
Tweet media one
0
18
50
@argilla_io
Argilla
2 years
🚀 Introducing Rubrix Weak Labeling 🚀 ⚡️ Supercharge your NLP data annotation with interactive weak supervision 👩‍🔬 Leverage the latest weak supervision methods 👩‍🏭 Production-ready and open-source #nlproc #datacentricAI #opensource #python
0
20
49
@argilla_io
Argilla
5 months
📰 Last minute update: With 54% less data & decontaminated GSM8K training prompts, distilabeled Hermes significantly improves GSM8K results. This is the second time we've seen this with DPO: last time it was @winglian on TruthfulQA using the @argilla_io cleaned UltraFeedback.
Tweet media one
@argilla_io
Argilla
5 months
🔥 More is less for DPO, high quality matters! 📢 Dropping our first open dataset and LLM of the year: 💾Meet distilabel Orca Pairs DPO, an improved version of the now famous dataset from @intel 🏛️And a new OpenHermes model outperforming baselines with 54% less DPO pairs 🧵
Tweet media one
5
46
231
1
6
48
@argilla_io
Argilla
2 years
Do you use @FastAPI for serving NLP models? Monitor data & predictions with Rubrix. Unlock data, prediction & model monitoring for text classification and NER. Rubrix, an open-source framework for data-centric NLP 👉
Tweet media one
0
6
49
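(Editor's sketch of the monitoring pattern above, using the classic rubrix client API: rb.log with a TextClassificationRecord. The endpoint shape and dataset name are placeholders, and newer Argilla releases rename these APIs.)

```python
import rubrix as rb
from fastapi import FastAPI
from transformers import pipeline

app = FastAPI()
classifier = pipeline("sentiment-analysis")

@app.post("/predict")
def predict(text: str):
    output = classifier(text)[0]
    # Log the input and prediction to Rubrix for live monitoring & labeling
    rb.log(
        rb.TextClassificationRecord(
            inputs={"text": text},
            prediction=[(output["label"], output["score"])],
        ),
        name="sentiment-monitoring",
    )
    return output
```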
@argilla_io
Argilla
3 years
Rubrix: a Python framework for data-centric NLProc 🎉 New release 0.5.0: support for text2text tasks (text summarization, OCR post-processing & many more). Kudos @abdrahman_issam & other new contributors #python #opensource #NLProc
Tweet media one
1
18
47
@argilla_io
Argilla
4 months
🔥 Top 3 trending datasets for DPO built with ⚗️ distilabel, curated with @argilla_io Let's publish some more next week! What should we build next?
Tweet media one
1
7
46
@argilla_io
Argilla
1 year
🔥 Zero-shot then few-shot with SetFit 🔥 SetFit + Argilla + vector search is a game changer in terms of speed and the ability to go from no labels to a decent dataset and model in just a few iterations.
Tweet media one
0
9
45
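(Editor's sketch of the few-shot half of this workflow using the classic SetFitTrainer API; the zero-shot warm-up and the Argilla labeling rounds are omitted, and newer setfit releases replace SetFitTrainer with Trainer.)

```python
from datasets import Dataset
from setfit import SetFitModel, SetFitTrainer

# A handful of labelled examples, as you'd have after a labeling iteration
train_ds = Dataset.from_dict({
    "text": ["I love this!", "This is awful", "Fantastic work", "Never again"],
    "label": [1, 0, 1, 0],
})

model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-MiniLM-L3-v2")
trainer = SetFitTrainer(model=model, train_dataset=train_ds)
trainer.train()

print(model.predict(["Really enjoyed it"]))
```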
@argilla_io
Argilla
1 month
Welcome Zephyr 141B to Hugging Chat🔥 You gotta love its system prompt 🤩🤩🤩🤩
Tweet media one
Tweet media two
@osanseviero
Omar Sanseviero
1 month
Welcome Zephyr 141B to Hugging Chat🔥 🎉A Mixtral-8x22B fine-tune ⚡️Super fast generation with TGI 🤗Fully open source (from the data to the UI)
Tweet media one
12
85
386
1
7
45
@argilla_io
Argilla
2 months
Can the community build impactful datasets in just 10 days? Yesterday we shared the code matching SPIN (Self-Play Fine-Tuning) with 30x less data! A big win of quality vs quantity. That was with 10K rated prompts; rate some prompts to see where 20K gets us 👇
Tweet media one
2
7
44
@argilla_io
Argilla
5 months
Yesterday, we released Notux 8x7b v1, a Mixtral 8x7B Instruct v0.1 fine-tune using a second iteration of DPO and our cleaned and binarized version of the UltraFeedback dataset. Today, we release a new space to chat with it 🚀
1
9
43
@argilla_io
Argilla
1 month
🥁 Launching a new dataset: Capybara-Preferences, built with distilabel 1.0 ⚗️! Hard at work fine-tuning Llama 3? Here's the dataset you've been waiting for. Initial results with ORPO & this dataset are 🔥 🧵What makes this dataset so special?
Tweet media one
2
13
43
@argilla_io
Argilla
4 months
🤔 HELM: a holistic framework for evaluating foundation models. Partnering with some of the most influential AI companies, several researchers from Stanford University have presented the Holistic Evaluation of Language Models (HELM) framework to improve the transparency of language models.
Tweet media one
1
12
39
@argilla_io
Argilla
1 year
🚀Training high-quality models from Argilla just got a lot easier. Use Argilla + @huggingface AutoTrain to train NLP models without a single line of code. Don't wait for your LLaMA weights, start building powerful #opensource NLP models!
1
9
42
@argilla_io
Argilla
3 months
🌀 We're taking 10k_prompts_ranked for a SPIN & the results are 🤯 What's 10k_prompts_ranked? What's SPIN? Stay tuned for another small open data win! In the meantime, discover the dataset and contribute!
1
8
42
@argilla_io
Argilla
3 months
🔂 We ran Self-Play fIne-tuNing (SPIN) on the DIBT prompt collective data, so we figured writing a blog with @mantisnlp would be an awesome way to get the community ready for yet another awesome model release that highlights the need for high-quality data.
0
10
41
@argilla_io
Argilla
5 months
The answer is yes and the result is distilabeled Hermes 2.5, a model fine-tuned on top of the amazing OpenHermes by @Teknium1. Unlike other DPO fine-tunes, it is trained with only 6K examples. It outperforms @maximelabonne's NeuralHermes with the same recipe but 54% fewer samples.
Tweet media one
1
3
40
@argilla_io
Argilla
2 months
🧙 Create an evol-instruct dataset with distilabel. In this tutorial, we develop an evol-instruct dataset by employing the approaches outlined in  and  using distilabel. #nlproc   #llms   #python   #opensource
Tweet media one
1
5
40
@argilla_io
Argilla
1 year
Always wanted to gather user feedback from your @Gradio apps for data labeling? Check this example Space: 1. Users can flag specific responses from the sentiment classifier. 2. Flagged predictions are logged into Argilla for labeling & validation 🧵 👇
1
12
38
@argilla_io
Argilla
2 years
Do you use @FastAPI for serving NLP models? Monitor data & predictions with Rubrix 👉 📓 Log input data and predictions to detect live issues. 📈 Collect labelled data to evaluate models in production. #mlops #nlp #python #opensource
Tweet media one
1
12
36
@argilla_io
Argilla
4 months
Phi-2 is now open source! Microsoft has recently changed the license of their phi-2 model from research-only to MIT, so it can now be used for commercial purposes. #nlproc   #llms   #python   #opensource
Tweet media one
2
1
38
@argilla_io
Argilla
1 year
fast few-shot learning 🫱🏾‍🫲🏼 active learning. Active learning with classy-classification and @argilla_io. You can now run this tutorial on Google Colab and Hugging Face Argilla Spaces. #nlp #opensource #python #argilla #nlproc
Tweet media one
0
11
37
@argilla_io
Argilla
1 year
🔥 Zero-shot then few-shot with SetFit 🔥 SetFit + Argilla + vector search is a game changer in terms of speed and the ability to go from no labels to a decent dataset and model in just a few iterations.
Tweet media one
1
9
37
@argilla_io
Argilla
20 days
🆕 Open replication of @cohere 's "Replacing Judges with Juries" using distilabel! 🧑‍💻 Post by @alvarobartt : 📄 Paper:
Tweet media one
1
10
36
@argilla_io
Argilla
2 months
We will say it once again: contributing open datasets is one of the most impactful ways to accelerate OSS AI. Thrilled to see the community building at this pace!
@zaynismm
Zain ul abideen
2 months
🌟 ORPO, a technique that replaces SFT+DPO/PPO, was released recently. I saw @_philschmid's post regarding it yesterday. Gave ORPO a shot with phi-2 and @argilla_io dpo-mix-7k. Model: Try out LazyORPO (Automated):
0
6
64
2
8
37
@argilla_io
Argilla
2 months
🌟 Exciting News! We've been featured in @CBinsights' prestigious top 100 AI companies list for 2024! It's an honor to be recognized among the companies driving innovation and shaping the future of AI.✨
0
10
34
@argilla_io
Argilla
3 months
Yesterday, we launched a new feature to enable sign-in with @huggingface into @argilla_io Spaces. We also launched a collective initiative to collect high-quality prompts. In just a few hours: ~80 contributors and more than 1,500 data points! Join us!
Tweet media one
1
11
34
@argilla_io
Argilla
2 years
Starting a project with little labelled data? Tutorials for few- and zero-shot classification 👇 #python #argilla #nlproc #opensource
Tweet media one
1
10
33
@argilla_io
Argilla
1 month
Did you know that Argilla and distilabel datasets have over 6 million downloads on the Hub? 🤯 Now, distilabel datasets will be even easier to identify thanks to the new icon added to the @huggingface Hub, a nice addition to yesterday's release!
Tweet media one
1
10
34
@argilla_io
Argilla
2 months
📚 Part 6 of our collab with @mantisnlp: Identity Preference Optimization (IPO). Can IPO be a valid alternative to DPO? Don't worry, we will soon cover an overview of KTO and ORPO as well 🤓
1
6
32
@argilla_io
Argilla
2 months
🚀So excited to see the progress of open-weights models these past days! At @argilla_io we build open datasets so people can build better models. Our datasets now power hundreds of open models. Here's a recap of the datasets we've shared in just a few months
Tweet media one
1
8
33
@argilla_io
Argilla
5 months
We used distilabel, our open-source AI Feedback framework, to build a preference dataset with ratings for each pair and natural language critiques. It took around 3 hours to build the full dataset and just a few lines of code. 🧵
Tweet media one
2
3
33
@argilla_io
Argilla
6 months
💨 Argilla - Notus 7B v1 is already quantized thanks to the awesome TheBloke! You can find them in the 🤗 Hub at: - GGUF - AWQ - GPTQ #nlproc #llms #data #quality #opensource
1
9
32
@argilla_io
Argilla
1 year
From few-shot to production datasets: label, train, and predict using SetFit! Iterate on your NLP tasks with the new Argilla training module. Supports SetFit, Transformers, and spaCy.
Tweet media one
2
10
32
@argilla_io
Argilla
4 months
🚀 Building open datasets is the best way to contribute to open source AI. Let's keep pushing! The best-performing 7B model on the Open LLM Leaderboard: > a merge of a model DPO'd with @argilla_io distilabel orca pairs > with an additional DPO fine-tune using the same dataset
@maximelabonne
Maxime Labonne
4 months
🐶 NeuralBeagle14-7B It's the best-performing 7B parameter model on the Open LLM Leaderboard. Remarkably, it also ranks as the 10th best-performing model overall on the Open LLM Leaderboard. In just 7B parameters! Merge + DPO = profit
Tweet media one
16
47
360
4
5
30
@argilla_io
Argilla
2 years
🎉 New release 0.13.0 🎉 🗂 Multi-label weak supervision 🤗 Read ANY dataset from the Hugging Face Hub 👥 Redesigned team workspace for team collaboration #python #opensource #nlproc
Tweet media one
1
11
29
@argilla_io
Argilla
3 months
⚗️ distilabel v0.6.0 released 🎉 > JSONOpenAILLM to get JSON responses from generation models > HF InferenceEndpoints to run Free/Pro API endpoints. If you have an HF Pro account, you can now generate and label with Mixtral! Release notes:
Tweet media one
1
7
29
@argilla_io
Argilla
1 year
Fast few-shot classification with active learning to speed up data labelling ❤️ Classy-classification few-shot learning within an active learning loop. Tutorial: #machinelearning #opensource #python #argilla #nlproc
Tweet media one
1
9
28
@argilla_io
Argilla
3 months
🚀We're thrilled by the progress we've made so far. We can reach the goal of annotating 10,000 prompts by the end of this week! Once reached, we'll publish v1 of the dataset. Do you have an account on @huggingface? It takes 5 seconds to contribute 👇
Tweet media one
2
10
28
@argilla_io
Argilla
3 months
Do you want to improve OSS LLMs for your domain or language? We can support you in running a community effort! Join our first cohort. It's fun and you get to work with amazing people from @huggingface like @vanstriendaniel and from @argilla_io like @alvarobartt 👇👇
@vanstriendaniel
Daniel van Strien
3 months
The Data is Better Together Discord channel is already engaging in discussions about some incredibly important and interesting use cases. If you're interested in contributing to the creation of better datasets, we welcome you to join us! Learn more here:
Tweet media one
0
4
15
0
8
29
@argilla_io
Argilla
4 months
🔥 distilabel orca pairs is trending on @huggingface: >1k downloads in less than a week, 17 models using it! If you're doing DPO, try it out!
Tweet media one
1
4
29
@argilla_io
Argilla
3 years
👋 Twitter I'm Rubrix, a free and open-source Python framework to explore, label, and monitor data for natural language processing. Follow this account for updates and learning materials about practical NLP #python #opensource #nlproc
0
15
28
@argilla_io
Argilla
11 months
We set up a distributed annotation effort for RLHF with our open LLM CrowdEval and the newly announced persistent storage for Hugging Face Spaces. Claim your login: Contribute here: About HF persistent storage:
Tweet media one
3
10
29
@argilla_io
Argilla
1 year
🤯 Labeling data without leaving Google Colab? 1⃣ Pick your favorite Argilla tutorial 2⃣ Use the Open in Colab button 3⃣ Deploy Argilla using @huggingface Spaces 4⃣ Use %%html to embed Argilla using the iframe shown behind the Embed this space button. 🚀 Start labeling
0
7
28
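(Editor's note on step 4⃣: in a Colab cell, the embed looks roughly like this; the Space URL is a placeholder for your own Argilla Space.)

```
%%html
<!-- Embed a deployed Argilla Space inside the notebook (placeholder URL) -->
<iframe src="https://my-org-my-argilla.hf.space" width="100%" height="600"></iframe>
```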
@argilla_io
Argilla
2 years
🗂️ Multi-label text classification weak labeling Get started with this brand new feature with this tutorial by @vid_algo #python #opensource #datacentricai
0
17
27
@argilla_io
Argilla
5 months
👩🏼‍🏫 Have you heard about Self-Instruct? This framework allows bootstrapping the ability of LLMs to generate their own instructions. You can try it out in distilabel! Argilla's GitHub:  distilabel:  #nlproc   #llms   #python   #opensource
Tweet media one
0
11
26
@argilla_io
Argilla
3 months
Do you have a @huggingface account? It takes 5 seconds to contribute to OSS AI 👉 We are running a collaborative effort to annotate prompts and improve data for future OSS LLMs. 📱You can even do it from your phone! 🗣️ Every voice counts!
Tweet media one
1
18
26
@argilla_io
Argilla
1 year
🎉 Big news! We raised $1.6 Million to Transform Data Labeling for NLP, co-led by @ZettaVentures and @CaixaCR
Tweet media one
0
4
26
@argilla_io
Argilla
1 year
🐑 Learn how to curate instruction datasets & scale up your annotation team with our new tutorial, guided by our work on the Dolly Dataset by Databricks #AI #LLMs #Opensource
Tweet media one
0
9
26
@argilla_io
Argilla
3 years
🎉 New release 0.6.0: tons of #UX improvements, early support for weak supervision with any method (Snorkel, Flyingsquid), and metrics for robust NLP. Kudos @_Enamya, @abdrahman_issam & our new contributors #python #opensource #NLProc #datacentricAI
Tweet media one
0
9
24