HPLT

@hplt_eu

Followers
267
Following
60
Media
17
Statuses
63

Horizon Europe - High Performance Language Technology (HPLT)

Joined June 2022
@hplt_eu
HPLT
27 days
It's happening now. The HPLT v2 dataset has awesome language coverage, provides competitive and stable results, and complements other data beautifully. We are at @aclmeeting, come and say hi! #hplt #datasets
0
5
10
@hplt_eu
HPLT
1 month
Great use of HPLT v2 datasets! Eager to hear more about #HPLT? Join us at @aclmeeting:
- BoF "Multilingualism: from data crawling to evaluation" on July 29, 16:00.
- Poster "An Expanded Massive Multilingual Dataset for High-Performance Language Technologies" on July 30, 11:00.
@OpenEuroLLM
OpenEuroLLM
1 month
📢 First release: 38 monolingual reference LLMs (2.15B params) via @hplt_eu + #OpenEuroLLM. ⚙️ Trained on 100B tokens from HPLT v2 dataset. 🌍 Cover EU langs + others. ⚙️ Based on LLaMA, trained on #LUMI. 📈 Useful for evaluation. Downloads + more info at
0
4
8
@hplt_eu
HPLT
2 months
HPLT stopped by @MTSummit2025 last week. We exchanged info with participants at a crowded poster session about HPLT v2 datasets while v3 is still in the oven. Next stop, @aclmeeting!
0
2
11
@hplt_eu
HPLT
4 months
HPLT v2 datasets now enriched with register labels from @UniTurku. As Amanda Myntti and Veronika Laippala show: "Appropriate metadata increases the value of a dataset". - Blog post: - Datasets + register labels (to be merged):
1
1
4
@hplt_eu
HPLT
5 months
New paper on the making of the HPLT v2 dataset:
- pipeline documentation and code
- extensive analysis of the quality and characteristics
- evaluation of the performance of language models and machine translation systems trained on it
🤓 Happy reading!
0
5
13
@hplt_eu
HPLT
6 months
We are happy to announce the second release of HPLT bilingual datasets:
- 50 English-centric language pairs = 380M parallel sentences (HPLT) 🤩
- 1,275 non-English-centric language pairs = 16.7B parallel sentences (MultiHPLT) 😮
Available at the HPLT dataset catalogue and OPUS.
0
13
16
@hplt_eu
HPLT
8 months
🥳 Amazing performance of the #HPLT v2 dataset! HuggingFace multilingual evaluation + HPLT English internal evaluation show that HPLT v2 is one of the best datasets to train LLMs. Downloads and more at either HPLT ➡️ or HF ➡️
0
7
14
@hplt_eu
HPLT
9 months
Join us for a new edition of the Winter School! "Pretraining Data Quality 🧐 and Multilingual Evaluation of LLMs 👀". 🪂 Feb. 3–5, 2025, Norway. More info and registration:
Jointly organised by @hplt_eu and the Nordic Language Processing Laboratory (NLPL).
0
5
12
@hplt_eu
HPLT
9 months
We are speaking at #LI2024 today about HPLT datasets and HPLT Analytics, a way to inspect them for quantitative analysis. We are introducing samples as a new feature! If you 😍 or 🤮 our dataset, now is the time to tell us! If you want us to take a look at yours, DM us.
0
5
13
@hplt_eu
HPLT
10 months
RT @HajicJan: Just finished: very interesting talk about Open Source and LLMs by @percyliang from @Stanford at @emnlpmeeting : what we can….
0
4
0
@hplt_eu
HPLT
11 months
RT @shaoxiongji: in collaboration with @androneil54 @lpq29743 @CisLmu @TiedemannJoerg and others I cannot @, and funded by @hplt_eu @UTTERP….
0
3
0
@hplt_eu
HPLT
11 months
🚀 INTRODUCING THE LATEST HPLT MONOLINGUAL DATASETS! TL;DR:
🔍 4.5 PB of web crawls
📄 21 billion documents
💝 careful extraction, dedup, annotation and cleaning
💥 193 languages!
Explore and download the new HPLT Monolingual Datasets NOW! #HPLT
hplt-project.org
A space that combines petabytes of natural language data with large-scale model training
2
15
37
@hplt_eu
HPLT
11 months
RT @OnadeGibert: Today @josephnlp and I presented KD4MT in the MT Marathon in Prague organized by @ufal_cuni, one of my favourite events of….
0
4
0
@hplt_eu
HPLT
1 year
RT @ufal_cuni: The MT Marathon continues on its third day! We already had great talks by @OndrejBojar, @prajdabre1, @zouharvi, and @esalesk….
0
9
0
@hplt_eu
HPLT
1 year
RT @ufal_cuni: 📢 Job offer: Work with us! 🤓 @ufal_cuni @matfyz is looking for 🖥️ a Front-End and ⌨️ a Back-End Java developer to work on….
0
5
0
@hplt_eu
HPLT
1 year
HPLT at @EAMT_2024! @bazril and @pinzhen_chen from @EdinburghNLP got many questions about when new HPLT data will arrive and how much there will be. Quick reply: autumn and a lot.
0
3
19
@hplt_eu
HPLT
1 year
RT @OnadeGibert: Still recovering from the excitement of #lreccoling2024, where we presented the @hplt_eu resources! We introduce:.- monoHP….
0
2
0
@hplt_eu
HPLT
1 year
RT @OnadeGibert: @LrecColing has arrived! We will be presenting our work on how we built the HPLT datasets! 📅 Friday 24th of May. ⏰ 9.20h-9.40….
0
5
0