
HPLT
@hplt_eu
Followers: 267 · Following: 60 · Media: 17 · Statuses: 63
Horizon Europe - High Performance Language Technology (HPLT)
Joined June 2022
It's happening now. Our HPLT v2 dataset language coverage is awesome, provides competitive and stable results and complements other data beautifully. We are at @aclmeeting, come and say hi! #hplt #datasets
Great use of HPLT v2 datasets! Eager to hear more about #HPLT? Join us at @aclmeeting:
- BoF "Multilingualism: from data crawling to evaluation" on July 29, 16:00.
- Poster "An Expanded Massive Multilingual Dataset for High-Performance Language Technologies" on July 30, 11:00.
📢 First release: 38 monolingual reference LLMs (2.15B params) via @hplt_eu + #OpenEuroLLM.
⚙️ Trained on 100B tokens from HPLT v2 dataset.
🌍 Cover EU langs + others.
⚙️ Based on LLaMA, trained on #LUMI.
📈 Useful for evaluation.
Downloads + more info at
HPLT stopped by @MTSummit2025 last week. We exchanged info with participants at a crowded poster session about HPLT v2 datasets while v3 is still in the oven. Next stop, @aclmeeting!
We are speaking about HPLT datasets and HPLT Analytics as a way to inspect them for quantitative analysis at #LI2024 today. We are introducing samples as a new feature! If you 😍 or 🤮 our dataset, now is the time to tell us! If you want us to take a look at yours, DM us.
RT @HajicJan: Just finished: very interesting talk about Open Source and LLMs by @percyliang from @Stanford at @emnlpmeeting: what we can…
RT @shaoxiongji: in collaboration with @androneil54 @lpq29743 @CisLmu @TiedemannJoerg and others I cannot @, and funded by @hplt_eu @UTTERP…
🚀 INTRODUCING THE LATEST HPLT MONOLINGUAL DATASETS! TL;DR:
🔍 4.5 PB of web crawls
📄 21 billion documents
💝 careful extraction, dedup, annotation and cleaning
💥 193 languages!
Explore and download the new HPLT Monolingual Datasets NOW! #HPLT
hplt-project.org
A space that combines petabytes of natural language data with large-scale model training
RT @OnadeGibert: Today @josephnlp and I presented KD4MT in the MT Marathon in Prague organized by @ufal_cuni, one of my favourite events of…
RT @ufal_cuni: The MT Marathon continues on its third day! We already had great talks by @OndrejBojar, @prajdabre1, @zouharvi, and @esalesk…
RT @ufal_cuni: 📢 Job offer: Work with us! 🤓 @ufal_cuni @matfyz is looking for 🖥️ a Front-End and ⌨️ a Back-End Java developer to work on…
HPLT at @EAMT_2024! @bazril and @pinzhen_chen from @EdinburghNLP got many questions about when new HPLT data is coming and how much. Quick reply: autumn and a lot.
RT @OnadeGibert: Still recovering from the excitement of #lreccoling2024, where we presented the @hplt_eu resources! We introduce: - monoHP…
Our project, HPLT, in a nutshell. Thanks @LTInnovate for spreading our voice:
linkedin.com
By Professor Jan Hajič, Charles University, Prague. What is HPLT about? The HPLT (High-Performance Language Technology) project aims at transforming the current language research and innovation...
RT @siliconvikings: Europe’s largest private #AI lab #helyes @silo_AI, #tribetampere @UniTurku’s research group @TurkuNLP + @hplt_eu releas…
tech.eu
Viking 7B is a significant milestone on the journey towards a state-of-the-art LLM family for all European languages.
RT @OnadeGibert: @LrecColing has arrived! We will be presenting our work on how we built the HPLT datasets! 📅 Friday 24th of May. ⏰ 9.20h-9.40…