Prompsit
@Prompsit
Followers: 591 · Following: 311 · Media: 538 · Statuses: 3K
We speak Natural Language Processing, Data Analysis and Artificial Intelligence, among many other languages!
Joined June 2011
Describing HPLT datasets in depth is an essential part of our commitment as data curators: 🆕HPLT 3.0: Very Large-Scale Multilingual Resources for LLM and MT. Mono- and Bi-lingual Data, Multilingual Evaluation, and Pre-Trained Models: https://t.co/uN2zoSF251 We are on 🔥 at #HPLT
arxiv.org
We present an ongoing initiative to provide open, very large, high-quality, and richly annotated textual datasets for almost 200 languages. At 30 trillion tokens, this is likely the largest...
The #HPLT crowd is at #EMNLP2025!!! If you are around, please visit our booth to discuss: - multilingual datasets 🌏 - dataset insights and stats 📊 - dataset performance 🔝 - efficient MT models ⏱️ - and the future of multilingual LLMs 💡 We don't want to miss you!
Thank you, #PCUMH, for insisting that we tell people about what we do and for always keeping an eye on our progress and achievements. Your support gives us visibility and joys like this one. Thank you!
📢 The #PCUMH is a finalist in the "Disruptores Innovation Awards 2025" by @elespanolcom. 🏆 It has been selected as "Best project driven by technology parks" thanks to the company @Prompsit, part of @OpenEuroLLM. Full story 🔽 https://t.co/sXH7wKY1Zs
Impossible to forget the day we met Olga Torres, that smile that made MultiTrainMT much more than a project with successful results: it brought us together, it made us a family. That smile will stay with us forever. Rest in peace, dear friend.
Kick-off meeting at @UABBarcelona of the MultiTrainMT ("Machine Translation training for multilingual citizens") @EUErasmusPlus project. Feel free to follow/contact us for further info and/or to become an associate partner. Anyone interested in the topic is most welcome!
We had a great time at @MTSummit2025 presenting work about the HPLT v2 multilingual datasets (v3 coming soon!) and ProMut, an improved DIY platform to teach and learn about MT. Great to be there also to celebrate the Award of Honour to our co-founder, CRO and friend Mikel Forcada! 😍
Prompsit will actively participate in OpenEuroLLM by analysing and curating the open data needed to train the foundational LLM. We are also contributing to multilingual LLM evaluation and dissemination of it all!
We are happy to announce the second release of HPLT bilingual datasets: - 50 English-centric language pairs = 380M parallel sentences (HPLT) 🤩 - 1,275 non-English-centric language pairs = 16.7B parallel sentences (MultiHPLT) 😮 Available at the HPLT dataset catalogue and OPUS.
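For readers who want to poke at such a release, here is a minimal, purely illustrative Python sketch for iterating over a sentence-aligned bitext in the plain two-file, one-sentence-per-line layout used for many OPUS downloads; the file names are placeholders, not actual HPLT/MultiHPLT artefact names.

```python
# Minimal sketch of loading a sentence-aligned bitext in Moses-style format
# (two plain-text files, line i of each file forming one sentence pair).
# The file names below are placeholders, not real HPLT/OPUS release names.
from itertools import islice

def read_bitext(src_path: str, trg_path: str):
    """Yield (source, target) sentence pairs from two aligned text files."""
    with open(src_path, encoding="utf-8") as src, open(trg_path, encoding="utf-8") as trg:
        for s, t in zip(src, trg):
            yield s.rstrip("\n"), t.rstrip("\n")

if __name__ == "__main__":
    # Print the first 3 pairs of a hypothetical English-Catalan download.
    for pair in islice(read_bitext("hplt.en-ca.en", "hplt.en-ca.ca"), 3):
        print(pair)
```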
It was a pleasure to take part in this event. Thank you for the invitation, @PcientificoUMH; we really enjoyed sharing the day with our colleagues from @Prosperabiotech. We have exceptional women scientists and technologists around every corner! 👩🔬👩💻💪🦾
This is what the session on women in science and technology organised by the #ParqueCientífico of the @universidadmh for the students of @IES Victoria Kent looked like 🧪🧬 A very special session, promoted by @APTE and the #PCUMH, featuring a range of talks and workshops.
It's time for transparent AI in Europe. It's time for open LLMs as a robust foundation for developing future private and public AI services. It's time for: Open = open source, Euro = under EU regulations and representing EU values, LLM = large language models. https://t.co/K5MlOVS7DX
To tell you about what we are doing in SmartBiC, a project led by @Linguaserve, our poster from @EAMT_2024 is worth more than a thousand words.
By harnessing web crawls 🕸️ from Internet Archive and CommonCrawl, researchers 🔎 from @EdinburghUni, @helsinkiuni, @UniOslo, @UniTurku, and @Prompsit unveil new #language resources aimed at enhancing language modeling and #MT training. https://t.co/QnYoPuy3hf
@OnadeGibert
slator.com
Researchers harness web crawls from Internet Archive and CommonCrawl to release new language resources.
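As a purely illustrative sketch of the kind of processing such work involves (not the HPLT pipeline itself), the snippet below uses the third-party warcio library to pull HTML payloads out of a locally downloaded WARC file from a crawl; the file path is a placeholder.

```python
# Illustrative sketch (not the HPLT pipeline): extracting raw HTML payloads
# from a Common Crawl / Internet Archive WARC file with the warcio library.
from warcio.archiveiterator import ArchiveIterator

def iter_html_records(warc_path: str):
    """Yield (url, html_bytes) for each HTML response record in a WARC file."""
    with open(warc_path, "rb") as stream:
        for record in ArchiveIterator(stream):
            if record.rec_type != "response":
                continue
            content_type = record.http_headers.get_header("Content-Type") or ""
            if "text/html" not in content_type:
                continue
            url = record.rec_headers.get_header("WARC-Target-URI")
            yield url, record.content_stream().read()

if __name__ == "__main__":
    # "example.warc.gz" is a placeholder path to a locally downloaded crawl segment.
    for url, html in iter_html_records("example.warc.gz"):
        print(url, len(html))
```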
Happy to share our latest MaCoCu paper, accepted at #LRECCOLING2024 @LrecColing #NLProc 🎉 We have linguists annotate the data *quality* of 4 well-known monolingual corpora (OSCAR, CC100, mC4 and MaCoCu) across 11 European low-resource languages. Link: https://t.co/Pgc7h6XhYj
➡️ @Prompsit, a company at the #ParqueCientífico of the @UniversidadMH, is collaborating in a European project on high-performance language technologies, aimed at creating different language models and powerful translation systems. Full story 📌: https://t.co/eDH9qQnsVi
First datasets, then models! Initial HPLT models (LLMs and MT) are out: https://t.co/2WSLZCOhX7, some still running 🏃 We explain what we are doing in the deliverables section: https://t.co/otZs9gF2Sc Meanwhile, we keep cooking AI peta-data-bytes 🥘, enriching, dashboarding 📊
hplt-project.org
A space that combines petabytes of natural language data with large-scale model training
Today we turn 18 doing what we love most at this crossroads between languages and technology. Thank you for your trust. Many happy returns, Prompsit! Thank you from the bottom of our hearts for your support! Happy birthday to us! 🥳 Thanks for your trust, we'll keep doing our best!
We just published version 1.2 of HPLT datasets. What's new? - we fixed a bug in monolingual dedup, please redownload! 🛠️ - we filtered out very ugly monolingual documents🤮 - we anonymised the bilingual datasets🕵️♀️ https://t.co/vvJSbswjZR
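To give a flavour of what "monolingual dedup" means in practice, here is a toy Python illustration of exact de-duplication by content hash; the real HPLT pipeline is considerably more sophisticated, and this sketch makes no claim to reproduce it.

```python
# Toy illustration of exact document de-duplication by content hash;
# this only shows the basic idea, not the actual HPLT curation pipeline.
import hashlib

def dedup_documents(docs):
    """Keep the first occurrence of each document, dropping exact duplicates."""
    seen = set()
    unique = []
    for doc in docs:
        # Normalise whitespace so trivially reformatted copies collide.
        key = hashlib.sha1(" ".join(doc.split()).encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique

if __name__ == "__main__":
    print(dedup_documents(["Hello  world", "Hello world", "Another doc"]))
```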
Select, filter, and visualize your data (OpusCleaner), then schedule and train MT models and LLMs consistently with them (OpusTrainer). As part of the HPLT project, we build tools to make this easy. They are open source and we encourage you to use them. More:
We are excited to share with you that we now provide 4 more massive monolingual corpora for under-resourced languages: you can access Icelandic, Ukrainian, Catalan and Greek #MaCoCu web corpora for free from the https://t.co/X31izGUnNy repository 😃
#MaCoCu crew is in Groningen these days! Walking towards great results of MaCoCu corpora evaluation and new MaCoCu language models for under-resourced languages 😁
Next June 17th-25th, the #HPLT consortium will hold a #hackathon in Prague around a set of topics related to corpus curation. Interested? Drop us a line and join! https://t.co/jyhLKKdsTQ