José Maria Pombal

@zmprcp

Followers
94
Following
198
Media
9
Statuses
64

Senior Research Scientist @swordhealth, PhD student @istecnico.

Lisbon, Portugal
Joined March 2023
@zmprcp
José Maria Pombal
29 days
I'll be at COLM today presenting M-Prometheus (morning, Poster 40) and Zero-shot Benchmarking (afternoon, poster 9). Come check it out!
@sardine_lab_it
Sardine Lab
29 days
Don't miss our lab's presentations today at @COLM_conf!! 🔥 We will have two presentations 1/3
0
3
5
@andre_t_martins
Andre Martins
1 month
1) Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models w/ @zmprcp @nunonmg @RicardoRei7 - Poster session 2, Tue Oct 7, 4:30 PM – 6:30 PM
1
1
2
@andre_t_martins
Andre Martins
1 month
2) M-Prometheus: A Suite of Open Multilingual LLM Judges w/ @zmprcp @dongkeun_yoon @psanfernandes @ianwu97 @seungonekim @RicardoRei7 @gneubig - (Poster session 1, Tue Oct 7, 11:00 AM – 1:00 PM)
1
2
7
@zmprcp
José Maria Pombal
3 months
Also, hit me up if you're at ACL and want to chat/meet :)
0
0
0
@zmprcp
José Maria Pombal
3 months
1
0
1
@zmprcp
José Maria Pombal
3 months
I'll be at ACL presenting our work, A Context-aware Framework for Translation-mediated Conversations ( https://t.co/3Y3IM2n3HU) in the Machine Translation session, 28 Jul, 14:00-15:30, room 1.85. Come check it out if you're interested in bilingual chat MT!
1
2
7
@zmprcp
José Maria Pombal
4 months
Last week was my final one at @Unbabel. I'm incredibly proud of our work (e.g., Tower, MINT, M-Prometheus, ZSB). Now, alongside my PhD studies at @istecnico, I'm joining @swordhealth as Senior Research Scientist under @RicardoRei7. Super confident in the team we're assembling.
1
0
12
@ManosZaranis
Manos Zaranis
4 months
🚨Meet MF²: Movie Facts & Fibs: a new benchmark for long-movie understanding! 🤔Do you think your model understands movies? Unlike existing benchmarks, MF² targets memorable events, emotional arcs 💔, and causal chains 🔗 — things humans recall easily, but even top models like
2
30
55
@zmprcp
José Maria Pombal
5 months
Check out the latest iteration of Tower models, Tower+. Ideal for translation tasks and beyond, and available at three different scales: 2B, 9B, 72B. All available on huggingface: https://t.co/XWJqTeht7R Kudos to everyone involved!
@RicardoRei7
Ricardo Rei
5 months
🚀 Tower+: our latest model in the Tower family — sets a new standard for open-weight multilingual models! We show how to go beyond sentence-level translation, striking a balance between translation quality and general multilingual capabilities. 1/5 https://t.co/WKQapk31c0
0
1
10
@dongkeun_yoon
Dongkeun Yoon
6 months
🙁 LLMs are overconfident even when they are dead wrong. 🧐 What about reasoning models? Can they actually tell us “My answer is only 60% likely to be correct”? ❗Our paper suggests that they can! Through extensive analysis, we investigate what enables this emergent ability.
9
49
302
@psanfernandes
Patrick Fernandes
6 months
MT metrics excel at evaluating sentence translations, but struggle with complex texts We introduce *TREQA* a framework to assess how translations preserve key info by using LLMs to generate & answer questions about them https://t.co/aHUScXzoBM (co-lead @swetaagrawal20) 1/15
2
12
38
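The QA-based evaluation idea above can be sketched as a small function; this is a minimal illustration with stub question-generation and answering steps (the real TREQA framework uses LLMs for both, and none of these names are the paper's API):

```python
def qa_translation_score(source, translation, gen_questions, answer):
    """Score how well a translation preserves key information:
    generate (question, gold_answer) pairs about the source, answer
    each question using only the translation, and report the fraction
    answered correctly."""
    qa_pairs = gen_questions(source)  # list of (question, gold_answer)
    if not qa_pairs:
        return 0.0
    correct = sum(1 for q, gold in qa_pairs if answer(q, translation) == gold)
    return correct / len(qa_pairs)

# Toy stubs standing in for LLM calls.
def toy_gen_questions(source):
    return [("Who arrived?", "Ana"), ("When?", "Monday")]

def toy_answer(question, text):
    # Naive "QA": look for the expected entity in the candidate text.
    if question.startswith("Who") and "Ana" in text:
        return "Ana"
    if question.startswith("When") and "Monday" in text:
        return "Monday"
    return None

score = qa_translation_score("Ana chegou na segunda-feira.",
                             "Ana arrived on Monday.",
                             toy_gen_questions, toy_answer)
# score == 1.0: both questions are answerable from the translation
```

A translation that drops key information (say, the day) would answer fewer questions correctly and receive a lower score, which is the property the framework exploits.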
@dongkeun_yoon
Dongkeun Yoon
7 months
Introducing M-Prometheus — the latest iteration of the open LLM judge, Prometheus! Specially trained for multilingual evaluation. Excels across diverse settings, including the challenging task of literary translation assessment.
@zmprcp
José Maria Pombal
7 months
We just released M-Prometheus, a suite of strong open multilingual LLM judges at 3B, 7B, and 14B parameters! Check out the models and training data on Huggingface: https://t.co/nqixsAQtQ0 and our paper: https://t.co/c93J4YGXZH
0
3
22
@seungonekim
Seungone Kim
7 months
Here's our new paper on M-Prometheus, a series of multilingual judges! 1/ Effective at safety & translation eval 2/ Also stands out as a good reward model in BoN 3/ Backbone model selection & training on natively multilingual data is important Check out @zmprcp 's post!
@zmprcp
José Maria Pombal
7 months
We just released M-Prometheus, a suite of strong open multilingual LLM judges at 3B, 7B, and 14B parameters! Check out the models and training data on Huggingface: https://t.co/nqixsAQtQ0 and our paper: https://t.co/c93J4YGXZH
0
2
20
@zmprcp
José Maria Pombal
7 months
Models and training data: https://t.co/nqixsAQtQ0 Paper:
0
0
3
@zmprcp
José Maria Pombal
7 months
1
0
5
@zmprcp
José Maria Pombal
7 months
There were a lot of open questions on what strategies work for building multilingual LLM judges. We perform ablations on our training recipe that highlight the importance of backbone model choice and of using natively multilingual—instead of translated—training data.
1
0
3
@zmprcp
José Maria Pombal
7 months
We fine-tune Qwen2.5 models with a recipe inspired by Prometheus 2. We release two multilingual datasets: M-Feedback-Collection and M-Preference-Collection. They contain direct assessment (DA) and pairwise comparison (PWC) data for 5 languages, and MT evaluation data for 8 language pairs. Our models perform well on unseen languages.
1
0
3
@zmprcp
José Maria Pombal
7 months
For their size, M-Prometheus models achieve SotA performance on multilingual reward benchmarks and literary MT evaluation. They can also be used to significantly improve multilingual LLM outputs via best-of-n decoding (QAD)! Very useful for refining synthetic data, for example.
1
0
5
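Best-of-n decoding with a judge, as described above, can be sketched in a few lines; this is a minimal, self-contained illustration with a toy generator and a length-based stand-in for a judge model such as M-Prometheus (the function names are illustrative, not from the paper):

```python
def best_of_n(prompt, generate, judge, n=4):
    """Sample n candidate outputs and keep the one the judge scores highest.
    `generate(prompt, i)` returns the i-th candidate; `judge(prompt, cand)`
    returns a scalar quality score (higher is better)."""
    candidates = [generate(prompt, i) for i in range(n)]
    return max(candidates, key=lambda cand: judge(prompt, cand))

# Toy stand-ins: a fixed candidate pool and a judge that prefers longer text.
pool = ["ok", "a fuller answer", "mid answer"]
toy_generate = lambda prompt, i: pool[i]
toy_judge = lambda prompt, cand: len(cand)

best = best_of_n("Translate: olá, tudo bem?", toy_generate, toy_judge, n=3)
# best == "a fuller answer"
```

In practice the generator would be a multilingual LLM sampled with temperature, and the judge would score each candidate against a rubric; the same loop is what makes the technique useful for filtering or refining synthetic data.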
@zmprcp
José Maria Pombal
7 months
We just released M-Prometheus, a suite of strong open multilingual LLM judges at 3B, 7B, and 14B parameters! Check out the models and training data on Huggingface: https://t.co/nqixsAQtQ0 and our paper: https://t.co/c93J4YGXZH
3
18
79
@zmprcp
José Maria Pombal
7 months
Massive kudos to collaborators @nunonmg @RicardoRei7 and @andre_t_martins . To use our benchmarks and create your own, check out our repository:
github.com/deep-spin/zsb
0
0
0