Raphaël Merx
@RaphaelMerx
Followers
547
Following
4K
Media
65
Statuses
624
PhD @UniMelb | Tech Lead @CatalpaDev | Based in 🇮🇩, worked in 🇹🇱 🇵🇬 | AI, languages, governance, in no particular (world) order
Melbourne, Australia
Joined January 2012
Our #WMT2025 paper got accepted 🙌 We release a dataset for health low-res MT... and find that Gemini 2.5 beats NLLB 54B, if given full document context. Seeya in Suzhou!
2
1
3
Did you know?
❌ 77% of language models on @huggingface are not tagged for any language
📈 For 95% of languages, most models are multilingual
🚨 88% of models with tags are trained on English
In a new blog post, @tylerachang and I dig into these trends and why they matter! 👇
2
4
24
paper: https://t.co/e8mIA9e5EC
demo: https://t.co/jlYphZa9h6
Grateful for all the help and advice from my supervisors Hanna Suominen, Nick Thieberger, Trevor Cohn, Katerina Vylomova, and special thanks to Maluk Timor & Lois Hong for the great joint work!
0
0
1
In Vienna for ACL, presenting Tulun, a system for low-resource in-domain translation using LLMs. Working with 2 real use cases: medical translation into Tetun 🇹🇱 & disaster relief speech translation in Bislama 🇻🇺
✨TULUN: Transparent and Adaptable Low-resource Machine Translation
By: Raphael Merx, Hanna Suominen, Lois Yinghui Hong, Nick Thieberger, Trevor Cohn, Kat Vylomova
Paper: https://t.co/Dn499KkhIO
Demo: https://t.co/y4Hr7397r3
#ACL2025NLP #NLProc #ACL2025
1
2
3
Our paper on generating bilingual example sentences with LLMs got best paper award @altanlp ! https://t.co/pAULq5bv5f We work with French / Indonesian / Tetun, find that annotators don't agree about what's a "good example", but that LLMs can align with a specific annotator.
2
4
19
Life update: after 10 years in industry, I'm going back to school for a PhD at Uni. of Melbourne! Started last week, lots of work to do, which I'm really looking forward to!
✨A very warm welcome to @RaphaelMerx who is joining #UniMelb #NLProc group to work on enhancing Machine Translation for Medical Education in Timor-Leste! Recently Rapha has presented his first paper on MT for the Mambai Language: https://t.co/XbEtaLb8bw !
11
2
23
Coming to Google Translate: Tetun, Tok Pisin, Balinese, Fijian, Acehnese, and many other languages of SEA & the Pacific! Let's test quality when they release, but potentially a small revolution in MT for our region https://t.co/sp8JMFZ95E
blog.google
Google Translate adds 110 new languages using AI, breaking down communication barriers for millions around the world.
5
14
45
Also, the #EURALI folks are very cool and I'm looking forward to more work with them! https://t.co/eW1hKMs7SQ
#nlproc #lreccoling2024
Thanks all for attending #EURALI today! Let’s hope that such research communities along with language enthusiasts and linguists change the landscape of #nlproc for under-resourced languages in the near future! 🙂 #lreccoling2024
1
1
3
Anyways, goes to show the importance of working with native speakers for low resource NLP work, especially in the LLM era, when benchmarks are less trustworthy than ever!
1
1
1
MTOB is still interesting for evaluating LLMs' language logic ability, only we need to be careful about the conclusions wrt language acquisition.
1
0
0
Their paper is frankly better, but our findings leave me wondering what results they'd get if they produced a separate test set with a native Kalamang speaker.
1
0
0
In particular, our experiment is similar to Tanzer et al., who work with Kalamang from Papua and introduce the MTOB (Machine Translation from One Book) benchmark:
1
0
1
It works when using a test set from the same material as the source, and fails miserably when using a totally different test set, which we created with the help of a native Mambai speaker.
1
0
0
First paper published! We create a first corpus for the Mambai language (from Timor-Leste), and try to teach an LLM to translate into Mambai using examples selected to match the source, all from one language manual. https://t.co/jE6ReT052P
arxiv.org
This study explores the use of large language models (LLMs) for translating English into Mambai, a low-resource Austronesian language spoken in Timor-Leste, with approximately 200,000 native...
4
5
16
High internet use but low social media use, hats off to 🇩🇪! Survey by @pewresearch
https://t.co/0ckNpeEiJs
0
1
4
This #InternationalWomensDay we pay tribute to the wonder women of Catalpa, who are not just smart and skilled but also fearless, funny and feisty! We spoke to 7 women about their jobs, their career paths and their wishes for women and girls on this day. https://t.co/zW6QYOf5zY
1
1
4
Woke up watching this series of short videos on Papuan languages, very nice: https://t.co/GpkxaMZ1BD Lots of trivia, like Bukiyip having two counting systems, one in base 3 (for coconuts and fish), one in base 4 (for betel nuts and bananas)
0
0
3