ArmelRandy Profile
ArmelRandy

@RandyZebaze

Followers: 195 · Following: 3K · Media: 17 · Statuses: 78

PhD Student @InriaParisNLP | MVA 2022 @ENS_ParisSaclay | X19 @Polytechnique

Joined February 2022
@RandyZebaze
ArmelRandy
3 months
🎉 Grateful and happy to share that two of our papers were accepted to #EMNLP2025 Findings! 🚀 [1] Compositional Translation: A Novel LLM-based Approach for Low-resource Machine Translation [2] TopXGen: Topic-Diverse Parallel Data Generation for Low-Resource Machine Translation
@rohanpaul_ai
Rohan Paul
1 month
The paper tests whether “thinking tokens” help translation and finds they mostly do not. The team tests big reasoning models that write hidden thoughts before the answer. They compare translations with and without those thoughts. Quality barely changes across many language pairs.
@RandyZebaze
ArmelRandy
3 months
[1] Compositional Translation: A Novel LLM-based Approach for Low-resource Machine Translation Arxiv: https://t.co/Ff72J1esro Github: https://t.co/qTFcU3VNgr [2] TopXGen: Topic-Diverse Parallel Data Generation for Low-Resource Machine Translation Arxiv:
@LydiaNishimwe
Lydia Nishimwe
5 months
🎓 I defended my PhD in Machine Translation last month! Grateful to my colleagues at @inria_paris for the support & collaboration throughout this journey. 🎯 Open to Work - AI/NLP Research Scientist or Engineer roles, starting September 2025, on-site in the Paris area or remote.
@slatornews
Slator
8 months
👉 https://t.co/PLHdEfGuh5 Researchers at @Inria 🇫🇷 demonstrate how to improve #AI #translation for low-resource languages by breaking ⛓️‍💥 sentences into simpler phrases, translating each using in-context examples, and using these pairs to guide translation. #xl8 #t9n #LLMs
@rohanpaul_ai
Rohan Paul
9 months
LLMs struggle with machine translation for low-resource languages, even with similar examples. This paper introduces Compositional Translation (CompTra). It decomposes sentences into phrases, translates each using retrieved examples, and recombines these translations to guide the final sentence translation.
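The CompTra recipe described above (decompose, translate phrases, recombine to guide the full translation) can be sketched in miniature. Everything below is a toy stand-in, not the paper's implementation: the real system uses an LLM for decomposition and phrase translation, while here a comma split and a word lexicon play those roles.

```python
def split_into_phrases(sentence):
    # Toy decomposition: split the sentence at commas into simpler phrases.
    return [p.strip() for p in sentence.split(",") if p.strip()]

def translate_phrase(phrase, lexicon):
    # Stand-in for phrase-level MT with retrieved in-context examples:
    # here, a word-by-word lexicon lookup.
    return " ".join(lexicon.get(w.lower(), w) for w in phrase.split())

def compositional_translate(sentence, lexicon):
    phrases = split_into_phrases(sentence)
    pairs = [(p, translate_phrase(p, lexicon)) for p in phrases]
    # In CompTra, these phrase pairs are fed back as in-context examples
    # when translating the full sentence; this sketch simply joins them.
    return ", ".join(t for _, t in pairs)

# Hypothetical English -> French lexicon, purely illustrative.
lexicon = {"the": "la", "cat": "chatte", "sleeps": "dort",
           "and": "et", "dreams": "rêve"}
print(compositional_translate("The cat sleeps, and dreams", lexicon))
```

The key structural point is that the phrase-level pairs, not external bitexts, supply the final in-context signal.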
@AnthropicAI
Anthropic
9 months
Introducing Claude 3.7 Sonnet: our most intelligent model to date. It's a hybrid reasoning model, producing near-instant responses or extended, step-by-step thinking. One model, two ways to think. We’re also releasing an agentic coding tool: Claude Code.
@LydiaNishimwe
Lydia Nishimwe
9 months
🚀 Exciting Challenge Ahead! 🚀 I'm thrilled to be one of 12 finalists in the 3-minute thesis competition (@MT180FR ) at Sorbonne Université. 🗓️March 10th, 6PM Paris 🔗Register to watch (in person or online) & vote: https://t.co/loJI8qIixr Looking forward to seeing you there!
sorbonne-universite.fr
12 candidates will take part in the final of the Ma Thèse en 180 secondes competition on 10 March 2025. Here is an overview of their topics.
@RandyZebaze
ArmelRandy
10 months
TL;DR Everything is in the title. The paper is available on ArXiv https://t.co/jscBxD3fpP The code and outputs are available on Github https://t.co/AYTCIVgQfR Thanks to my co-authors @bensagot and @RABawden, and to @InriaParisNLP. 10/10
github.com
[NAACL 2025 Findings] Example Selection via Similarity Search improves Low-resource Machine Translation - ArmelRandy/ICL-MT
@RandyZebaze
ArmelRandy
10 months
Finally, we demonstrate that similarity-based example selection (in a high-quality sample pool) helps few-shot MT with LLMs ranging from 2 to 70 billion parameters. As the number of in-context examples grows, the gap with random selection remains significant. 9/10
@RandyZebaze
ArmelRandy
10 months
Using FLORES-200 dev set (997 human-written pairs) as our initial selection pool, we study the impact of reducing or expanding it with bitexts from the NLLB dataset. In Swahili, similarity search (notably SONAR) proves more robust to pool composition than random selection. 8/10
@RandyZebaze
ArmelRandy
10 months
SONAR also outperforms example selection based on string-matching metrics like BLEU, BM25, R(rerank)-BM25, and cosine-similarity with RoBERTa's sentence representations. 7/10
@RandyZebaze
ArmelRandy
10 months
Experiments with 5 sentence embeddings on 4 FLORES-200 languages show that similarity-based selection outperforms random selection in LRLs but offers only marginal gains in HRLs (French). Across both cases, sentence embeddings perform similarly, with SONAR slightly leading. 6/10
@RandyZebaze
ArmelRandy
10 months
We tackle these issues by assigning a zero score to problematic generations, making the metrics language-aware. Specifically, we evaluate with Language-aware COMET, based on COMET-22. It preserves COMET's accuracy while improving the assessment of problematic outputs. 5/10
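The zero-scoring idea in this tweet can be sketched as a thin wrapper around any segment-level metric. Both callables below (`base` and `lid`) are hypothetical stand-ins, not COMET-22 or a real language-identification model; the repetition threshold is likewise an illustrative choice.

```python
def is_degenerate(text, min_distinct_ratio=0.3):
    # Empty output, or meaningless repetition (too few distinct tokens).
    tokens = text.split()
    if not tokens:
        return True
    return len(set(tokens)) / len(tokens) < min_distinct_ratio

def language_aware_score(hypothesis, expected_lang, base_score, lang_id):
    # Assign a zero score to problematic generations before calling the
    # base metric, mirroring the Language-aware COMET idea.
    if is_degenerate(hypothesis) or lang_id(hypothesis) != expected_lang:
        return 0.0
    return base_score(hypothesis)

# Toy stubs: a constant base metric and a keyword-based language identifier.
base = lambda h: 0.85
lid = lambda h: "swh" if "paka" in h else "eng"
print(language_aware_score("paka analala", "swh", base, lid))  # passes checks
print(language_aware_score("", "swh", base, lid))              # empty -> 0.0
```

Because the wrapper only intercepts clearly broken outputs, the base metric's behavior on well-formed translations is preserved, which matches the accuracy-preservation claim above.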
@RandyZebaze
ArmelRandy
10 months
Translating into low-resource languages presents two main challenges: • Outputs may be in the wrong language (e.g., repeating the prompt). • They may be empty or contain meaningless repetitions. Current neural metrics are not robust to these issues. 4/10
@RandyZebaze
ArmelRandy
10 months
We examine three aspects: • Evaluating LLM-based MT into LRLs. • Assessing whether similarity-based example selection improves MT, especially with a small pool (typical for LRLs), and at scale. • Testing the strategy’s robustness to selection-pool heterogeneity. 3/10
@RandyZebaze
ArmelRandy
10 months
We explore in-context example selection for MT, focusing on LRLs (Swahili, Wolof, etc.). Given a sentence and a selection pool, we choose the k closest pairs based on a sentence embedding or a string-matching metric, placing the most similar pair closest to the sentence. 2/10
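The selection procedure in this tweet (embed, rank by similarity, place the most similar pair closest to the input) can be sketched as follows. The bag-of-words embedding and the tiny bitext pool are illustrative stand-ins for the paper's sentence encoders (e.g. SONAR) and its FLORES-200 pool; the Swahili pairs are just example data.

```python
import math
from collections import Counter

def embed(text):
    # Toy sentence embedding: bag-of-words counts.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_examples(sentence, pool, k):
    # Rank the pool by similarity to the input sentence...
    q = embed(sentence)
    scored = sorted(pool, key=lambda pair: cosine(q, embed(pair[0])),
                    reverse=True)
    # ...keep the top k, and order them so the most similar pair sits
    # last in the prompt, i.e. closest to the sentence to translate.
    return list(reversed(scored[:k]))

pool = [
    ("the cat sleeps", "paka analala"),
    ("the dog barks", "mbwa anabweka"),
    ("rain falls at night", "mvua inanyesha usiku"),
]
print(select_examples("the cat eats", pool, 2))
```

Swapping `embed` for a real multilingual encoder changes only the similarity function; the prompt-ordering logic stays the same.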
@RandyZebaze
ArmelRandy
10 months
I am happy to announce that our paper "In-context Example Selection via Similarity Search Improves Low-resource Machine Translation" was accepted to #NAACL2025 Findings 🤩🔥. What is this about? TAGS: Machine Translation (MT), High/Low-resource languages (H/LRLs). 🧵 1/10
@kyutai_labs
kyutai
11 months
Meet Helium-1 preview, our 2B multi-lingual LLM, targeting edge and mobile devices, released under a CC-BY license. Start building with it today! https://t.co/X4Dbx2T1cJ
@rohanpaul_ai
Rohan Paul
11 months
Tree of Problems (ToP) breaks complex LLM tasks into identical subtasks, solving them like nested Russian dolls. It turns massive problems into bite-sized copies. Original Problem 🤔: LLMs struggle with complex reasoning tasks that require breaking down into simpler subtasks.
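The nested-dolls idea can be illustrated with a generic recursive skeleton: split a problem into identically shaped subproblems, solve the small leaves directly, and merge answers back up the tree. The list-summing instance is a toy chosen for clarity; the paper applies this scheme to LLM reasoning tasks, with the model playing the roles of `solve_leaf` and `merge`.

```python
def tree_of_problems(problem, solve_leaf, split, merge, leaf_size=2):
    # Recursively decompose into identical, simpler subtasks;
    # solve leaves directly, then merge answers upward.
    if len(problem) <= leaf_size:
        return solve_leaf(problem)
    left, right = split(problem)
    return merge(
        tree_of_problems(left, solve_leaf, split, merge, leaf_size),
        tree_of_problems(right, solve_leaf, split, merge, leaf_size),
    )

# Toy instance: summing a list, where every subtask has the same shape.
nums = [3, 1, 4, 1, 5, 9, 2, 6]
result = tree_of_problems(
    nums,
    solve_leaf=sum,
    split=lambda p: (p[: len(p) // 2], p[len(p) // 2:]),
    merge=lambda a, b: a + b,
)
print(result)  # 31
```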