
Anar
@anaryegen
Followers: 31 · Following: 1K · Media: 1 · Statuses: 99
PhDing NLP at @Hitz_zentroa | LLMs, Factuality, Arguments, Evals, Multilinguality and more
Joined August 2022
RT @omarsar0: The Illusion of Progress. It's well known that there are caveats with benchmarks and metrics that measure LLM capabilities…
RT @Nouamanetazi: 🚀 Expert Parallelism now in 🤗 Transformers! Load the 120B gpt-oss model in under 𝟑 𝐬𝐞𝐜𝐨𝐧𝐝𝐬. Proud to have added Expert P…
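For reference, a minimal sketch of loading this model through the Transformers library; the checkpoint id, arguments, and generation call below are assumptions, and the expert-parallelism configuration the tweet refers to is not shown, since the tweet is truncated.

    # Minimal sketch (not the tweet's exact setup): loading gpt-oss-120B with
    # Hugging Face Transformers. The checkpoint id is an assumption; the
    # expert-parallelism configuration itself is not shown here.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "openai/gpt-oss-120b"  # assumed Hub id

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",   # keep the checkpoint's native dtype
        device_map="auto",    # shard layers across available devices
    )

    inputs = tokenizer("Hello from gpt-oss!", return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(output[0], skip_special_tokens=True))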
RT @gneubig: Summary of GPT-OSS architectural innovations: 1. sliding window attention (ref: …) 2. mixture of expert…
arxiv.org: Deploying Large Language Models (LLMs) in streaming applications such as multi-round dialogue, where long interactions are expected, is urgently needed but poses two major challenges. Firstly, ...
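As a side note on the sliding window attention mentioned in the architecture summary above, here is a minimal NumPy sketch of what such an attention mask computes; the sequence length and window size are illustrative assumptions, not values from GPT-OSS.

    # Minimal sketch of a sliding-window (local causal) attention mask;
    # the sequence length and window size here are illustrative only.
    import numpy as np

    def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
        """True where a query position may attend to a key position:
        causal, and at most `window - 1` tokens back."""
        i = np.arange(seq_len)[:, None]  # query positions
        j = np.arange(seq_len)[None, :]  # key positions
        return (j <= i) & (j > i - window)

    # Example: 6 tokens with a window of 3.
    print(sliding_window_mask(6, 3).astype(int))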
RT @ragerri: I am pleased to participate in 6 papers (2 main and 2 findings, 1 @conll_conf and 1 argument mining shared task overview) to…
RT @ragerri: 1. @anaryegen, @jiporanm, and Rodrigo Agerri (2025). Dynamic Knowledge Integration for Evidence-Driven Counter-Argument Gene…
RT @vijaytarian: RL with verifiable rewards? Works great ✨ Realistic or non-verifiable tasks? Still a mess 📉 Reward models and AI judges? F…
arxiv.org: Language models must be adapted to understand and follow user instructions. Reinforcement learning is widely used to facilitate this -- typically using fixed criteria such as "helpfulness" and...
RT @allen_ai: In our new paper, “Contextualized Evaluations: Judging Language Model Responses to Underspecified Queries,” we find that addi…
RT @jen_hsia: 1/6 Retrieval is supposed to improve generation in RAG systems. But in practice, adding more documents can hurt performance,…
RT @osanseviero: Introducing T5Gemma: the next generation of encoder-decoder/T5 models! 🔧 Decoder models adapted to be encoder-decoder 🔥 32…
RT @ShashwatGoel7: There's been a hole at the heart of #LLM evals, and we can now fix it. 📜 New paper: Answer Matching Outperforms Multiple…
RT @Hitz_zentroa: In last week’s seminar session @anaryegen talked about Mining Argument Structures in Medical Texts (…
RT @I_MirandaM: #newHitzPaper Can a simple inference-time approach unlock better Vision-Language Compositionality? 🤯 Our latest paper shows…
RT @yueqi_song: Our VisualPuzzles 🧩 benchmark shows similar findings as "The Illusion of Thinking": - More tokens ≠ better reasoning - Reaso…
RT @osainz59: Do you know that you can continue pretraining Instructed LLMs without losing their instruction following capabilities? We di…
RT @BenShi34: As we optimize model reasoning over verifiable objectives, how does this affect human understanding of said reasoning to achi…
RT @ahsalem511: 📢 #acl2025 - main: 🤔 Continued pretraining of LLMs in new languages often includes English data, but why? 💡 We found English…
RT @caiqizh: 🔥 We teach LLMs to say how confident they are on-the-fly during long-form generation. 🤩 No sampling. No slow post-hoc methods…
RT @LysandreJik: The Transformers library is undergoing its largest pivot to date 🙌. It now cements its role as the central model definiti…
RT @AndrewLampinen: How do language models generalize from information they learn in-context vs. via finetuning? We show that in-context le…