Anar

@anaryegen

Followers: 31 · Following: 1K · Media: 1 · Statuses: 99

PhDing NLP at @Hitz_zentroa | LLMs, Factuality, Arguments, Evals, Multilinguality and more

Joined August 2022
@anaryegen · 6 days
RT @omarsar0: The Illusion of Progress. It's well known that there are caveats with benchmarks and metrics that measure LLM capabilities…
@anaryegen · 13 days
RT @Nouamanetazi: 🚀 Expert Parallelism now in 🤗 Transformers! Load the 120B gpt-oss model in under 3 seconds. Proud to have added Expert P…
@anaryegen · 20 days
RT @ragerri: I am pleased to participate in 6 papers (2 main and 2 findings, 1 @conll_conf and 1 argument mining shared task overview) to…
@anaryegen · 20 days
RT @Hitz_zentroa: HiTZ at #ACL2025NLP in Vienna!
@anaryegen · 22 days
RT @ragerri: 1. @anaryegen, @jiporanm, and Rodrigo Agerri (2025). Dynamic Knowledge Integration for Evidence-Driven Counter-Argument Gene…
@anaryegen · 24 days
RT @vijaytarian: RL with verifiable rewards? Works great ✨. Realistic or non-verifiable tasks? Still a mess 📉. Reward models and AI judges? F…
[arxiv.org] Language models must be adapted to understand and follow user instructions. Reinforcement learning is widely used to facilitate this -- typically using fixed criteria such as "helpfulness" and...
@anaryegen · 28 days
RT @allen_ai: In our new paper, "Contextualized Evaluations: Judging Language Model Responses to Underspecified Queries," we find that addi…
@anaryegen · 1 month
RT @jen_hsia: 1/6 Retrieval is supposed to improve generation in RAG systems. But in practice, adding more documents can hurt performance…
@anaryegen · 1 month
RT @osanseviero: Introducing T5Gemma: the next generation of encoder-decoder/T5 models! 🔧 Decoder models adapted to be encoder-decoder. 🔥 32…
@anaryegen · 2 months
RT @ShashwatGoel7: There's been a hole at the heart of #LLM evals, and we can now fix it. 📜 New paper: Answer Matching Outperforms Multiple…
@anaryegen · 2 months
RT @Hitz_zentroa: In last week's seminar session @anaryegen talked about Mining Argument Structures in Medical Texts…
@anaryegen · 2 months
RT @I_MirandaM: #newHitzPaper. Can a simple inference-time approach unlock better Vision-Language Compositionality? 🤯 Our latest paper shows…
@anaryegen · 2 months
RT @yueqi_song: Our VisualPuzzles 🧩 benchmark shows similar findings as "The Illusion of Thinking": - More tokens ≠ better reasoning. - Reaso…
@anaryegen · 2 months
RT @osainz59: Do you know that you can continue pretraining instructed LLMs without losing their instruction-following capabilities? We di…
@anaryegen · 2 months
RT @BenShi34: As we optimize model reasoning over verifiable objectives, how does this affect human understanding of said reasoning to achi…
@anaryegen · 2 months
RT @ahsalem511: 📢 #acl2025 main: 🤔 Continued pretraining of LLMs in new languages often includes English data, but why? 💡 We found English…
@anaryegen · 3 months
RT @caiqizh: 🔥 We teach LLMs to say how confident they are on-the-fly during long-form generation. 🤩 No sampling. No slow post-hoc methods…
@anaryegen · 3 months
RT @LysandreJik: The Transformers library is undergoing its largest pivot to date 🙌. It now cements its role as the central model definiti…
@anaryegen · 4 months
RT @AndrewLampinen: How do language models generalize from information they learn in-context vs. via finetuning? We show that in-context le…