
Anar
@anaryegen
Followers: 31 · Following: 1K · Media: 1 · Statuses: 99
PhDing NLP at @Hitz_zentroa | LLMs, Factuality, Arguments, Evals, Multilinguality and more
Joined August 2022
RT @omarsar0: The Illusion of Progress. It's well known that there are caveats with benchmarks and metrics that measure LLM capabilities…
RT @Nouamanetazi: 🚀 Expert Parallelism now in 🤗 Transformers! Load the 120B gpt-oss model in under 𝟑 𝐬𝐞𝐜𝐨𝐧𝐝𝐬. Proud to have added Expert P…
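For reference, a minimal sketch of loading this model through the Transformers library; the checkpoint id, arguments, and generation call below are assumptions, and the expert-parallelism configuration the tweet refers to is not shown, since the tweet is truncated.

    # Minimal sketch (not the tweet's exact setup): loading gpt-oss-120B with
    # Hugging Face Transformers. The checkpoint id is an assumption; the
    # expert-parallelism configuration itself is not shown here.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "openai/gpt-oss-120b"  # assumed Hub id

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",   # keep the checkpoint's native dtype
        device_map="auto",    # shard layers across available devices
    )

    inputs = tokenizer("Hello from gpt-oss!", return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(output[0], skip_special_tokens=True))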
RT @gneubig: Summary of GPT-OSS architectural innovations: 1. sliding window attention (ref: …) 2. mixture of expert…
arxiv.org: Deploying Large Language Models (LLMs) in streaming applications such as multi-round dialogue, where long interactions are expected, is urgently needed but poses two major challenges. Firstly, ...
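As a side note on the sliding window attention mentioned in the architecture summary above, here is a minimal NumPy sketch of what such an attention mask computes; the sequence length and window size are illustrative assumptions, not values from GPT-OSS.

    # Minimal sketch of a sliding-window (local causal) attention mask;
    # the sequence length and window size here are illustrative only.
    import numpy as np

    def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
        """True where a query position may attend to a key position:
        causal, and at most `window - 1` tokens back."""
        i = np.arange(seq_len)[:, None]  # query positions
        j = np.arange(seq_len)[None, :]  # key positions
        return (j <= i) & (j > i - window)

    # Example: 6 tokens with a window of 3.
    print(sliding_window_mask(6, 3).astype(int))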
RT @ragerri: I am pleased to participate in 6 papers (2 main and 2 findings, 1 @conll_conf and 1 argument mining shared task overview) to…
RT @ragerri: 1. @anaryegen, @jiporanm, and Rodrigo Agerri (2025). Dynamic Knowledge Integration for Evidence-Driven Counter-Argument Gene…
RT @vijaytarian: RL with verifiable rewards? Works great ✨ Realistic or non-verifiable tasks? Still a mess 📉 Reward models and AI judges? F…
arxiv.org: Language models must be adapted to understand and follow user instructions. Reinforcement learning is widely used to facilitate this -- typically using fixed criteria such as "helpfulness" and...
RT @allen_ai: In our new paper, “Contextualized Evaluations: Judging Language Model Responses to Underspecified Queries,” we find that addi…
RT @jen_hsia: 1/6 Retrieval is supposed to improve generation in RAG systems. But in practice, adding more documents can hurt performance,…
RT @osanseviero: Introducing T5Gemma: the next generation of encoder-decoder/T5 models! 🔧 Decoder models adapted to be encoder-decoder 🔥 32…
RT @ShashwatGoel7: There's been a hole at the heart of #LLM evals, and we can now fix it. 📜 New paper: Answer Matching Outperforms Multiple…
RT @Hitz_zentroa: In last week’s seminar session @anaryegen talked about Mining Argument Structures in Medical Texts (…
RT @I_MirandaM: #newHitzPaper Can a simple inference-time approach unlock better Vision-Language Compositionality? 🤯 Our latest paper shows…
RT @yueqi_song: Our VisualPuzzles 🧩 benchmark shows similar findings as "The Illusion of Thinking": - More tokens ≠ better reasoning - Reaso…
RT @osainz59: Do you know that you can continue pretraining Instructed LLMs without losing their instruction following capabilities? We di…
RT @BenShi34: As we optimize model reasoning over verifiable objectives, how does this affect human understanding of said reasoning to achi…
RT @ahsalem511: 📢 #acl2025 - main: 🤔 Continued pretraining of LLMs in new languages often includes English data, but why? 💡 We found English…
RT @caiqizh: 🔥 We teach LLMs to say how confident they are on-the-fly during long-form generation. 🤩 No sampling. No slow post-hoc methods…
RT @LysandreJik: The Transformers library is undergoing its largest pivot to date 🙌. It now cements its role as the central model definiti…
RT @AndrewLampinen: How do language models generalize from information they learn in-context vs. via finetuning? We show that in-context le…