
Shmulik Amar
@pyshmulik
Followers
30
Following
1K
Media
8
Statuses
174
RT @ENachshoni: 🚨 New paper out! What happens when LLMs & RLMs face conflicting answers to a question? 🤔 They often ignore disagreement a…
0
7
0
RT @mosh_levy: Producing reasoning texts boosts the capabilities of AI models, but do we humans correctly understand these texts? Our lates…
0
27
0
RT @AviyaMaimon: 🚨 New paper alert! 🚨 We propose an IQ Test for LLMs – a new way to evaluate models that goes beyond benchmarks and uncover…
0
14
0
RT @AviyaMaimon: We release:
✅ Code
✅ Leaderboard
✅ Skill matrices & tools
Let's shift to skill-based evaluation for LLMs! Full paper here…
arxiv.org
Current evaluations of large language models (LLMs) rely on benchmark scores, but it is difficult to interpret what these individual scores reveal about a model's overall skills. Specifically, as...
0
3
0
@biunlp Check out the paper, demo and code for more details. Collab w/ @obspp18 @lovodkin93 Ido Dagan @biunlp.
Paper:
@huggingface Demo:
Code:
github.com
Instruction-Guided Content Selection with LLMs - toolkit and datasets - shmuelamar/igcs
0
1
4
@biunlp We invite researchers to use our benchmark (IGCS-Bench), our generic transfer-learning dataset (GenCS), and our trained SLMs to advance LLM capabilities in extractive content selection! (5/n)
0
0
1
@biunlp Key finding 2️⃣: For tasks requiring longer selections, LLMs consistently perform better when processing one document at a time rather than the entire document set at once; the advantage is much less pronounced for tasks with short selections (sketch below). (4/n)
1
0
1
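The finding above contrasts two ways of running an extractive selection model over a multi-document set. Below is a minimal sketch of the two modes; it is not the paper's code, and the `complete(prompt)` LLM call and the one-span-per-line output format are assumptions made purely for illustration.

```python
# Minimal sketch (not the paper's implementation) of the two processing modes
# compared in the finding above, given any LLM completion function.
from typing import Callable, List

def select_all_at_once(instruction: str, docs: List[str],
                       complete: Callable[[str], str]) -> List[str]:
    """Single call: the model sees the whole document set and selects spans."""
    prompt = instruction + "\n\n" + "\n\n".join(
        f"Document {i + 1}:\n{doc}" for i, doc in enumerate(docs)
    )
    # Assumed output format: one selected span per line of the completion.
    return [line.strip() for line in complete(prompt).splitlines() if line.strip()]

def select_per_document(instruction: str, docs: List[str],
                        complete: Callable[[str], str]) -> List[str]:
    """One call per source document, then merge the selections."""
    selections: List[str] = []
    for doc in docs:
        prompt = f"{instruction}\n\nDocument:\n{doc}"
        selections.extend(
            line.strip() for line in complete(prompt).splitlines() if line.strip()
        )
    return selections
```

Per the finding, the second, per-document mode is the one that pays off when the expected selections are long.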
@biunlp Key finding 1️⃣: Training on a diverse mix of content selection tasks boosts LLM performance even on unseen extractive tasks. Generic transfer learning at its best! (Mixture-building sketch below.) (3/n)
1
0
1
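To illustrate the diverse-mixture training idea in the finding above, here is a small sketch of pooling several content selection tasks into one shared instruction format before fine-tuning. The field names and helper functions are hypothetical and do not describe the released GenCS pipeline.

```python
# Minimal sketch (hypothetical format, not the released GenCS data): pooling
# multiple content selection tasks into one instruction-tuning mixture.
import random
from typing import Dict, List

def to_instruction_example(task: str, instruction: str, source: str,
                           selected_spans: List[str]) -> Dict[str, str]:
    """Cast any span-selection task into a shared instruction/input/output record."""
    return {
        "task": task,
        "instruction": instruction,
        "input": source,
        # Target: the gold extracted spans, one per line.
        "output": "\n".join(selected_spans),
    }

def build_mixture(task_datasets: Dict[str, List[Dict[str, str]]],
                  seed: int = 0) -> List[Dict[str, str]]:
    """Interleave examples from all tasks so fine-tuning sees a diverse mix."""
    mixture = [ex for examples in task_datasets.values() for ex in examples]
    random.Random(seed).shuffle(mixture)
    return mixture
```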
@biunlp Motivation: Many NLP tasks require selecting relevant text spans from given source texts. Despite this shared objective, such content selection tasks have traditionally been studied in isolation, each with its own modeling approaches, datasets, and evaluations. (2/n)
1
0
1
RT @ArieCattan: 🚨 RAG is a popular approach, but what happens when the retrieved sources provide conflicting information? 🤔 We're excited to…
0
14
0
RT @oriern1: 🧵 New paper at Findings #ACL2025 @aclmeeting! Not all documents are processed equally well. Some consistently yield poor resul…
0
12
0
RT @lovodkin93: Check out our new paper on highly localized attributions, both in the input and the output!
0
7
0
RT @hirscheran: 🚨 Introducing LAQuer, accepted to #ACL2025 (main conf)! LAQuer provides more granular attribution for LLM generations: use…
0
32
0
RT @AlonEirew: Excited to present our system demonstration paper on EventFull – an Event-Event Relation annotation tool – at #NAACL25. Come…
0
6
0
RT @_akhaliq: RefVNLI: Towards Scalable Evaluation of Subject-driven Text-to-image Generation
0
52
0
RT @ShirAshuryTahan: LLMs struggle with tables, but how robust are they really? ToRR goes beyond accuracy, testing real-world robustness a…
0
13
0