
Avi Caciularu
@clu_avi
Followers: 541
Following: 1K
Media: 13
Statuses: 270
Research Scientist @GoogleAI | previously ML & NLP PhD student @biunlp, intern at @allen_ai, @Microsoft, @AIatMeta.
Joined July 2009
🚨 New Paper 🚨 Are current LLMs up to the task of solving *complex* instructions based on content-rich text? Our new dataset, TACT, sheds some light on this challenge. How does it work? Work by @GoogleAI & @GoogleDeepMind. 👇🧵
2
41
105
RT @pybeebee: I will be presenting our work 𝗠𝗗𝗖𝘂𝗿𝗲 at #ACL2025NLP in Vienna this week! 🇦🇹 Come by if you’re interested in multi-doc reason…
aclanthology.org
Gabrielle Kaili-May Liu, Bowen Shi, Avi Caciularu, Idan Szpektor, Arman Cohan. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2025.
0
4
0
RT @natolambert: This new benchmark created by @valentina__py should be the new default replacing IFEval. Some of the best frontier models…
0
21
0
RT @armancohan: Excited for the release of SciArena with @allen_ai! LLMs are now an integral part of research workflows, and SciArena help…
0
10
0
RT @sundarpichai: Gemini 2.5 Pro + 2.5 Flash are now stable and generally available. Plus, get a preview of Gemini 2.5 Flash-Lite, our fast…
0
463
0
RT @ArieCattan: 🚨 RAG is a popular approach but what happens when the retrieved sources provide conflicting information? 🤔 We're excited to…
0
14
0
RT @hirscheran: 🚨 Introducing LAQuer, accepted to #ACL2025 (main conf)! LAQuer provides more granular attribution for LLM generations: use…
0
31
0
RT @goldshtn: Today we published FACTS Grounding, a benchmark and leaderboard for evaluating the factuality of LLMs when grounding to the i…
deepmind.google
Our comprehensive benchmark and online leaderboard offer a much-needed measure of how accurately LLMs ground their responses in provided source material and avoid hallucinations.
0
8
0
RT @YonatanBitton: 🚨 Happening NOW at #NeurIPS2024 with @nitzanguetta! 🎭 #VisualRiddles: A Commonsense and World Knowledge Challenge for V…
0
8
0
RT @lmarena_ai: Massive News from Chatbot Arena 🔥 @GoogleDeepMind's latest Gemini (Exp 1114), tested with 6K+ community votes over the past…
0
308
0
RT @RoyiRassin: How diverse are the outputs of text-to-image models and how can we measure that? In our new work, we propose a measure base…
0
32
0