Avi Caciularu Profile
Avi Caciularu

@clu_avi

Followers
541
Following
1K
Media
13
Statuses
270

Research Scientist @GoogleAI | previously ML & NLP PhD student @biunlp, intern at @allen_ai, @Microsoft, @AIatMeta.

Joined July 2009
@clu_avi
Avi Caciularu
1 year
🚨 New Paper 🚨 Are current LLMs up to the task of solving *complex* instructions based on content-rich text? Our new dataset, TACT, sheds some light on this challenge. How does it work? Work by @GoogleAI & @GoogleDeepMind. 👇🧵
2
41
105
@clu_avi
Avi Caciularu
1 day
RT @pybeebee: I will be presenting our work 𝗠𝗗𝗖𝘂𝗿𝗲 at #ACL2025NLP in Vienna this week! 🇦🇹 Come by if you’re interested in multi-doc reason…
aclanthology.org
Gabrielle Kaili-May Liu, Bowen Shi, Avi Caciularu, Idan Szpektor, Arman Cohan. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2025.
0
4
0
@clu_avi
Avi Caciularu
25 days
RT @natolambert: This new benchmark created by @valentina__py should be the new default replacing IFEval. Some of the best frontier models…
0
21
0
@clu_avi
Avi Caciularu
26 days
RT @armancohan: Excited for the release of SciArena with @allen_ai! LLMs are now an integral part of research workflows, and SciArena help…
0
10
0
@clu_avi
Avi Caciularu
1 month
RT @sundarpichai: Gemini 2.5 Pro + 2.5 Flash are now stable and generally available. Plus, get a preview of Gemini 2.5 Flash-Lite, our fast…
0
463
0
@clu_avi
Avi Caciularu
2 months
RT @ArieCattan: 🚨 RAG is a popular approach but what happens when the retrieved sources provide conflicting information? 🤔 We're excited to…
0
14
0
@clu_avi
Avi Caciularu
2 months
RT @pybeebee: 🔥 Excited to share MetaFaith: Understanding and Improving Faithful Natural Language Uncertainty Expression in LLMs 🔥 How can…
0
4
0
@clu_avi
Avi Caciularu
2 months
RT @hirscheran: 🚨 Introducing LAQuer, accepted to #ACL2025 (main conf)! LAQuer provides more granular attribution for LLM generations: use…
0
31
0
@clu_avi
Avi Caciularu
3 months
RT @_akhaliq: RefVNLI: Towards Scalable Evaluation of Subject-driven Text-to-image Generation
0
52
0
@clu_avi
Avi Caciularu
4 months
RT @omerNLP: Wanna check how well a model can share knowledge between languages? Of course you do! 🤩 But can you do it without access to t…
0
14
0
@clu_avi
Avi Caciularu
4 months
RT @OriYoran: New #ICLR2024 paper! The KoLMogorov Test: can CodeLMs compress data by code generation? The optimal compression for a seque…
0
47
0
@clu_avi
Avi Caciularu
6 months
RT @megamor2: How can we interpret LLM features at scale? 🤔 Current pipelines use activating inputs, which is costly and ignores how featur…
0
27
0
@clu_avi
Avi Caciularu
7 months
🤔🤔🤔
@OfficialLoganK
Logan Kilpatrick
7 months
Just when you thought it was over... we’re introducing Gemini 2.0 Flash Thinking, a new experimental model that unlocks stronger reasoning capabilities and shows its thoughts. The model plans (with thoughts visible), can solve complex problems with Flash speeds, and more 🧵
0
0
0
@clu_avi
Avi Caciularu
7 months
RT @goldshtn: Today we published FACTS Grounding, a benchmark and leaderboard for evaluating the factuality of LLMs when grounding to the i…
deepmind.google
Our comprehensive benchmark and online leaderboard offer a much-needed measure of how accurately LLMs ground their responses in provided source material and avoid hallucinations
0
8
0
@clu_avi
Avi Caciularu
8 months
🥳
@sundarpichai
Sundar Pichai
8 months
We’re kicking off the start of our Gemini 2.0 era with Gemini 2.0 Flash, which outperforms 1.5 Pro on key benchmarks at 2X speed (see chart below). I’m especially excited to see the fast progress on coding, with more to come. Developers can try an experimental version in AI
0
0
3
@clu_avi
Avi Caciularu
8 months
RT @YonatanBitton: 🚨 Happening NOW at #NeurIPS2024 with @nitzanguetta! 🎭 #VisualRiddles: A Commonsense and World Knowledge Challenge for V…
0
8
0
@clu_avi
Avi Caciularu
8 months
RT @JeffDean: What a way to celebrate one year of incredible Gemini progress -- #1🥇across the board on overall ranking, as well as on hard…
0
320
0
@clu_avi
Avi Caciularu
8 months
RT @_akhaliq: Google just released gemini-exp-1121: significant gains on coding performance, stronger reasoning capabilities, improv…
0
26
0
@clu_avi
Avi Caciularu
9 months
RT @lmarena_ai: Massive News from Chatbot Arena 🔥 @GoogleDeepMind's latest Gemini (Exp 1114), tested with 6K+ community votes over the past…
0
308
0
@clu_avi
Avi Caciularu
9 months
RT @goldshtn: I am hiring a Senior SWE to work on Gemini post-training, improving Gemini factuality. Factuality is a top blocker for LLM ad…
0
4
0
@clu_avi
Avi Caciularu
9 months
RT @RoyiRassin: How diverse are the outputs of text-to-image models and how can we measure that? In our new work, we propose a measure base…
0
32
0