Avi Caciularu
@clu_avi
576 Followers · 1K Following · 13 Media · 280 Statuses
Research Scientist @GoogleAI | previously ML & NLP PhD student @biunlp, intern at @allen_ai, @Microsoft, @AIatMeta.
Joined July 2009
🚨 New Paper 🚨 Are current LLMs up to the task of following *complex* instructions over content-rich text? Our new dataset, TACT, sheds some light on this challenge. How does it work? https://t.co/4u3iTC087B Work by @GoogleAI & @GoogleDeepMind 👇🧵
2 replies · 41 reposts · 106 likes
Save the date for ISCOL'25! The conference will be held on December 18th at Bar-Ilan University. The call for papers is now live on our website:
iscol-meeting.github.io
Join ISCOL 2025 on December 18th at Bar-Ilan University to exchange ideas in Computational Linguistics and NLP across academia and industry in Israel.
0 replies · 7 reposts · 30 likes
Today, we are announcing the latest advancements to Google Earth AI — a platform designed to unlock a new level of planetary understanding. This includes new research on Geospatial Reasoning and expanded access to our specialized models. 🧵↓
14 replies · 85 reposts · 702 likes
Excited to present this at #EMNLP2025 in just over a month! It turns out that even flagship models like GPT-5 still struggle at faithfully expressing uncertainty 🤔 📊 Full results for the newest models are now live👇 https://t.co/CAqplLQ5s1
arxiv.org
A critical component in the trustworthiness of LLMs is reliable uncertainty communication, yet LLMs often use assertive language when conveying false claims, leading to over-reliance and eroded...
🎉 Delighted to announce that MetaFaith has been accepted to #EMNLP2025 Main! In this work we systematically study how well LLMs can express their internal uncertainty in words, offering a metacognition-inspired way to improve this ability 🧠✨ Check out more details below!👇
0 replies · 1 repost · 6 likes
🎉 Delighted to announce that MetaFaith has been accepted to #EMNLP2025 Main! In this work we systematically study how well LLMs can express their internal uncertainty in words, offering a metacognition-inspired way to improve this ability 🧠✨ Check out more details below!👇
🔥 Excited to share MetaFaith: Understanding and Improving Faithful Natural Language Uncertainty Expression in LLMs🔥 How can we make LLMs talk about uncertainty in a way that truly reflects what they internally "know"? Check out our new preprint to find out! Details in 🧵(1/n):
0 replies · 2 reposts · 11 likes
Many factual QA benchmarks have become saturated, yet factuality still poses a very real issue! ✨We present MoNaCo, an Ai2 benchmark of human-written time-consuming questions that, on average, require 43.3 documents per question!✨ 📣Blogpost: https://t.co/GQD83gdHgg 🧵(1/5)
1 reply · 15 reposts · 41 likes
LLMs power research, decision-making, and exploration, but most benchmarks don't test how well they stitch together evidence across dozens (or hundreds) of sources. Meet MoNaCo, our new question-answering eval for cross-source reasoning. 👇
10 replies · 39 reposts · 226 likes
Producing reasoning texts boosts the capabilities of AI models, but do we humans correctly understand these texts? Our latest research suggests that we do not. This highlights a new angle on the "Are they transparent?" debate: they might be, but we misinterpret them. 🧵
8 replies · 29 reposts · 141 likes
I will be presenting our work 𝗠𝗗𝗖𝘂𝗿𝗲 at #ACL2025NLP in Vienna this week! 🇦🇹 Come by if you’re interested in multi-doc reasoning and/or scalable creation of high-quality post-training data! 📍 Poster Session 4 @ Hall 4/5 🗓️ Wed, July 30 | 11-12:30 🔗
aclanthology.org
Gabrielle Kaili-May Liu, Bowen Shi, Avi Caciularu, Idan Szpektor, Arman Cohan. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2025.
🔥Thrilled to introduce MDCure: A Scalable Pipeline for Multi-Document Instruction-Following 🔥 How can we systematically and scalably improve LLMs' ability to handle complex multi-document tasks? Check out our new preprint to find out! Details in 🧵 (1/n):
1 reply · 5 reposts · 26 likes
This new benchmark created by @valentina__py should be the new default replacing IFEval. Some of the best frontier models get <50%, and it comes with separate training prompts so people don't effectively train on test. Wild gap from o3 to Gemini 2.5 Pro of like 30 points.
Introducing IFBench, a benchmark to measure how well AI models follow new, challenging, and diverse verifiable instructions. Top models like Gemini 2.5 Pro or Claude 4 Sonnet are only able to score up to 50%, presenting an open frontier for post-training. 🧵
10 replies · 24 reposts · 197 likes
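To make "verifiable instructions" concrete: each instruction ships with a small program that checks compliance deterministically, so scoring needs no human or LLM judge. A minimal sketch in Python; the specific rules below are illustrative inventions, not IFBench's actual instruction set.

```python
import re

# Hypothetical verifiable constraints in the spirit of IFEval/IFBench:
# every instruction has a deterministic checker.

def check_word_count(response: str, max_words: int = 50) -> bool:
    """Instruction: 'answer in at most 50 words'."""
    return len(response.split()) <= max_words

def check_bullet_count(response: str, n: int = 3) -> bool:
    """Instruction: 'use exactly three bullet points'."""
    return len(re.findall(r"^- ", response, flags=re.MULTILINE)) == n

def check_no_commas(response: str) -> bool:
    """Instruction: 'do not use any commas'."""
    return "," not in response

response = "- first point\n- second point\n- third point"
results = [check_word_count(response), check_bullet_count(response), check_no_commas(response)]
print(f"passed {sum(results)}/{len(results)} instructions")
```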
Excited for the release of SciArena with @allen_ai! LLMs are now an integral part of research workflows, and SciArena helps measure progress on scientific literature tasks. Also checkout the preprint for a lot more results/analyses. Led by: @YilunZhao_NLP, @kaiyan_z 📄 paper:
Introducing SciArena, a platform for benchmarking models across scientific literature tasks. Inspired by Chatbot Arena, SciArena applies a crowdsourced LLM evaluation approach to the scientific domain. 🧵
1 reply · 12 reposts · 82 likes
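For context on the Chatbot-Arena-style setup mentioned above: models are ranked from crowdsourced pairwise preference votes, typically via Elo-style updates. A minimal sketch of that rating mechanism; the constants and vote data are illustrative, not SciArena's actual parameters.

```python
from collections import defaultdict

def elo_update(r_a, r_b, a_won, k=32):
    """One rating update from a single pairwise preference vote."""
    expect_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))
    score = 1.0 if a_won else 0.0
    delta = k * (score - expect_a)
    return r_a + delta, r_b - delta

ratings = defaultdict(lambda: 1000.0)
votes = [  # (model shown as A, model shown as B, did A win?)
    ("model_x", "model_y", True),
    ("model_y", "model_z", True),
    ("model_x", "model_z", True),
]
for a, b, a_won in votes:
    ratings[a], ratings[b] = elo_update(ratings[a], ratings[b], a_won)
print({m: round(r, 1) for m, r in ratings.items()})
```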
Gemini 2.5 Pro + 2.5 Flash are now stable and generally available. Plus, get a preview of Gemini 2.5 Flash-Lite, our fastest + most cost-efficient 2.5 model yet. 🔦 Exciting steps as we expand our 2.5 series of hybrid reasoning models that deliver amazing performance at the…
259 replies · 464 reposts · 4K likes
🚨 RAG is a popular approach, but what happens when the retrieved sources provide conflicting information?🤔 We're excited to introduce our paper: “DRAGged into CONFLICTS: Detecting and Addressing Conflicting Sources in Search-Augmented LLMs”🚀 A thread 🧵👇
2 replies · 14 reposts · 36 likes
🔥 Excited to share MetaFaith: Understanding and Improving Faithful Natural Language Uncertainty Expression in LLMs🔥 How can we make LLMs talk about uncertainty in a way that truly reflects what they internally "know"? Check out our new preprint to find out! Details in 🧵(1/n):
2 replies · 4 reposts · 13 likes
🚨 Introducing LAQuer, accepted to #ACL2025 (main conf)! LAQuer provides more granular attribution for LLM generations: users can just highlight any output fact (top) and get the supporting input snippet (bottom). This reduces the amount of text the user has to read by 2…
3 replies · 36 reposts · 84 likes
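The interaction described above (highlight an output fact, get back the supporting input snippet) can be approximated by matching the highlighted span against source sentences. The lexical-overlap heuristic below is only a crude stand-in for LAQuer's actual attribution method:

```python
def attribute_span(output_span: str, source_sentences: list[str]) -> str:
    """Return the source sentence sharing the most words with the
    highlighted output span -- a crude proxy for fine-grained attribution."""
    span_words = set(output_span.lower().split())
    return max(source_sentences,
               key=lambda s: len(span_words & set(s.lower().split())))

sources = [
    "The model was trained on 2 trillion tokens of web text.",
    "Evaluation covered five question-answering benchmarks.",
    "Training took 21 days on 1,024 accelerators.",
]
print(attribute_span("trained on 2 trillion tokens", sources))
```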
RefVNLI: Towards Scalable Evaluation of Subject-driven Text-to-image Generation
1 reply · 52 reposts · 135 likes
Wanna check how well a model can share knowledge between languages? Of course you do! 🤩 But can you do it without access to the model’s weights? Now you can with ECLeKTic 🤯
1 reply · 16 reposts · 43 likes
New #ICLR2025 paper! The KoLMogorov Test: can CodeLMs compress data by code generation? The optimal compression for a sequence is the shortest program that generates it. Empirically, LMs struggle even on simple sequences, but can be trained to outperform current methods! 🧵1/7
8 replies · 47 reposts · 292 likes
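To make the premise concrete: the Kolmogorov complexity of a sequence is the length of the shortest program that outputs it, so a program that exploits the sequence's structure acts as a compressor. A toy illustration of the idea, not the paper's actual protocol:

```python
sequence = list(range(1, 101)) * 2  # 200 numbers with an obvious pattern

# Baseline "compression": a program that stores the data verbatim.
verbatim_program = f"out = {sequence}"

# A much shorter program that regenerates the sequence from its structure.
short_program = "out = list(range(1, 101)) * 2"

for program in (verbatim_program, short_program):
    scope = {}
    exec(program, scope)             # run the candidate decompressor
    assert scope["out"] == sequence  # it must reproduce the data exactly
    print(f"{len(program)} chars")   # program length = compressed size
```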
How can we interpret LLM features at scale? 🤔 Current pipelines use activating inputs, which is costly and ignores how features causally affect model outputs! We propose efficient output-centric methods that better predict how steering a feature will affect model outputs. New…
6 replies · 27 reposts · 114 likes
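The contrast drawn above: input-centric pipelines describe a feature by the inputs that activate it, while output-centric methods steer the feature and watch how the output distribution shifts. A toy numpy sketch of steering along a feature direction; the shapes, scale, and random "model" are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab = 16, 5

hidden = rng.normal(size=d_model)            # a residual-stream activation
feature = rng.normal(size=d_model)
feature /= np.linalg.norm(feature)           # unit-norm feature direction
unembed = rng.normal(size=(d_model, vocab))  # toy unembedding matrix

def next_token_probs(h: np.ndarray) -> np.ndarray:
    logits = h @ unembed
    exp = np.exp(logits - logits.max())      # stable softmax
    return exp / exp.sum()

def steer(h: np.ndarray, direction: np.ndarray, alpha: float) -> np.ndarray:
    """Add the feature direction to the activation with strength alpha."""
    return h + alpha * direction

print("before steering:", next_token_probs(hidden).round(3))
print("after steering: ", next_token_probs(steer(hidden, feature, 4.0)).round(3))
```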
Today we published FACTS Grounding, a benchmark and leaderboard for evaluating the factuality of LLMs when grounding to the input context. The leaderboard is on Kaggle and we plan to maintain it and track progress. https://t.co/kqQvasZ57n
https://t.co/C5G3aicRR8
deepmind.google
Our comprehensive benchmark and online leaderboard offer a much-needed measure of how accurately LLMs ground their responses in provided source material and avoid hallucinations
1 reply · 8 reposts · 26 likes
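The task behind the benchmark: decide, claim by claim, whether a response is supported by the provided context. The benchmark's actual scoring is more sophisticated; the word-overlap heuristic below is only a crude stand-in to make the setup concrete.

```python
import string

def is_grounded(sentence: str, context: str, threshold: float = 0.7) -> bool:
    """Crude support check: the fraction of the sentence's longer words
    that also appear verbatim in the context."""
    words = [w.strip(string.punctuation) for w in sentence.lower().split()]
    content = [w for w in words if len(w) > 3]
    if not content:
        return True
    hits = sum(w in context.lower() for w in content)
    return hits / len(content) >= threshold

context = "The launch happened in 2024 and reached 120 countries."
response = ["The launch happened in 2024.", "It was universally praised."]
for sent in response:
    print(sent, "->", "grounded" if is_grounded(sent, context) else "unsupported")
```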