Anthony Chen 🤖

@_anthonychen

Followers: 462 · Following: 5K · Media: 17 · Statuses: 186

ai research @googledeepmind ♊️ phd from @ucirvine

little worm in big apple
Joined May 2017
@_anthonychen
Anthony Chen 🤖
1 year
Lots of discourse around long-context language models (LCLMs) subsuming RAG and retrieval, but how close are we to this paradigm shift? Introducing LOFT, a 1-million-token benchmark spanning 6 tasks & 35 datasets to test LCLMs’ ability to do in-context retrieval & reasoning [1/10].
@leejnhk
Jinhyuk Lee
1 year
Can long-context language models (LCLMs) subsume retrieval, RAG, SQL, and more? Introducing LOFT: a benchmark stress-testing LCLMs on million-token tasks like retrieval, RAG, and SQL. Surprisingly, LCLMs rival specialized models trained for these tasks!
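For readers unfamiliar with the setup, here is a minimal sketch of the "corpus-in-context" idea the thread describes: the entire corpus is placed directly in the prompt and the model retrieves and answers in one call, with no separate retriever. The `call_lclm` function is a placeholder, not a real API.

```python
# Hypothetical sketch of corpus-in-context prompting, assuming a generic
# long-context model behind a placeholder call_lclm() function.

def build_cic_prompt(corpus: list[dict], query: str) -> str:
    """Format every document into one long prompt, then append the query."""
    parts = ["You are given a corpus of documents. Answer using only the corpus.\n"]
    for doc in corpus:
        parts.append(f"[doc {doc['id']}] {doc['text']}")
    parts.append(f"\nQuestion: {query}")
    parts.append("Answer with the id(s) of the relevant document(s).")
    return "\n".join(parts)

# Toy corpus; at LOFT scale this would approach a million tokens.
corpus = [{"id": i, "text": f"document {i} ..."} for i in range(1000)]
prompt = build_cic_prompt(corpus, "Which document mentions LOFT?")
# answer = call_lclm(prompt)  # placeholder for a million-token-context model
```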
@_anthonychen
Anthony Chen 🤖
2 months
RT @GeminiApp: Bon appétit 🍝
@_anthonychen
Anthony Chen 🤖
4 months
RT @lmarena_ai: BREAKING: Gemini 2.5 Pro is now #1 on the Arena leaderboard - the largest score jump ever (+40 pts vs Grok-3/GPT-4.5)! 🏆 T…
@_anthonychen
Anthony Chen 🤖
4 months
RT @GoogleDeepMind: Think you know Gemini? 🤔 Think again. Meet Gemini 2.5: our most intelligent model 💡 The first release is Pro Experimen…
@_anthonychen
Anthony Chen 🤖
4 months
Historic.
@ComputerHistory
Computer History Museum
4 months
In partnership with Google, CHM is excited to announce the public release and long-term preservation of the source code for AlexNet, the neural network that revolutionized AI. Learn more: . CHM’s GitHub access to open-source code:
@_anthonychen
Anthony Chen 🤖
4 months
GDM's work converting Gemini into a SOTA dual encoder is now out! SOTA results across all benchmarks, including exceptionally strong coding performance. Check out the tech report for more details and some interesting ablations!
@leejnhk
Jinhyuk Lee
4 months
🎉 Gemini Embedding is LIVE! 🎉 Try our state-of-the-art text embedding model for FREE on Vertex AI (text-embedding-large-exp-03-07; 120 QPM) & AI Studio (gemini-embedding-exp-03-07)! ➡️ APIs: ➡️ Report:
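A rough usage sketch for the experimental embedding model named above, via the google-generativeai Python SDK. The model id is taken from the tweet; the exact call shape should be treated as an assumption, since experimental models and SDK surfaces change.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

# Dual-encoder style usage: queries and documents are embedded separately,
# with task_type telling the model which "tower" the text belongs to.
result = genai.embed_content(
    model="models/gemini-embedding-exp-03-07",  # experimental id from the announcement
    content="Can long-context language models subsume retrieval?",
    task_type="retrieval_query",
)
query_vec = result["embedding"]  # list[float]; compare against document embeddings
```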
@_anthonychen
Anthony Chen 🤖
6 months
RT @DrJimFan: This is the most gut-wrenching blog I've read, because it's so real and so close to heart. The author is no longer with us. I…
@_anthonychen
Anthony Chen 🤖
1 year
Thanks for sharing our paper!
@omarsar0
elvis
1 year
Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? Google DeepMind conducts a deep performance analysis of long-context LLMs on in-context retrieval and reasoning. They first present a benchmark with real-world tasks requiring 1M-token context. Report:
@_anthonychen
Anthony Chen 🤖
1 year
RT @ZhuyunDai: Thrilled to unveil LOFT, our latest research showing how long-context language models like Gemini can subsume retrieval, RAG…
@_anthonychen
Anthony Chen 🤖
1 year
RT @YiLuan9: Very happy to contribute to the Multimodal benchmarking on the LOFT project! Very excited to see with few-shot prompting only…
@_anthonychen
Anthony Chen 🤖
1 year
Thanks for sharing our work!
@arankomatsuzaki
Aran Komatsuzaki
1 year
Google presents Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?
Long-context LM:
- Often rivals SotA retrieval and RAG systems
- But still struggles with areas like compositional reasoning
repo: abs:
@_anthonychen
Anthony Chen 🤖
1 year
RT @riedelcastro: "just put the corpus into the context"! Long context models can already match or beat various bespoke pipelines and inf…
@_anthonychen
Anthony Chen 🤖
1 year
RT @kelvin_guu: Do long-context LMs obsolete retrieval, RAG, SQL and more? Excited to share our answer! from the te…
@_anthonychen
Anthony Chen 🤖
1 year
RT @leejnhk: Can long-context language models (LCLMs) subsume retrieval, RAG, SQL, and more? Introducing LOFT: a benchmark stress-testing…
@_anthonychen
Anthony Chen 🤖
1 year
That’s it for Twitter! Check out our paper “Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?” and download the LOFT data to see how LCLMs perform for yourself! Paper: Data: [9/10]
@_anthonychen
Anthony Chen 🤖
1 year
Overall, I’m super excited by long-context models because there are so many areas they can subsume just by prompting over a massive corpus of information, and we’re only scratching the surface. It’s such a simple approach to an otherwise complex engineering problem [8/10].
@_anthonychen
Anthony Chen 🤖
1 year
One thing I want to stress: we found LCLMs were quite sensitive to the prompting format used (see the prompt ablations in our paper for more details). Significant work remains in making models robust to long-context instructions [7/10].
@_anthonychen
Anthony Chen 🤖
1 year
We leverage the SQL queries to see where LCLMs are weak. For each logical operator, we compute average performance on questions whose SQL query contains that operator. Spoiler: averaging is tough, counting is relatively easy, & reasoning over equality is a breeze compared to inequality [6/10]
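A minimal sketch (not the paper's code) of the per-operator breakdown described above: for each SQL logical operator, average model accuracy over the questions whose gold SQL contains that operator. The example data and field names are made up for illustration.

```python
from collections import defaultdict

# Operators to bucket by; substring matching is deliberately naive here
# (e.g. ">=" also matches ">"), which a real analysis would handle properly.
OPERATORS = ["AVG", "COUNT", "=", ">", "<"]

def per_operator_accuracy(examples):
    """examples: iterable of dicts with 'sql' (gold query) and 'correct' (bool)."""
    buckets = defaultdict(list)
    for ex in examples:
        for op in OPERATORS:
            if op in ex["sql"].upper():
                buckets[op].append(ex["correct"])
    # Average accuracy per operator, skipping operators with no matches.
    return {op: sum(v) / len(v) for op, v in buckets.items() if v}

examples = [
    {"sql": "SELECT AVG(age) FROM people", "correct": False},
    {"sql": "SELECT COUNT(*) FROM people WHERE age > 30", "correct": True},
]
print(per_operator_accuracy(examples))  # {'AVG': 0.0, 'COUNT': 1.0, '>': 1.0}
```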
@_anthonychen
Anthony Chen 🤖
1 year
To gauge LCLMs' capacity for complex reasoning, we repurpose semantic parsing datasets. We convert databases to CSV, pair them with a natural-language query, then prompt LCLMs to reason purely in natural language. We find a lot of headroom in complex compositional reasoning. [5/10]
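A hedged sketch of that setup: serialize database tables to CSV, place them in the prompt, and ask the model to answer a natural-language question directly, with no SQL execution. `call_lclm` is again a placeholder, and the toy table is invented for illustration.

```python
import csv
import io
import sqlite3

def table_to_csv(conn: sqlite3.Connection, table: str) -> str:
    """Dump one table as a CSV string, header row first."""
    cur = conn.execute(f"SELECT * FROM {table}")
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([c[0] for c in cur.description])  # column names
    writer.writerows(cur.fetchall())
    return buf.getvalue()

# Toy database standing in for a semantic-parsing dataset's tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INT)")
conn.executemany("INSERT INTO people VALUES (?, ?)", [("ana", 31), ("bo", 25)])

prompt = (
    "Table `people` as CSV:\n" + table_to_csv(conn, "people")
    + "\nQuestion: What is the average age?"
    + "\nReason step by step in natural language, then give the answer."
)
# answer = call_lclm(prompt)  # placeholder long-context model call
```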
@_anthonychen
Anthony Chen 🤖
1 year
On multi-modal retrieval & RAG, LCLMs do shockingly well, sometimes outperforming specialized models. This is amazing given these LCLMs were not tuned for these tasks, & the implications are huge: prompting an LCLM is way easier than building a custom RAG pipeline. [4/10]
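An illustrative sketch of what multi-modal corpus-in-context prompting can look like with the google-generativeai SDK: images and text are interleaved in a single prompt instead of building a separate multimodal retrieval pipeline. The model name and image paths are placeholders, not the paper's exact setup.

```python
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-1.5-pro")  # any long-context multimodal model

# Interleave labeled images with text so the model can cite an image id.
parts = []
for i, path in enumerate(["img0.png", "img1.png"]):  # placeholder image corpus
    parts.append(f"[image {i}]")
    parts.append(Image.open(path))
parts.append("Question: which image shows a cat? Answer with its [image i] id.")

response = model.generate_content(parts)
print(response.text)
```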