Anthony Chen 🤖

@_anthonychen

Followers: 462 · Following: 5K · Media: 17 · Statuses: 186

ai research @googledeepmind ♊️ phd from @ucirvine

little worm in big apple
Joined May 2017
@_anthonychen
Anthony Chen 🤖
1 year
Lots of discourse around long-context language models (LCLMs) subsuming RAG and retrieval, but how close are we to this paradigm shift? Introducing LOFT, a 1-million-token benchmark spanning 6 tasks & 35 datasets to test LCLMs’ ability to do in-context retrieval & reasoning [1/10].
@leejnhk
Jinhyuk Lee
1 year
Can long-context language models (LCLMs) subsume retrieval, RAG, SQL, and more? Introducing LOFT: a benchmark stress-testing LCLMs on million-token tasks like retrieval, RAG, and SQL. Surprisingly, LCLMs rival specialized models trained for these tasks!
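For readers unfamiliar with the setup, here is a minimal sketch of the "corpus-in-context" idea the thread describes: the entire corpus is placed directly in the prompt and the model retrieves and answers in one call, with no separate retriever. The `call_lclm` function is a placeholder, not a real API.

```python
# Hypothetical sketch of corpus-in-context prompting, assuming a generic
# long-context model behind a placeholder call_lclm() function.

def build_cic_prompt(corpus: list[dict], query: str) -> str:
    """Format every document into one long prompt, then append the query."""
    parts = ["You are given a corpus of documents. Answer using only the corpus.\n"]
    for doc in corpus:
        parts.append(f"[doc {doc['id']}] {doc['text']}")
    parts.append(f"\nQuestion: {query}")
    parts.append("Answer with the id(s) of the relevant document(s).")
    return "\n".join(parts)

# Toy corpus; at LOFT scale this would approach a million tokens.
corpus = [{"id": i, "text": f"document {i} ..."} for i in range(1000)]
prompt = build_cic_prompt(corpus, "Which document mentions LOFT?")
# answer = call_lclm(prompt)  # placeholder for a million-token-context model
```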
@_anthonychen
Anthony Chen 🤖
2 months
RT @GeminiApp: Bon appétit 🍝
@_anthonychen
Anthony Chen 🤖
4 months
RT @lmarena_ai: BREAKING: Gemini 2.5 Pro is now #1 on the Arena leaderboard - the largest score jump ever (+40 pts vs Grok-3/GPT-4.5)! 🏆 T…
@_anthonychen
Anthony Chen 🤖
4 months
RT @GoogleDeepMind: Think you know Gemini? 🤔 Think again. Meet Gemini 2.5: our most intelligent model 💡 The first release is Pro Experimen…
@_anthonychen
Anthony Chen 🤖
4 months
Historic.
@ComputerHistory
Computer History Museum
4 months
In partnership with Google, CHM is excited to announce the public release and long-term preservation of the source code for AlexNet, the neural network that revolutionized AI. Learn more: . CHM’s GitHub access to open-source code:
@_anthonychen
Anthony Chen 🤖
4 months
GDM's work converting Gemini into a SOTA dual encoder is now out! SOTA results across all benchmarks, including exceptionally strong coding performance. Check out the tech report for more details and some interesting ablations!
@leejnhk
Jinhyuk Lee
4 months
🎉 Gemini Embedding is LIVE! 🎉 Try our state-of-the-art text embedding model for FREE on Vertex AI (text-embedding-large-exp-03-07; 120 QPM) & AI Studio (gemini-embedding-exp-03-07)! ➡️ APIs: ➡️ Report:
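A rough usage sketch for the experimental embedding model named above, via the google-generativeai Python SDK. The model id is taken from the tweet; the exact call shape should be treated as an assumption, since experimental models and SDK surfaces change.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

# Dual-encoder style usage: queries and documents are embedded separately,
# with task_type telling the model which "tower" the text belongs to.
result = genai.embed_content(
    model="models/gemini-embedding-exp-03-07",  # experimental id from the announcement
    content="Can long-context language models subsume retrieval?",
    task_type="retrieval_query",
)
query_vec = result["embedding"]  # list[float]; compare against document embeddings
```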
@_anthonychen
Anthony Chen 🤖
6 months
RT @DrJimFan: This is the most gut-wrenching blog I've read, because it's so real and so close to heart. The author is no longer with us. I…
@_anthonychen
Anthony Chen 🤖
1 year
Thanks for sharing our paper!
@omarsar0
elvis
1 year
Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? Google DeepMind conducts a deep performance analysis of long-context LLMs on in-context retrieval and reasoning. They first present a benchmark with real-world tasks requiring 1M-token context. Report:
@_anthonychen
Anthony Chen 🤖
1 year
RT @ZhuyunDai: Thrilled to unveil LOFT, our latest research showing how long-context language models like Gemini can subsume retrieval, RAG…
@_anthonychen
Anthony Chen 🤖
1 year
RT @YiLuan9: Very happy to contribute to the Multimodal benchmarking on the LOFT project! Very excited to see with few-shot prompting only…
@_anthonychen
Anthony Chen 🤖
1 year
Thanks for sharing our work!
@arankomatsuzaki
Aran Komatsuzaki
1 year
Google presents Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?
Long-context LM:
- Often rivals SotA retrieval and RAG systems
- But still struggles with areas like compositional reasoning
repo: abs:
@_anthonychen
Anthony Chen 🤖
1 year
RT @riedelcastro: "just put the corpus into the context"! Long context models can already match or beat various bespoke pipelines and inf…
@_anthonychen
Anthony Chen 🤖
1 year
RT @kelvin_guu: Do long-context LMs obsolete retrieval, RAG, SQL and more? Excited to share our answer! from the te…
@_anthonychen
Anthony Chen 🤖
1 year
RT @leejnhk: Can long-context language models (LCLMs) subsume retrieval, RAG, SQL, and more? Introducing LOFT: a benchmark stress-testing…
@_anthonychen
Anthony Chen 🤖
1 year
That’s it for Twitter! Check out our paper “Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?” and download the LOFT data to see how LCLMs perform for yourself! Paper: Data: [9/10]
@_anthonychen
Anthony Chen 🤖
1 year
Overall, I’m super excited by long-context models because there are so many areas they can subsume just by prompting over a massive corpus of information, and we’re only scratching the surface. It’s such a simple approach to an otherwise complex engineering problem [8/10].
@_anthonychen
Anthony Chen 🤖
1 year
One thing I want to stress: we found LCLMs were quite sensitive to the prompting format used (see the prompt ablations in our paper for more details). Significant work remains in making models robust to long-context instructions [7/10].
@_anthonychen
Anthony Chen 🤖
1 year
We leverage the SQL queries to see where LCLMs are weak. For each logical operator, we compute average performance on questions whose SQL query contains that operator. Spoiler: averaging is tough, counting is relatively easy, & reasoning over equality is a breeze compared to inequality [6/10]
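A minimal sketch (not the paper's code) of the per-operator breakdown described above: for each SQL logical operator, average model accuracy over the questions whose gold SQL contains that operator. The example data and field names are made up for illustration.

```python
from collections import defaultdict

# Operators to bucket by; substring matching is deliberately naive here
# (e.g. ">=" also matches ">"), which a real analysis would handle properly.
OPERATORS = ["AVG", "COUNT", "=", ">", "<"]

def per_operator_accuracy(examples):
    """examples: iterable of dicts with 'sql' (gold query) and 'correct' (bool)."""
    buckets = defaultdict(list)
    for ex in examples:
        for op in OPERATORS:
            if op in ex["sql"].upper():
                buckets[op].append(ex["correct"])
    # Average accuracy per operator, skipping operators with no matches.
    return {op: sum(v) / len(v) for op, v in buckets.items() if v}

examples = [
    {"sql": "SELECT AVG(age) FROM people", "correct": False},
    {"sql": "SELECT COUNT(*) FROM people WHERE age > 30", "correct": True},
]
print(per_operator_accuracy(examples))  # {'AVG': 0.0, 'COUNT': 1.0, '>': 1.0}
```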
@_anthonychen
Anthony Chen 🤖
1 year
To gauge LCLMs' capacity for complex reasoning, we repurpose semantic parsing datasets. We convert databases to CSV, pair them with a natural-language query, then prompt LCLMs to reason purely in natural language. We find a lot of headroom in complex compositional reasoning. [5/10]
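A hedged sketch of that setup: serialize database tables to CSV, place them in the prompt, and ask the model to answer a natural-language question directly, with no SQL execution. `call_lclm` is again a placeholder, and the toy table is invented for illustration.

```python
import csv
import io
import sqlite3

def table_to_csv(conn: sqlite3.Connection, table: str) -> str:
    """Dump one table as a CSV string, header row first."""
    cur = conn.execute(f"SELECT * FROM {table}")
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([c[0] for c in cur.description])  # column names
    writer.writerows(cur.fetchall())
    return buf.getvalue()

# Toy database standing in for a semantic-parsing dataset's tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INT)")
conn.executemany("INSERT INTO people VALUES (?, ?)", [("ana", 31), ("bo", 25)])

prompt = (
    "Table `people` as CSV:\n" + table_to_csv(conn, "people")
    + "\nQuestion: What is the average age?"
    + "\nReason step by step in natural language, then give the answer."
)
# answer = call_lclm(prompt)  # placeholder long-context model call
```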
@_anthonychen
Anthony Chen 🤖
1 year
On multi-modal retrieval & RAG, LCLMs do shockingly well, sometimes outperforming specialized models. This is amazing given these LCLMs were not tuned for these tasks, & the implications are huge: prompting an LCLM is way easier than building a custom RAG pipeline. [4/10]
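An illustrative sketch of what multi-modal corpus-in-context prompting can look like with the google-generativeai SDK: images and text are interleaved in a single prompt instead of building a separate multimodal retrieval pipeline. The model name and image paths are placeholders, not the paper's exact setup.

```python
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-1.5-pro")  # any long-context multimodal model

# Interleave labeled images with text so the model can cite an image id.
parts = []
for i, path in enumerate(["img0.png", "img1.png"]):  # placeholder image corpus
    parts.append(f"[image {i}]")
    parts.append(Image.open(path))
parts.append("Question: which image shows a cat? Answer with its [image i] id.")

response = model.generate_content(parts)
print(response.text)
```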