Shilong Liu
@atasteoff
Followers
659
Following
176
Media
11
Statuses
92
Postdoc Fellow @Princeton. PhD @Tsinghua_Uni. Prev @BytedanceTalk @ IDEA-Research @NVIDIA @Microsoft @Shengshu_ai
Beijing
Joined October 2017
Why do RAG systems feel like they hit a ceiling? I've been diving into @helloiamleonie's latest article on agent memory, and it provided so much clarity into the current evolution of RAG systems. The progression from RAG → Agentic RAG → Agent Memory isn't about adding
26
196
1K
Seems like the wave-particle duality in generative models.
An architecture for self speculative decoding by supporting block diffusion and AR in the same model. I think this kind of approach is quite promising. Anyway, there are inherently sequential problems in generation (especially for agentic trajectories) and parallelizable ones at
0
0
2
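To make the idea above concrete, here is a minimal Python sketch of self-speculative decoding with a single model: a block-diffusion-style pass drafts a block of tokens in parallel, and an autoregressive pass verifies them. The `draft_block` / `ar_verify` methods and the greedy acceptance rule are illustrative assumptions, not the architecture's actual interface.

```python
# Minimal sketch of self-speculative decoding, assuming one model that exposes
# both a parallel (block-diffusion style) draft pass and an autoregressive
# verify pass. All method names here are hypothetical.

def self_speculative_decode(model, prompt_ids, max_new_tokens=256, block_size=8):
    tokens = list(prompt_ids)
    while len(tokens) - len(prompt_ids) < max_new_tokens:
        # 1) Draft a whole block in parallel with the diffusion-style head.
        draft = model.draft_block(tokens, block_size)      # list[int], len == block_size

        # 2) Verify the draft with one AR forward pass over the block:
        #    ar_predictions[i] is the AR argmax token at position len(tokens) + i.
        ar_predictions = model.ar_verify(tokens, draft)    # list[int], len == block_size

        # 3) Accept the longest prefix where draft and AR agree, then take one
        #    AR token so every step makes progress even if nothing matches.
        n_accept = 0
        while n_accept < block_size and draft[n_accept] == ar_predictions[n_accept]:
            n_accept += 1
        tokens.extend(draft[:n_accept])
        if n_accept < block_size:
            tokens.append(ar_predictions[n_accept])
    return tokens
```

The appeal is that parallelizable stretches get drafted in one shot while the sequential AR pass keeps correctness, which matches the "some parts are inherently sequential, some are parallelizable" point above.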
“Problems with common-sense reasoning were the main thing separating GPT-5’s performance from human level.”
Everyone's acting like models are ready to replace humans in work settings. We put that to the test by creating an entire company and having 9 models act as a customer service agent handling 150 tickets and requests of increasing complexity. Verdict: without common sense,
0
0
2
May I build a world in the world built by World Labs?
Introducing Marble by World Labs: a foundation for a spatially intelligent future. Create your world at https://t.co/V267VJu1H9
1
1
6
Turns out being a genius doesn’t help you make money in the market. A clear negative correlation between general intelligence (LMArena score) and Sharpe ratio across markets.
We believe future forecasting is the ultimate challenge for agentic LLMs. 🚀 Live Trade Bench is now fully open-sourced! It’s the first live, real-world benchmark testing 20+ LLMs on financial forecasting. 📄 Read our 37-page paper detailing insights from a 2-month live trading
0
0
1
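For reference, the Sharpe ratio in the claim above is mean excess return divided by its standard deviation (annualized), and the "clear negative correlation" is just a Pearson correlation between that ratio and the LMArena score. A toy sketch with made-up numbers, not the benchmark's data:

```python
# Toy illustration only: compute per-model Sharpe ratios from hypothetical
# daily returns, then correlate them against LMArena scores.
import numpy as np

def sharpe_ratio(daily_returns, risk_free_rate=0.0, periods_per_year=252):
    excess = np.asarray(daily_returns) - risk_free_rate / periods_per_year
    return np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1)

# Hypothetical data: four models, their LMArena scores, and 60 days of returns.
lmarena_scores = np.array([1250.0, 1300.0, 1350.0, 1400.0])
returns_by_model = [
    np.random.default_rng(i).normal(loc=0.001 - 0.0003 * i, scale=0.01, size=60)
    for i in range(4)
]
sharpes = np.array([sharpe_ratio(r) for r in returns_by_model])

# Pearson correlation between "intelligence" and risk-adjusted return.
corr = np.corrcoef(lmarena_scores, sharpes)[0, 1]
print(f"Sharpe ratios: {sharpes.round(2)}, correlation with LMArena: {corr:.2f}")
```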
Agent & World co-evolving 🧐
Your code changes while the agent plans. Users message while it thinks. Current AI agents freeze🧊 the world to reason about it. What if AI agents could think deeply without missing what's happening around them🔥? We propose a new agent paradigm: real-time reasoning. 🔗in🧵
0
0
0
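A toy sketch of what "thinking without freezing the world" could look like in code, assuming nothing about the paper's actual system: reasoning steps and a world-event stream run concurrently, and new observations are folded into the context between steps.

```python
# Illustrative only: an event producer stands in for users messaging / code
# changing, while the reasoner keeps thinking and picks up updates as it goes.
import asyncio

async def watch_world(events: asyncio.Queue):
    # Stand-in for the world changing while the agent thinks.
    for i in range(5):
        await asyncio.sleep(0.3)
        await events.put(f"world update {i}")

async def reason(events: asyncio.Queue, n_steps=10):
    context = ["initial observation"]
    for step in range(n_steps):
        # Drain any updates that arrived during the previous reasoning step.
        while not events.empty():
            context.append(events.get_nowait())
        await asyncio.sleep(0.2)  # stand-in for one LLM reasoning step
        print(f"step {step}: reasoning over {len(context)} observations")

async def main():
    events = asyncio.Queue()
    await asyncio.gather(watch_world(events), reason(events))

asyncio.run(main())
```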
Marking a paper mentioned in the cookbook:
👋 Super Excited to share our new @OpenAI cookbook is now LIVE: 𝗦𝗲𝗹𝗳-𝗘𝘃𝗼𝗹𝘃𝗶𝗻𝗴 𝗔𝗴𝗲𝗻𝘁𝘀 - 𝗔 𝗖𝗼𝗼𝗸𝗯𝗼𝗼𝗸 𝗳𝗼𝗿 𝗔𝘂𝘁𝗼𝗻𝗼𝗺𝗼𝘂𝘀 𝗔𝗴𝗲𝗻𝘁 𝗥𝗲𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴 🔗 https://t.co/QLeBaO1R1c cc @OpenAIDevs
0
1
2
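As a rough illustration of the retraining loop the cookbook title describes (not the cookbook's actual code): run the agent, keep the trajectories an evaluator scores highly, fine-tune on them, repeat. All function names below are placeholders.

```python
# Hypothetical self-evolving agent loop: roll out, filter by an evaluator,
# retrain on the agent's own successful behaviour, redeploy.

def self_evolving_loop(agent, tasks, evaluator, finetune, n_rounds=5):
    for round_idx in range(n_rounds):
        # 1) Roll out the current agent on the task set.
        trajectories = [agent.run(task) for task in tasks]

        # 2) Keep only trajectories the evaluator judges successful.
        good = [t for t in trajectories if evaluator(t) >= 0.8]

        # 3) Fine-tune the agent on its own successful trajectories.
        agent = finetune(agent, good)
        print(f"round {round_idx}: kept {len(good)}/{len(trajectories)} trajectories")
    return agent
```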
Interesting takeaways: 1. Gemini is the best of the three models (Fig. 1, 2). 2. More tokens, better performance (Fig. 3).
ResearchRubrics: A Benchmark of Prompts and Rubrics For Evaluating Deep Research Agents "We introduce ResearchRubrics, a standardized benchmark for DR built with over 2,800+ hours of human labor that pairs realistic, domain-diverse prompts with 2,500+ expert-written,
1
1
2
Competition between carbon and silicon intelligence.
Nature just dropped an article that might stir mixed emotions. In a few labs and startups, researchers are growing human neurons and turning them into biological transistors. They believe these neuron networks could one day rival supercomputers, without the massive energy cost.
0
0
2
So do we need an evil LLM to better help us?
New Tencent paper shows safety aligned LLMs fail to convincingly role play villains and self serving characters. Safety training teaches models to be helpful and honest, which blocks traits like lying or manipulation. It proves a real tension between alignment and faithful
0
0
2
Updating models every 2 hours? Semi-evolving models are coming.
Just found out that @cursor_ai updates their tab-completion model every 2 hours 😯 The age of continual learning is nigh https://t.co/M8mi2XbxKg (h/t @sashankpisupati)
0
0
0
relearn the pattern!
0
0
3
✨What if the simplest RL recipe is all you need? Introducing JustRL: new SOTA among 1.5B reasoning models with 2× less compute. Stable improvement over 4,000+ steps. No multi-stage pipelines. No dynamic schedules. Just simple RL at scale. 📄 Blog: https://t.co/RofiFx9bl8
6
50
310
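For intuition about what a "simple RL recipe" means structurally, here is a generic single-stage policy-gradient loop with a group-mean baseline; this is a hedged sketch of plain RL fine-tuning, not JustRL's actual recipe, and every name in it is hypothetical.

```python
# Generic single-loop RL fine-tuning sketch: sample rollouts, score them with a
# fixed reward, take one policy-gradient step. No stages, no schedules.
# `policy.sample`, `reward_fn`, and `policy.update` are placeholder hooks.

def simple_rl(policy, prompts, reward_fn, n_steps=4000, group_size=8, lr=1e-6):
    for step in range(n_steps):
        batch = []
        for prompt in prompts:
            # Sample a group of responses and score each one.
            responses = [policy.sample(prompt) for _ in range(group_size)]
            rewards = [reward_fn(prompt, r) for r in responses]
            # Advantage = reward minus the group mean (a single fixed baseline).
            baseline = sum(rewards) / len(rewards)
            batch += [(prompt, r, rw - baseline) for r, rw in zip(responses, rewards)]
        policy.update(batch, lr=lr)  # one policy-gradient step on the batch
    return policy
```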
Seems like we don't need large models for code and math.
⭐ VibeThinker-1.5B — SOTA reasoning in a tiny model. 🚀 Performance: Highly competitive on AIME24/25 & HMMT25 — surpasses DeepSeek R1-0120 on math, and outperforms same-size models in competitive coding. ⚡ Efficiency: Only 1.5B params — 100-600× smaller than giants like Kimi K2
0
0
3
ICLR scores are rolling in. A noticeable pattern: The higher the submission ID, the lower the scores appear to be. 😬 #ICLR2026
10
4
169