Shilong Liu

@atasteoff

Followers
659
Following
176
Media
11
Statuses
92

Postdoc Fellow @Princeton. PhD @Tsinghua_Uni. Prev @BytedanceTalk @ IDEA-Research @NVIDIA @Microsoft @Shengshu_ai

Beijing
Joined October 2017
@victorialslocum
Victoria Slocum
5 days
Why do RAG systems feel like they hit a ceiling? I've been diving into @helloiamleonie's latest article on agent memory, and it provided so much clarity into the current evolution of RAG systems. The progression from RAG → Agentic RAG → Agent Memory isn't about adding
26
196
1K
@atasteoff
Shilong Liu
4 days
Happy to be liked by Prof. @drfeifei!
0
0
11
@atasteoff
Shilong Liu
4 days
Seems like the wave-particle duality in generative models.
@rosinality
Rosinality
4 days
An architecture for self-speculative decoding by supporting block diffusion and AR in the same model. I think this kind of approach is quite promising. Anyway, there are inherently sequential problems in generation (especially for agentic trajectories) and parallelizable ones at
0
0
2
@atasteoff
Shilong Liu
4 days
“Problems with common-sense reasoning were the main thing separating GPT-5’s performance from human level.”
@HelloSurgeAI
Surge AI
5 days
Everyone's acting like models are ready to replace humans in work settings. We put that to the test by creating an entire company and having 9 models act as a customer service agent handling 150 tickets and requests of increasing complexity. Verdict: without common sense,
0
0
2
@atasteoff
Shilong Liu
4 days
May I build a world in the world built by World Labs?
@theworldlabs
World Labs
5 days
Introducing Marble by World Labs: a foundation for a spatially intelligent future. Create your world at https://t.co/V267VJu1H9
1
1
6
@atasteoff
Shilong Liu
4 days
Cool work!
@KevinQHLin
Kevin Lin
5 days
🤗Excited to open-source GroundCUA! 🚀A large-scale, human-annotated dataset for precise UI grounding to advance Computer-Use Agents. - 3.56M+ high-quality human annotations - 56K screenshots - 87 desktop applications - all datasets and models are available. Project page:
1
0
2
@atasteoff
Shilong Liu
5 days
Turns out being a genius doesn’t help you make money in the market. A clear negative correlation between general intelligence (LMArena score) and Sharpe ratio across markets.
@youjiaxuan
Jiaxuan You
5 days
We believe future forecasting is the ultimate challenge for agentic LLMs. 🚀 Live Trade Bench is now fully open-sourced! It’s the first live, real-world benchmark testing 20+ LLMs on financial forecasting. 📄 Read our 37-page paper detailing insights from a 2-month live trading
0
0
1
@atasteoff
Shilong Liu
5 days
Do we need 3D-specific models in the future?
@skalskip92
SkalskiP
6 days
Qwen image edit for camera control is wild. You can zoom, pan, and rotate the camera.
0
0
3
@atasteoff
Shilong Liu
5 days
Agent & World co-evolving 🧐
@BLeavesYe
Yixin Ye
6 days
Your code changes while the agent plans. Users message while it thinks. Current AI agents freeze🧊 the world to reason about it. What if AI agents could think deeply without missing what's happening around them🔥? We propose a new agent paradigm: real-time reasoning. 🔗in🧵
0
0
0
@atasteoff
Shilong Liu
5 days
Marking a paper mentioned in the cookbook:
@shikharkwatra
Shikhar Kwatra
7 days
👋 Super Excited to share our new @OpenAI cookbook is now LIVE: Self-Evolving Agents - A Cookbook for Autonomous Agent Retraining 🔗 https://t.co/QLeBaO1R1c cc @OpenAIDevs
0
1
2
@atasteoff
Shilong Liu
5 days
Interesting takeaways: 1. Gemini is the best of the three models. (Fig. 1, 2) 2. More tokens, better performance. (Fig. 3)
@iScienceLuvr
Tanishq Mathew Abraham, Ph.D.
5 days
ResearchRubrics: A Benchmark of Prompts and Rubrics For Evaluating Deep Research Agents "We introduce ResearchRubrics, a standardized benchmark for DR built with over 2,800 hours of human labor that pairs realistic, domain-diverse prompts with 2,500+ expert-written,
1
1
2
@atasteoff
Shilong Liu
5 days
Competition between carbon and silicon intelligence
@jiqizhixin
机器之心 JIQIZHIXIN
5 days
Nature just dropped an article that might stir mixed emotions. In a few labs and startups, researchers are growing human neurons and turning them into biological transistors. They believe these neuron networks could one day rival supercomputers, without the massive energy cost.
0
0
2
@atasteoff
Shilong Liu
5 days
So do we need an evil LLM to better help us?
@rohanpaul_ai
Rohan Paul
6 days
New Tencent paper shows safety-aligned LLMs fail to convincingly role-play villains and self-serving characters. Safety training teaches models to be helpful and honest, which blocks traits like lying or manipulation. It proves a real tension between alignment and faithful
0
0
2
@atasteoff
Shilong Liu
5 days
Updating models every 2 hours? Semi-evolving models are coming.
@scychan_brains
Stephanie Chan
5 days
Just found out that @cursor_ai updates their tab-completion model every 2 hours 😯 The age of continual learning is nigh https://t.co/M8mi2XbxKg (h/t @sashankpisupati)
0
0
0
@atasteoff
Shilong Liu
5 days
How to give an AI startup a cool name?
0
0
3
@atasteoff
Shilong Liu
5 days
Relearn the pattern!
@xyz2maureen
Xueyan Zou
5 days
@atasteoff Hahah, found this pattern 2 years ago as well.
0
0
3
@atasteoff
Shilong Liu
5 days
Different languages help each other
@HuggingPapers
DailyPapers
5 days
Facebook just released Meta CLIP 2 on Hugging Face. This is the first recipe to train CLIP from scratch on worldwide web-scale image-text pairs. It achieves state-of-the-art multilingual performance in vision-language tasks.
0
0
0
@HBX_hbx
Bingxiang He
5 days
✨What if the simplest RL recipe is all you need? Introducing JustRL: new SOTA among 1.5B reasoning models with 2× less compute. Stable improvement over 4,000+ steps. No multi-stage pipelines. No dynamic schedules. Just simple RL at scale. 📄 Blog: https://t.co/RofiFx9bl8
6
50
310
@atasteoff
Shilong Liu
5 days
Seems like we don't need large models for code and math.
@WeiboLLM
WeiboLLM
6 days
⭐ VibeThinker-1.5B — SOTA reasoning in a tiny model. 🚀 Performance: Highly competitive on AIME24/25 & HMMT25 — surpasses DeepSeek R1-0120 on math, and outperforms same-size models in competitive coding. ⚡ Efficiency: Only 1.5B params — 100-600× smaller than giants like Kimi K2
0
0
3
@atasteoff
Shilong Liu
5 days
ICLR scores are rolling in. A noticeable pattern: The higher the submission ID, the lower the scores appear to be. 😬 #ICLR2026
10
4
169