Shilong Liu
@atasteoff
Followers
659
Following
176
Media
11
Statuses
92
Postdoc Fellow @Princeton. PhD @Tsinghua_Uni. Prev @BytedanceTalk @ IDEA-Research @NVIDIA @Microsoft @Shengshu_ai
Beijing
Joined October 2017
Why do RAG systems feel like they hit a ceiling? I've been diving into @helloiamleonie's latest article on agent memory, and it provided so much clarity into the current evolution of RAG systems. The progression from RAG → Agentic RAG → Agent Memory isn't about adding
26
196
1K
Seems like the wave-particle duality in generative models.
An architecture for self speculative decoding by supporting block diffusion and AR in the same model. I think this kind of approach is quite promising. Anyway, there are inherently sequential problems in generation (especially for agentic trajectories) and parallelizable ones at
0
0
2
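To make the idea above concrete, here is a minimal Python sketch of self-speculative decoding with a single model: a block-diffusion-style pass drafts a block of tokens in parallel, and an autoregressive pass verifies them. The `draft_block` / `ar_verify` methods and the greedy acceptance rule are illustrative assumptions, not the architecture's actual interface.

```python
# Minimal sketch of self-speculative decoding, assuming one model that exposes
# both a parallel (block-diffusion style) draft pass and an autoregressive
# verify pass. All method names here are hypothetical.

def self_speculative_decode(model, prompt_ids, max_new_tokens=256, block_size=8):
    tokens = list(prompt_ids)
    while len(tokens) - len(prompt_ids) < max_new_tokens:
        # 1) Draft a whole block in parallel with the diffusion-style head.
        draft = model.draft_block(tokens, block_size)      # list[int], len == block_size

        # 2) Verify the draft with one AR forward pass over the block:
        #    ar_predictions[i] is the AR argmax token at position len(tokens) + i.
        ar_predictions = model.ar_verify(tokens, draft)    # list[int], len == block_size

        # 3) Accept the longest prefix where draft and AR agree, then take one
        #    AR token so every step makes progress even if nothing matches.
        n_accept = 0
        while n_accept < block_size and draft[n_accept] == ar_predictions[n_accept]:
            n_accept += 1
        tokens.extend(draft[:n_accept])
        if n_accept < block_size:
            tokens.append(ar_predictions[n_accept])
    return tokens
```

The appeal is that parallelizable stretches get drafted in one shot while the sequential AR pass keeps correctness, which matches the "some parts are inherently sequential, some are parallelizable" point above.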
“Problems with common-sense reasoning were the main thing separating GPT-5’s performance from human level.”
Everyone's acting like models are ready to replace humans in work settings. We put that to the test by creating an entire company and having 9 models act as a customer service agent handling 150 tickets and requests of increasing complexity. Verdict: without common sense,
0
0
2
May I build a world in the world built by World Labs?
Introducing Marble by World Labs: a foundation for a spatially intelligent future. Create your world at https://t.co/V267VJu1H9
1
1
6
Turns out being a genius doesn’t help you make money in the market. A clear negative correlation between general intelligence (LMArena score) and Sharpe ratio across markets.
We believe future forecasting is the ultimate challenge for agentic LLMs. 🚀 Live Trade Bench is now fully open-sourced! It’s the first live, real-world benchmark testing 20+ LLMs on financial forecasting. 📄 Read our 37-page paper detailing insights from a 2-month live trading
0
0
1
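For reference, the Sharpe ratio in the claim above is mean excess return divided by its standard deviation (annualized), and the "clear negative correlation" is just a Pearson correlation between that ratio and the LMArena score. A toy sketch with made-up numbers, not the benchmark's data:

```python
# Toy illustration only: compute per-model Sharpe ratios from hypothetical
# daily returns, then correlate them against LMArena scores.
import numpy as np

def sharpe_ratio(daily_returns, risk_free_rate=0.0, periods_per_year=252):
    excess = np.asarray(daily_returns) - risk_free_rate / periods_per_year
    return np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1)

# Hypothetical data: four models, their LMArena scores, and 60 days of returns.
lmarena_scores = np.array([1250.0, 1300.0, 1350.0, 1400.0])
returns_by_model = [
    np.random.default_rng(i).normal(loc=0.001 - 0.0003 * i, scale=0.01, size=60)
    for i in range(4)
]
sharpes = np.array([sharpe_ratio(r) for r in returns_by_model])

# Pearson correlation between "intelligence" and risk-adjusted return.
corr = np.corrcoef(lmarena_scores, sharpes)[0, 1]
print(f"Sharpe ratios: {sharpes.round(2)}, correlation with LMArena: {corr:.2f}")
```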
Agent & World co-evolving 🧐
Your code changes while the agent plans. Users message while it thinks. Current AI agents freeze🧊 the world to reason about it. What if AI agents could think deeply without missing what's happening around them🔥? We propose a new agent paradigm: real-time reasoning. 🔗in🧵
0
0
0
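A toy sketch of what "thinking without freezing the world" could look like in code, assuming nothing about the paper's actual system: reasoning steps and a world-event stream run concurrently, and new observations are folded into the context between steps.

```python
# Illustrative only: an event producer stands in for users messaging / code
# changing, while the reasoner keeps thinking and picks up updates as it goes.
import asyncio

async def watch_world(events: asyncio.Queue):
    # Stand-in for the world changing while the agent thinks.
    for i in range(5):
        await asyncio.sleep(0.3)
        await events.put(f"world update {i}")

async def reason(events: asyncio.Queue, n_steps=10):
    context = ["initial observation"]
    for step in range(n_steps):
        # Drain any updates that arrived during the previous reasoning step.
        while not events.empty():
            context.append(events.get_nowait())
        await asyncio.sleep(0.2)  # stand-in for one LLM reasoning step
        print(f"step {step}: reasoning over {len(context)} observations")

async def main():
    events = asyncio.Queue()
    await asyncio.gather(watch_world(events), reason(events))

asyncio.run(main())
```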
Marking a paper mentioned in the cookbook:
👋 Super Excited to share our new @OpenAI cookbook is now LIVE: 𝗦𝗲𝗹𝗳-𝗘𝘃𝗼𝗹𝘃𝗶𝗻𝗴 𝗔𝗴𝗲𝗻𝘁𝘀 - 𝗔 𝗖𝗼𝗼𝗸𝗯𝗼𝗼𝗸 𝗳𝗼𝗿 𝗔𝘂𝘁𝗼𝗻𝗼𝗺𝗼𝘂𝘀 𝗔𝗴𝗲𝗻𝘁 𝗥𝗲𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴 🔗 https://t.co/QLeBaO1R1c cc @OpenAIDevs
0
1
2
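As a rough illustration of the retraining loop the cookbook title describes (not the cookbook's actual code): run the agent, keep the trajectories an evaluator scores highly, fine-tune on them, repeat. All function names below are placeholders.

```python
# Hypothetical self-evolving agent loop: roll out, filter by an evaluator,
# retrain on the agent's own successful behaviour, redeploy.

def self_evolving_loop(agent, tasks, evaluator, finetune, n_rounds=5):
    for round_idx in range(n_rounds):
        # 1) Roll out the current agent on the task set.
        trajectories = [agent.run(task) for task in tasks]

        # 2) Keep only trajectories the evaluator judges successful.
        good = [t for t in trajectories if evaluator(t) >= 0.8]

        # 3) Fine-tune the agent on its own successful trajectories.
        agent = finetune(agent, good)
        print(f"round {round_idx}: kept {len(good)}/{len(trajectories)} trajectories")
    return agent
```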
Interesting takeaways: 1. Gemini is the best of the three models (Fig. 1, 2). 2. More tokens, better performance (Fig. 3).
ResearchRubrics: A Benchmark of Prompts and Rubrics For Evaluating Deep Research Agents "We introduce ResearchRubrics, a standardized benchmark for DR built with over 2,800+ hours of human labor that pairs realistic, domain-diverse prompts with 2,500+ expert-written,
1
1
2
Competition between carbon and silicon intelligence.
Nature just dropped an article that might stir mixed emotions. In a few labs and startups, researchers are growing human neurons and turning them into biological transistors. They believe these neuron networks could one day rival supercomputers, without the massive energy cost.
0
0
2
So do we need an evil LLM to better help us?
New Tencent paper shows safety aligned LLMs fail to convincingly role play villains and self serving characters. Safety training teaches models to be helpful and honest, which blocks traits like lying or manipulation. It proves a real tension between alignment and faithful
0
0
2
Updating models every 2 hours? Semi-evolving models are coming.
Just found out that @cursor_ai updates their tab-completion model every 2 hours 😯 The age of continual learning is nigh https://t.co/M8mi2XbxKg (h/t @sashankpisupati)
0
0
0
relearn the pattern!
0
0
3
✨What if the simplest RL recipe is all you need? Introducing JustRL: new SOTA among 1.5B reasoning models with 2× less compute. Stable improvement over 4,000+ steps. No multi-stage pipelines. No dynamic schedules. Just simple RL at scale. 📄 Blog: https://t.co/RofiFx9bl8
6
50
310
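For intuition about what a "simple RL recipe" means structurally, here is a generic single-stage policy-gradient loop with a group-mean baseline; this is a hedged sketch of plain RL fine-tuning, not JustRL's actual recipe, and every name in it is hypothetical.

```python
# Generic single-loop RL fine-tuning sketch: sample rollouts, score them with a
# fixed reward, take one policy-gradient step. No stages, no schedules.
# `policy.sample`, `reward_fn`, and `policy.update` are placeholder hooks.

def simple_rl(policy, prompts, reward_fn, n_steps=4000, group_size=8, lr=1e-6):
    for step in range(n_steps):
        batch = []
        for prompt in prompts:
            # Sample a group of responses and score each one.
            responses = [policy.sample(prompt) for _ in range(group_size)]
            rewards = [reward_fn(prompt, r) for r in responses]
            # Advantage = reward minus the group mean (a single fixed baseline).
            baseline = sum(rewards) / len(rewards)
            batch += [(prompt, r, rw - baseline) for r, rw in zip(responses, rewards)]
        policy.update(batch, lr=lr)  # one policy-gradient step on the batch
    return policy
```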
Seems like we don't need large models for code and math.
⭐ VibeThinker-1.5B — SOTA reasoning in a tiny model. 🚀 Performance: Highly competitive on AIME24/25 & HMMT25 — surpasses DeepSeek R1-0120 on math, and outperforms same-size models in competitive coding. ⚡ Efficiency: Only 1.5B params — 100-600× smaller than giants like Kimi K2
0
0
3
ICLR scores are rolling in. A noticeable pattern: The higher the submission ID, the lower the scores appear to be. 😬 #ICLR2026
10
4
169