Xinxi Lyu Profile
Xinxi Lyu

@XinxiLyu

Followers
77
Following
4
Media
2
Statuses
7

PYI @allen_ai | Incoming PhD student @UofIllinois | BS/MS from @uwcse

Seattle, UW
Joined March 2022
@XinxiLyu
Xinxi Lyu
29 days
With amazing collaborators: @micdun8, @RulinShao, @PangWeiKoh, @sewon__min. [6/N]
1
0
3
@XinxiLyu
Xinxi Lyu
29 days
Check out the paper for:
- Results with larger models (up to 70B) and different model families (LLaMA, Mistral, Qwen3, QwQ).
- Comparison to a search engine.
- Importance of the choice of data sources.
- Trade-offs between index size and performance.
[5/N]
1
0
3
@XinxiLyu
Xinxi Lyu
29 days
Interestingly, we found CompactDS + naive RAG allows QwQ to outperform or match agentic pipelines with a commercial search engine🤖(e.g., search-o1) on GPQA Diamond and MATH-500. [4/N]
1
0
3
@XinxiLyu
Xinxi Lyu
29 days
Key insights:
1. Most web content can be filtered out without sacrificing coverage, and a compact, high-quality subset is sufficient.
2. Combining in-memory approximate nearest neighbor (ANN) retrieval and on-disk exact search balances speed and recall.
[3/N]
1
0
3
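The second insight above (a fast approximate pass in memory, followed by an exact pass over data kept on disk) can be illustrated with a toy two-stage search. This is only a minimal sketch under stated assumptions, not the CompactDS implementation: the "ANN index" here is simply a truncated copy of each vector, and every name (`build_index`, `read_vector`, `search`, `make_demo`, `DIM`, `COARSE_DIM`) is hypothetical and chosen for illustration.

```python
import array
import os
import random
import tempfile

DIM = 8          # full embedding size (illustrative assumption)
COARSE_DIM = 2   # size of the compressed in-memory copy (illustrative assumption)
ITEM = array.array("f").itemsize  # bytes per stored float


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def build_index(vectors, path):
    """Write full vectors to disk; return a small in-memory approximation."""
    with open(path, "wb") as f:
        for v in vectors:
            array.array("f", v).tofile(f)
    # Stand-in for a real ANN index: keep only the first COARSE_DIM dimensions.
    return [list(v[:COARSE_DIM]) for v in vectors]


def read_vector(path, i):
    """Load the i-th full vector from disk."""
    a = array.array("f")
    with open(path, "rb") as f:
        f.seek(i * DIM * ITEM)
        a.fromfile(f, DIM)
    return list(a)


def search(query, coarse, path, k=3, n_candidates=10):
    # Stage 1 (in memory, approximate): shortlist by coarse inner product.
    shortlist = sorted(
        range(len(coarse)),
        key=lambda i: dot(query[:COARSE_DIM], coarse[i]),
        reverse=True,
    )[:n_candidates]
    # Stage 2 (on disk, exact): rescore only the shortlist with full vectors.
    scored = [(dot(query, read_vector(path, i)), i) for i in shortlist]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]


def make_demo(n=50, seed=0):
    """Generate small integer-valued vectors and a query for experimentation."""
    rng = random.Random(seed)
    vectors = [[rng.randint(-5, 5) for _ in range(DIM)] for _ in range(n)]
    query = [rng.randint(-5, 5) for _ in range(DIM)]
    return vectors, query


if __name__ == "__main__":
    vectors, query = make_demo()
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        coarse = build_index(vectors, path)
        print(search(query, coarse, path))
    finally:
        os.remove(path)
```

Raising `n_candidates` trades speed for recall: a larger shortlist means more disk reads, but the exact pass is less likely to miss a true top result that the coarse pass ranked poorly.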
@XinxiLyu
Xinxi Lyu
29 days
There exists prior work using web-scale, web-coverage datastores, but such datastores are typically inaccessible, e.g., requiring 10+ TB RAM without multi-node serving. CompactDS achieves high accuracy 🎯 + subsecond latency ⚡️ on a single machine with just 100GB RAM 💾. [2/N]
1
0
3
@XinxiLyu
Xinxi Lyu
29 days
Reasoning benchmarks (e.g., MMLU Pro and GPQA) have seen little benefit from naive RAG. But can we flip this?
🔥Introducing CompactDS:
✅Web-scale coverage
✅Runs with just 100GB RAM
✅Matches search engines
The simplest RAG pipeline can even compete with agentic
1
17
53