Xinran Zhao Profile
Xinran Zhao

@xinranz3

Followers: 269
Following: 502
Media: 6
Statuses: 43

Current Ph.D. student @LTIatCMU; interning at @ai2_s2research. Ex: @stanfordnlp, @hkustknowcomp, @TencentGlobal AI Lab at Bellevue, @GoogleDeepMind

Joined September 2023
@allen_ai
Ai2
25 days
Today we’re releasing Deep Research Tulu (DR Tulu)—the first fully open, end-to-end recipe for long-form deep research, plus an 8B agent you can use right away. Train agents that plan, search, synthesize, & cite across sources, making expert research more accessible. 🧭📚
@RulinShao
Rulin Shao
25 days
🔥Thrilled to introduce DR Tulu-8B, an open long-form Deep Research model that matches OpenAI DR 💪Yes, just 8B! 🚀 The secret? We present Reinforcement Learning with Evolving Rubrics (RLER) for long-form non-verifiable DR tasks! Our rubrics: - co-evolve with the policy model -
@RulinShao
Rulin Shao
25 days
🗣️ DR Tulu was a team effort! It was such an exciting journey co-leading the project with @AkariAsai @shannonzshen @hamishivi ! Kudos to all contributors @varsha_kishore_ @JingmingZhuo @xinranz3 Molly Park @IAmSamFin @david_sontag @CoachMurray47 @sewon__min @pdasigi @soldni
@tomchen0
Tong Chen @ NeurIPS
30 days
OpenAI's blog (https://t.co/Mu05PFfPXg) points out that today’s language models hallucinate because training and evaluation reward guessing instead of admitting uncertainty. This raises a natural question: can we reduce hallucination without hurting utility?🤔 On-policy RL with
@yilinjz
Yilin Zhang
1 month
Is your code retriever silently breaking functions with line-based chunking? We developed cAST, our new #EMNLP2025 work, for AST-based code chunking that preserves syntactic structure and semantic boundaries. Paper: https://t.co/lZLTiLBXfc Code: https://t.co/mOC9P5miq5
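To make the idea behind structure-aware chunking concrete, here is a minimal sketch in the spirit of cAST: chunk boundaries follow AST node boundaries instead of raw line counts, so a function is never split mid-body. This is an illustration using Python's built-in `ast` module with a hypothetical `chunk_code` helper and a character budget — the actual cAST algorithm (a split-then-merge procedure over tree-sitter parse trees) differs in its details.

```python
import ast

def chunk_code(source: str, max_chars: int = 200):
    """Split Python source into chunks at AST node boundaries.

    Illustrative sketch only: each top-level def/class is an atomic
    unit, and adjacent units are packed greedily into chunks of at
    most ~max_chars characters, so no function body is ever cut.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks, current = [], []
    for node in tree.body:
        # Recover the exact source lines spanned by this node.
        seg = "\n".join(lines[node.lineno - 1 : node.end_lineno])
        if current and sum(len(c) for c in current) + len(seg) > max_chars:
            chunks.append("\n".join(current))
            current = []
        current.append(seg)
    if current:
        chunks.append("\n".join(current))
    return chunks
```

Because every chunk is a concatenation of whole top-level nodes, each chunk re-parses cleanly on its own — the property a line-based chunker silently violates.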
@xinranz3
Xinran Zhao
1 month
(7/7) When to use MoR? ➡️When you have diverse questions and there is no single optimal retriever ➡️When you need efficiency while maintaining performance ➡️When your human experts can help 💗This project is co-led by @JushaanSingh, an awesome master's student who worked with
@xinranz3
Xinran Zhao
1 month
(6/7) MoR can be made even more efficient by pre-rejecting retrievers before calculating the query-document relevance scores! Through thresholding, MoR maintains strong performance with only 20% retriever usage per query 🥳.
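A minimal sketch of the pre-rejection idea: if a retriever's estimated weight for a query falls below a threshold, skip it entirely, so only a fraction of retrievers ever compute relevance scores. The function name, the `priors` weighting signal, and the threshold `tau` are illustrative assumptions, not the paper's exact formulation.

```python
def mor_score_with_prerejection(query, doc, retrievers, priors, tau=0.5):
    """Sketch: pre-reject retrievers whose prior weight for this query
    is below tau, then mix only the survivors.

    retrievers: name -> scoring fn(query, doc) -> float
    priors:     name -> estimated per-query weight in [0, 1]
    """
    active = {n: fn for n, fn in retrievers.items() if priors[n] >= tau}
    if not active:
        # Fall back to the single most promising retriever.
        best = max(priors, key=priors.get)
        active = {best: retrievers[best]}
    # Renormalize surviving weights and mix the scores.
    total = sum(priors[n] for n in active)
    return sum(priors[n] / total * fn(query, doc) for n, fn in active.items())
```

Raising `tau` trades a little mixing quality for fewer retriever calls per query, which is the efficiency knob the thread describes.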
@xinranz3
Xinran Zhao
1 month
(5/7) 😊One more thing -- models ask humans clarification questions and can also treat humans as retrievers. Can we mix humans into MoR & decide when to query them? Possibly! In a simulation experiment, we show a 58.9% gain for MoR+Humans compared to simply aggregating human
@xinranz3
Xinran Zhao
1 month
(4/7) 📊We consider 8 retrievers of different types to construct the mixture (BM25, SimCSE, Contriever, DPR, TAS-B, etc.), with 0.8B parameters in total. We compare with 7B retrievers (RepLLaMA and GritLM) on 4 challenging tasks (NFCorpus, SciDocs, SciFact, and SciQ) 💫MoR is
@xinranz3
Xinran Zhao
1 month
(3/7) We built the ✨Mixture of Retrievers (MoR)✨ framework to assign weights to each query-document pair for each retriever, supporting plug-and-play integration. To estimate how each retriever’s result should contribute, we gather weighting signals from: ➡️Before conducting
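The core mechanic — per-retriever weights combined into one ranking — can be sketched as simple score fusion: normalize each retriever's scores so sparse and dense scales become comparable, then sum them with per-retriever weights. This is a generic weighted z-score fusion for illustration; MoR's actual weighting signals (gathered before and after retrieval, per query-document pair) are richer than a single static weight.

```python
import statistics

def mix_retrievers(results, weights):
    """Sketch of weighted score fusion across heterogeneous retrievers.

    results: retriever name -> {doc_id: raw_score}
    weights: retriever name -> mixing weight
    Returns doc_ids ranked by the fused score.
    """
    fused = {}
    for name, scores in results.items():
        vals = list(scores.values())
        mu = statistics.fmean(vals)
        sd = statistics.pstdev(vals) or 1.0  # guard against zero variance
        for doc, s in scores.items():
            # z-normalize so BM25-scale and cosine-scale scores are comparable
            fused[doc] = fused.get(doc, 0.0) + weights[name] * (s - mu) / sd
    return sorted(fused, key=fused.get, reverse=True)
```

With uniform weights this reduces to plain normalized fusion; the point of MoR is that the weights are chosen per query (and per document), not fixed globally.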
@xinranz3
Xinran Zhao
1 month
(2/7) The gap: there is no one-size-fits-all retriever for complex queries: ➡️Retrievers with worse overall performance can beat the best retriever on certain queries. ➡️Retrievers with similar performance excel at different types of queries. Instead of just choosing the best
@xinranz3
Xinran Zhao
1 month
(1/7) 🧐Can we dynamically select and integrate the best retrievers for each query? We introduce ✨MoR✨: a zero-shot way to handle diverse queries with a weighted combination of heterogeneous retrievers – even including human information sources! We will present this paper at
@youjiaxuan
Jiaxuan You@NeurIPS
3 months
Seeking alpha with LLMs? 📈🤖 Try TradeBench 👉 https://t.co/gRxOGiQW1q We challenge top LLMs to manage portfolios across stocks + Polymarket, powered by live news & social signals. Performance updates every minute. We believe forecasting the future is the ultimate benchmark
@boyuan__zheng
Boyuan Zheng
5 months
Remember “Son of Anton” from the Silicon Valley show (@SiliconHBO)? The experimental AI that “efficiently” orders 4,000 lbs of meat while looking for a cheap burger and “fixes” a bug by deleting all the code? It’s starting to look a lot like reality. Even 18 months ago, my own
@scale_AI
Scale AI
5 months
As AI agents start taking real actions online, how do we prevent unintended harm? We teamed up with @OhioState and @UCBerkeley to create WebGuard: the first dataset for evaluating web agent risks and building real-world safety guardrails for online environments. 🧵
@dorazhao9
Dora Zhao
5 months
While we’re building amazing new human-AI systems, how do we actually know if they work well for people? In our #ACL2025 Findings Paper, we introduce SPHERE, a framework for making evaluations of human-AI systems more transparent and replicable. ✨ https://t.co/afVCweAqZU
@1000seagull
Christina Ma
5 months
🤖 Evaluating Human-AI Systems? Time to raise the bar. Check out SPHERE: An Evaluation Card for Human-AI Systems at ACL 2025 poster! 🗓️ July 28 18:00 📍 Hall X4/X5 🔗 https://t.co/yErqTbzaia Let’s talk transparent, rigorous, and human-centric evaluation! #ACL2025NLP #humanai
@kenziyuliu
Ken Liu
5 months
heading to @icmlconf #ICML2025 next week! come say hi & i'd love to learn about your work :) i'll present this paper (https://t.co/4rFtApYs2Q) on the pitfalls of training set inclusion in LLMs, Thursday 11am here are my talk slides to flip through: https://t.co/kdM992vkTv
@kenziyuliu
Ken Liu
8 months
An LLM generates an article verbatim—did it “train on” the article? It’s complicated: under n-gram definitions of train-set inclusion, LLMs can complete “unseen” texts—both after data deletion and adding “gibberish” data. Our results impact unlearning, MIAs & data transparency🧵
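To pin down what an "n-gram definition of train-set inclusion" means, here is a toy sketch: a text counts as included if every length-n token window of it appears somewhere in the corpus. The function name and whitespace tokenization are illustrative assumptions; the thread's point is that a model can complete texts that fail this check, so the definition alone does not settle what was "trained on."

```python
def ngram_included(text, corpus, n=8):
    """Toy n-gram membership test: True iff every length-n token
    window of `text` occurs contiguously in `corpus`.
    Whitespace tokenization is for illustration only.
    """
    toks = text.split()
    corpus_toks = corpus.split()
    corpus_ngrams = {
        tuple(corpus_toks[i : i + n]) for i in range(len(corpus_toks) - n + 1)
    }
    windows = [tuple(toks[i : i + n]) for i in range(len(toks) - n + 1)]
    return bool(windows) and all(w in corpus_ngrams for w in windows)
```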
@ChengleiSi
CLS
6 months
Are AI scientists already better than human researchers? We recruited 43 PhD students to spend 3 months executing research ideas proposed by an LLM agent vs human experts. Main finding: LLM ideas result in worse projects than human ideas.
@_reachsumit
Sumit
6 months
Revela: Dense Retriever Learning via Language Modeling Introduces a unified framework for self-supervised retriever learning via language modeling with an in-batch attention mechanism. 📝 https://t.co/ycSRFZfYHZ 👨🏽‍💻 https://t.co/tPKCeLoB4X
@_reachsumit
Sumit
6 months
MoR: Better Handling Diverse Queries with a Mixture of Sparse, Dense, and Human Retrievers @JushaanSingh et al. present a zero-shot framework that dynamically combines heterogeneous retrievers for each query. 📝 https://t.co/ni7UnGHoqs 👨🏽‍💻 https://t.co/rjVQoSvBTj