Yun Qu Profile
Yun Qu

@quyun52425662

Followers: 15 · Following: 41 · Media: 7 · Statuses: 16

Ph.D. student at Tsinghua University

Joined March 2022
Yun Qu @quyun52425662 · 29 days
Yun Qu @quyun52425662 · 1 month
RT @dongxi_nlp: 「Prompt Difficulty Estimation, Tsinghua」. Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reas…
Yun Qu @quyun52425662 · 1 month
RT @AlbertW24045555: #LargeReasoningModel Can prompt difficulty be predicted online to accelerate RL finetuning of Reasoning Models? YES!….
Yun Qu @quyun52425662 · 3 months
RT @AlbertW24045555: @dongxi_nlp Brother Ma, our Tsinghua research group's earlier work, model predictive task sampling, already proposed this idea; feel free to check it out. We also applied this idea to reinforcement learning, see ICML202…
arxiv.org: Task robust adaptation is a long-standing pursuit in sequential decision-making. Some risk-averse strategies, e.g., the conditional value-at-risk principle, are incorporated in domain...
Yun Qu @quyun52425662 · 3 months
RT @dongxi_nlp: From the Tsinghua Intelligent Decision-Making research group: Model Predictive Task Sampling for Efficient and Robust Adaptation. A lightweight framework that applies "risk modeling + active sampling" to cross-task adaptation. Real-world systems (robots, …
Yun Qu @quyun52425662 · 3 months
Fast and Robust: Task Sampling with Posterior and Diversity Synergies for Adaptive Decision-Makers in Randomized Environments. arXiv:
arxiv.org: Task robust adaptation is a long-standing pursuit in sequential decision-making. Some risk-averse strategies, e.g., the conditional value-at-risk principle, are incorporated in domain...
Yun Qu @quyun52425662 · 3 months
3⃣ PDTS shows superiority in zero-shot (Physical and Visual DR) and few-shot (Meta-RL) adaptive decision-making. Its advantages include (i) robust adaptation, (ii) acceleration, (iii) improved OOD performance, (iv) better risk discrimination, and (v) minimal additional cost.
Yun Qu @quyun52425662 · 3 months
2⃣ To achieve worst-case optimization, we diagnose the concentration issue in MPTS and enhance the acquisition function with a diversity regularization. We further adopt a posterior sampling strategy to simplify implementation and exploit stochastic optimism.
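To make the acquisition idea in 2⃣ concrete, here is a minimal sketch assuming a toy Gaussian surrogate over per-task risk: one Thompson-style posterior sample supplies the stochastic optimism, and a distance-to-already-selected bonus supplies the diversity regularization. RiskModel, select_tasks, and beta are illustrative names for this sketch, not the released PDTS implementation.

# Minimal sketch (not the official PDTS code): score candidate tasks with a
# posterior sample from a surrogate risk model, then greedily add a diversity
# bonus so the selected batch does not concentrate on one region of task space.
import numpy as np

class RiskModel:
    """Toy Bayesian surrogate: an independent Gaussian posterior over each task's expected risk."""
    def __init__(self, n_tasks):
        self.mean = np.zeros(n_tasks)
        self.var = np.ones(n_tasks)

    def posterior_sample(self):
        # One Thompson-style sample of predicted risk per candidate task.
        return self.mean + np.sqrt(self.var) * np.random.randn(len(self.mean))

def select_tasks(task_feats, model, batch_size, beta=1.0):
    """Pick a risky-but-diverse batch: sampled risk plus a distance-to-selected bonus."""
    risk = model.posterior_sample()                   # stochastic optimism
    selected = []
    for _ in range(batch_size):
        bonus = np.full(len(risk), np.inf)
        if selected:
            # Diversity regularization: distance to the closest already-selected task.
            d = np.linalg.norm(task_feats[:, None] - task_feats[selected][None], axis=-1)
            bonus = d.min(axis=1)
        score = risk + beta * np.minimum(bonus, 1.0)  # clip so the bonus stays bounded
        score[selected] = -np.inf                     # no repeats within the batch
        selected.append(int(np.argmax(score)))
    return selected

# Example: choose 4 of 100 candidate tasks described by 2-D features.
feats = np.random.rand(100, 2)
print(select_tasks(feats, RiskModel(100), batch_size=4))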
Yun Qu @quyun52425662 · 3 months
1⃣ Building on MPTS, we present the robust active task sampling (RATS) paradigm to surrogate cost evaluations via active inference. We abstract RATS as a task-selection MDP, construct an infinitely many-armed bandit (i-MAB) for task selection, and analyze MPTS as a special solution.
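For the outer loop described in 1⃣, a rough picture of surrogate cost evaluation might look like the sketch below: each candidate task plays the role of an arm, a simple surrogate is fit on the costs of tasks that were actually run, and the remaining tasks are scored without any rollout. evaluate_adaptation_cost and fit_surrogate are illustrative stand-ins for this sketch, not part of the released MPTS/RATS code.

# Minimal sketch (not the official MPTS/RATS code): treat each candidate task as
# an arm; fit a surrogate on observed adaptation costs and let it score the rest,
# so only a small batch of predicted-hard tasks is evaluated per round.
import numpy as np

def evaluate_adaptation_cost(task_feat):
    # Stand-in for running adaptation on one task and measuring its cost/risk.
    return float(np.sum(task_feat ** 2) + 0.1 * np.random.randn())

def fit_surrogate(X, y):
    # Stand-in surrogate: ridge regression returning a mean cost predictor.
    w = np.linalg.solve(X.T @ X + 0.1 * np.eye(X.shape[1]), X.T @ y)
    return lambda feats: feats @ w

rng = np.random.default_rng(0)
tasks = rng.random((200, 2))              # candidate task descriptors ("arms")
X_hist, y_hist = [], []

for round_idx in range(5):
    if X_hist:
        predict = fit_surrogate(np.array(X_hist), np.array(y_hist))
        scores = predict(tasks)           # surrogate cost, no rollout needed
    else:
        scores = rng.random(len(tasks))   # cold start: uniform exploration
    picked = np.argsort(scores)[-4:]      # focus on predicted-hard tasks
    for i in picked:
        X_hist.append(tasks[i])
        y_hist.append(evaluate_adaptation_cost(tasks[i]))
    print(f"round {round_idx}: evaluated tasks {picked.tolist()}")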
Yun Qu @quyun52425662 · 3 months
🥳 Just accepted at #ICML2025: PDTS for fast and robust decision-making! 💪 🔓 Unlocks the potential of robust active task sampling. 🎯 Boosts zero-shot & few-shot adaptation robustness. ⚡️ Plug-and-play, low-cost. 🚀 Accelerates learning. Project website 👉
Yun Qu @quyun52425662 · 6 months
RT @AlbertW24045555: Not Limited to #DeepSeek: Two years after finishing my PhD at VAE's birthplace, AMLab, I'm thrilled to share a VAE-ins….
Yun Qu @quyun52425662 · 8 months
RT @AlbertW24045555: #AAAI2025 #LLM4RL Sparse feedback and reward design are lasting challenges in the RL field. Will large models help add….
Yun Qu @quyun52425662 · 10 months
3/ 🧐LEMAE demonstrates potential for generalization to brand-new, non-symbolic tasks.
Yun Qu @quyun52425662 · 10 months
2/ 💪By guiding and organizing exploration through our designs, LEMAE exhibits a significant reduction in redundant exploration and achieves a 10x speedup on challenging exploration benchmarks while eliminating the human workload of dense reward design.
Yun Qu @quyun52425662 · 10 months
Leveraging LLMs to revolutionize exploration in RL! 🥳 1/ LEMAE: a systematic approach channeling task-specific information from an LLM to distinguish key states as subgoals for targeted RL exploration. Project: Paper:
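As a rough illustration of the subgoal idea in 1/, the sketch below assumes the LLM is queried once for task-specific key states and the agent then receives a one-time intrinsic bonus when each key state is first matched in an episode. query_llm_for_key_states, intrinsic_reward, and the bonus value are illustrative assumptions, not the released LEMAE code.

# Minimal sketch (not the official LEMAE code): ask an LLM once for task-specific
# "key states", then hand out an intrinsic bonus the first time each key state is
# matched during an episode, so exploration is organized around those subgoals.
from typing import List

def query_llm_for_key_states(task_description: str) -> List[str]:
    # Placeholder for an LLM call; a fixed list stands in for the toy task here.
    return ["picked_up_key", "opened_door", "reached_goal"]

def intrinsic_reward(state_text: str, key_states: List[str],
                     reached: set, bonus: float = 1.0) -> float:
    """Reward the first visit to each LLM-identified key state in an episode."""
    total = 0.0
    for k in key_states:
        if k not in reached and k in state_text:
            reached.add(k)
            total += bonus
    return total

# Example episode loop: extrinsic reward is augmented with the subgoal bonus.
key_states = query_llm_for_key_states("open the locked door and reach the goal")
reached = set()
for state_text, ext_r in [("agent picked_up_key", 0.0),
                          ("agent opened_door", 0.0),
                          ("agent reached_goal", 1.0)]:
    r = ext_r + intrinsic_reward(state_text, key_states, reached)
    print(state_text, "->", r)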
Yun Qu @quyun52425662 · 1 year
RT @AlbertW24045555: #MultitaskLearning. Feel free to access the latest SOTA method in Multi-task Optimization. In this work, "GO4Align: gro…