Yun Qu @quyun52425662 X Profile

Yun Qu

@quyun52425662

Followers

15

Following

41

Media

7

Statuses

16

Ph.D. student at Tsinghua University

Joined March 2022

Yun Qu

@quyun52425662

29 days

RT @AlbertW24045555:

0

1

0

Yun Qu

@quyun52425662

1 month

RT @dongxi_nlp: 「Prompt Difficulty Estimation, Tsinghua」 Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reas….

0

9

0

Yun Qu

@quyun52425662

1 month

RT @AlbertW24045555: #LargeReasoningModel Can prompt difficulty be predicted online to accelerate RL finetuning of Reasoning Models? YES!….

0

2

0

Yun Qu

@quyun52425662

3 months

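RT @AlbertW24045555: @dongxi_nlp Brother Ma, our Tsinghua research group's earlier work, model predictive task sampling, already proposed this idea. You are welcome to follow it; we also applied this idea to reinforcement learning, see ICML202….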

arxiv.org

Task robust adaptation is a long-standing pursuit in sequential decision-making. Some risk-averse strategies, e.g., the conditional value-at-risk principle, are incorporated in domain...

0

3

0

Yun Qu

@quyun52425662

3 months

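RT @dongxi_nlp: Work from the Tsinghua intelligent decision-making research group: Model Predictive Task Sampling for Efficient and Robust Adaptation. A lightweight framework that applies "risk modeling + active sampling" to cross-task adaptation. Real-world systems (robots, ….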

0

4

0

Yun Qu

@quyun52425662

3 months

Fast and Robust: Task Sampling with Posterior and Diversity Synergies for Adaptive Decision-Makers in Randomized Environments. arXiv:

arxiv.org

Task robust adaptation is a long-standing pursuit in sequential decision-making. Some risk-averse strategies, e.g., the conditional value-at-risk principle, are incorporated in domain...

Yun Qu

@quyun52425662

3 months

🥳 Just accepted at #ICML2025: PDTS for fast and robust decision-making! 💪 🔓 Unlocks the potential of robust active task sampling. 🎯 Boosts zero-shot & few-shot adaptation robustness. ⚡️ Plug-and-play, low-cost. 🚀 Accelerates learning. Project website 👉

0

2

Yun Qu

@quyun52425662

3 months

3⃣ PDTS shows superiority in zero-shot (Physical and Visual DR) and few-shot (Meta-RL) adaptive decision-making. Its advantages include (i) robust adaptation, (ii) acceleration, (iii) improved OOD performance, (iv) better risk discrimination, and (v) minimal additional cost.

0

2

Yun Qu

@quyun52425662

3 months

2⃣ To achieve worst-case optimization, we diagnose the concentration issue in MPTS and enhance the acquisition function with diversity regularization. We further adopt a posterior sampling strategy to simplify implementation and exploit stochastic optimism.

1

0

2

Yun Qu

@quyun52425662

3 months

1⃣ Building on MPTS, we present the robust active task sampling (RATS) paradigm to surrogate cost evaluations via active inference. We abstract RATS as a task-selection MDP, construct an infinitely many-armed bandit (i-MAB) for task selection, and analyze MPTS as a special solution.

1

0

2

Yun Qu

@quyun52425662

3 months

🥳 Just accepted at #ICML2025: PDTS for fast and robust decision-making! 💪 🔓 Unlocks the potential of robust active task sampling. 🎯 Boosts zero-shot & few-shot adaptation robustness. ⚡️ Plug-and-play, low-cost. 🚀 Accelerates learning. Project website 👉

1

0

5

Yun Qu

@quyun52425662

6 months

RT @AlbertW24045555: Not Limited to #DeepSeek: Two years after finishing my PhD at VAE's birthplace, AMLab, I'm thrilled to share a VAE-ins….

0

5

0

Yun Qu

@quyun52425662

8 months

RT @AlbertW24045555: #AAAI2025 #LLM4RL Sparse feedback and reward design are lasting challenges in the RL field. Will large models help add….

0

1

0

Yun Qu

@quyun52425662

10 months

3/ 🧐 LEMAE demonstrates potential for generalization to brand-new, non-symbolic tasks.

0

2

Yun Qu

@quyun52425662

10 months

2/ 💪 By guiding and organizing exploration through our designs, LEMAE significantly reduces redundant exploration and achieves a 10x speedup on challenging exploration benchmarks, while eliminating the human workload of dense reward design.

1

0

2

Yun Qu

@quyun52425662

10 months

Leveraging LLMs to revolutionize exploration in RL! 🥳 1/ LEMAE: a systematic approach that channels task-specific information from an LLM to distinguish key states as subgoals for targeted RL exploration. Project: Paper:

1

0

4

Yun Qu

@quyun52425662

1 year

RT @AlbertW24045555: #MultitaskLearning. Feel free to access the latest SOTA method in Multi-task Optimization. In this work, "GO4Align: gro….

0

1

0