Yanming Wan Profile
Yanming Wan

@yanming_wan

Followers
141
Following
10
Media
17
Statuses
26

PhD student at @uwcse.

Seattle, WA
Joined August 2024
@yanming_wan
Yanming Wan
5 months
Personalization methods for LLMs often rely on extensive user history. We introduce Curiosity-driven User-modeling Reward as Intrinsic Objective (CURIO) to encourage actively learning about the user within multi-turn dialogs. 📜 https://t.co/QsDU5QcuSZ 🌎 https://t.co/zuOCwPrtrw
4
34
153
@yanming_wan
Yanming Wan
27 days
It’s a great honor to receive this award! Our related paper "Enhancing Personalized Multi-Turn Dialogue with Curiosity Reward" has been accepted to #NeurIPS2025. Looking forward to sharing and discussing with everyone! 📜 https://t.co/ELtEt7AyiP 🌏
sites.google.com
Our work focuses on training personalized LLMs in multi-turn conversations. Standard LLM training methods treat all users as a homogeneous group, leading to suboptimal performance for different...
@uwcse
Allen School
27 days
Congratulations to @UW #UWAllen's @yanming_wan, professor @natashajaques and collaborators on winning this year's Madrona Prize at our recent Research Showcase & Open House—and huge thanks to @MadronaVentures for supporting and encouraging our student researchers! #UWinnovates
0
2
16
@kjha02
Kunal Jha
2 months
Forget modeling every belief and goal! What if we represented people as following simple scripts instead (i.e., "cross the crosswalk")? Our new paper shows that AI which models others' minds as Python code 💻 can quickly and accurately predict human behavior! https://t.co/1t2fsW7jyL 🧵
4
33
103
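A toy illustration of the idea in the tweet above, entirely hypothetical code rather than anything from the linked paper: a person is represented by a short behavioral script instead of a full model of beliefs and goals.

```python
# Hypothetical toy example (not from the paper): a person modeled as a short
# script rather than a full set of beliefs and goals.
def pedestrian_script(state: dict) -> str:
    """Simple behavioral script: wait for the walk signal, then cross."""
    if state.get("signal") == "walk":
        return "cross_crosswalk"
    return "wait"

# Predicting behavior is then just running the script on the observed state.
print(pedestrian_script({"signal": "walk"}))  # -> "cross_crosswalk"
```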
@yanming_wan
Yanming Wan
5 months
Overall, we propose CURIO for enhancing personalization in LLMs for multi-turn dialogs, which encourages LLMs to actively learn user traits and adapt their responses accordingly. This work was done with my awesome collaborators: @jiaxing_jxwu @marwaabdulhai @LiorShan @natashajaques
0
2
7
@yanming_wan
Yanming Wan
5 months
Baselines and entropy-based rewards lead to "controlling behavior", where the model gets high rewards by convincing the user to adopt a particular preference that is easier to cater to, rather than adhering to the user's ground-truth preference. "Grounded" rewards prevent this reward hacking. (8/9)
1
0
4
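A minimal sketch of the contrast described in the tweet above, assuming the belief is a probability distribution over discrete user types; the function names are mine, not the paper's.

```python
import math

def entropy(belief: dict) -> float:
    """Shannon entropy of a belief distribution over user types."""
    return -sum(p * math.log(p) for p in belief.values() if p > 0)

def entropy_reduction_reward(belief_prev: dict, belief_curr: dict) -> float:
    # Rewarded whenever the belief sharpens -- even onto the wrong user type,
    # which is why it can be hacked by steering the user.
    return entropy(belief_prev) - entropy(belief_curr)

def grounded_accuracy_reward(belief_prev: dict, belief_curr: dict, true_type: str) -> float:
    # Rewarded only when probability mass moves toward the ground-truth type,
    # so convincing the user to change preferences earns nothing.
    return belief_curr[true_type] - belief_prev[true_type]
```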
@yanming_wan
Yanming Wan
5 months
With a proper reward choice, CURIO models achieve personalization without compromising coherence and overall quality. The baseline is trained to optimize conversation quality using exactly the same prompt that we use for evaluation. DiffLogAcc outperforms the baseline and all other models. (7/9)
1
0
3
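The thread does not spell out the DiffLogAcc reward; one plausible reading of the name, in my own notation rather than the paper's, is the per-turn change in the log-probability that the user model's belief assigns to the ground-truth user type u*:

```latex
\[
  r_t^{\mathrm{DiffLogAcc}} \;=\; \log b_t(u^{*}) \;-\; \log b_{t-1}(u^{*})
\]
```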
@yanming_wan
Yanming Wan
5 months
The second task has a more complicated reward, where personalization contributes to the model's performance in conducting the dialog but is not the ultimate goal. CURIO models with accuracy-based intrinsic rewards remain effective and significantly improve personalization ability. (6/9)
1
0
3
@yanming_wan
Yanming Wan
5 months
On the first task, the agent needs to elicit user information in multi-turn dialogs before making a choice at the end. CURIO can effectively enhance personalization and reduce the generalization gap by "learning how to learn" rather than memorizing superficial user details. (5/9)
1
0
4
@yanming_wan
Yanming Wan
5 months
We discuss the following intrinsic rewards. Intuitively, we reward gains in prediction accuracy or reductions in entropy. For those that are instances of potential-based reward shaping, optimality is not affected, and we hypothesize that they can accelerate training. (4/9)
1
0
3
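For context on the potential-based reward shaping claim in the tweet above: shaping terms of the following form are known (Ng et al., 1999) to leave the optimal policy unchanged. Treating the potential Φ as a function of the user model's belief (e.g. its accuracy on the true user type, or its negative entropy) is my reading of the tweet, not a statement from the paper.

```latex
\[
  r_t^{\mathrm{shaped}} \;=\; r_t^{\mathrm{ext}} \;+\; \gamma\,\Phi(s_{t+1}) \;-\; \Phi(s_t)
\]
```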
@yanming_wan
Yanming Wan
5 months
Intrinsic motivation is well-studied in RL, but applying it to LLMs is non-trivial. The policy and environment models engage in a multi-turn dialog, and a reward model gives an extrinsic reward. On each turn, a user model updates its belief about the user and computes an intrinsic reward. (3/9)
1
1
8
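A minimal sketch of one training rollout as described in the tweet above; all interfaces (policy.respond, user_model.update, and so on) are hypothetical placeholders, not the released code.

```python
def rollout(policy, user_simulator, reward_model, user_model, num_turns, lam=0.1):
    """Collect one multi-turn dialog with turn-level intrinsic rewards."""
    dialog, rewards = [], []
    belief = user_model.prior()                        # belief over user types
    for _ in range(num_turns):
        agent_msg = policy.respond(dialog)             # policy LLM speaks
        user_msg = user_simulator.respond(dialog + [agent_msg])  # environment model replies
        dialog += [agent_msg, user_msg]
        new_belief = user_model.update(belief, dialog)
        # Curiosity bonus: how much the user model's prediction improved this turn.
        intrinsic = user_model.accuracy(new_belief) - user_model.accuracy(belief)
        rewards.append(lam * intrinsic)
        belief = new_belief
    rewards[-1] += reward_model.score(dialog)          # sparse extrinsic reward at the end
    return dialog, rewards
```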
@yanming_wan
Yanming Wan
5 months
We leverage a user model to incorporate a curiosity reward into standard multi-turn RLHF. Rather than training an LLM only with the sparse end-of-conversation reward, we add a turn-based reward given by the improvement in the model's belief over the user type after each action. (2/9)
1
1
6
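One way to write the resulting objective, in my notation rather than the paper's: the sparse extrinsic reward arrives only at the final turn T, and each turn adds a curiosity bonus, weighted by λ, for the improvement in the user model's belief b_t about the user type.

```latex
\[
  R(\tau) \;=\; r_{\mathrm{ext}}(s_T) \;+\; \lambda \sum_{t=1}^{T}
    \bigl[\, \mathrm{Acc}(b_t) - \mathrm{Acc}(b_{t-1}) \,\bigr]
\]
```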
@jihan_yao
Jihan Yao
6 months
We introduce MMMG: a Comprehensive and Reliable Evaluation Suite for Multitask Multimodal Generation ✅ Reliable: 94.3% agreement with human judgment ✅ Comprehensive: 4 modality combinations × 49 tasks × 937 instructions 🔍 Results and Takeaways: > GPT-Image-1 from @OpenAI
2
22
29
@yanming_wan
Yanming Wan
1 year
Overall, we present FISER for ambiguous instruction following by building a model that explicitly performs social reasoning to infer the human’s intentions from prior actions. This work was done with my awesome collaborators: @YueWu7677 @ypwang61 @maojiayuan @natashajaques (9/10)
1
0
2
@yanming_wan
Yanming Wan
1 year
We filter out a proportion of irrelevant objects to assess the impact of excessive item quantity on GPT-4. GPT-4 relies on a very large proportion of objects being filtered out, showing that LLMs cannot effectively select relevant information and focus on the relevant objects. (8/10)
1
0
3
@yanming_wan
Yanming Wan
1 year
We compare training models end-to-end with goal prediction as an auxiliary task versus separating them into two models that first predict goals and then actions. The multi-stage approach is significantly better, implying that fully separating social reasoning from embodied reasoning performs better. (7/10)
1
0
1
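An illustrative contrast of the two setups compared in the tweet above; the function and model names are hypothetical, not from the paper's codebase.

```python
# (a) End-to-end: a single model predicts the action, with goal prediction
#     serving only as an auxiliary training loss.
def end_to_end(model, observation, instruction):
    action_logits, goal_logits = model(observation, instruction)
    return action_logits  # goal_logits contribute an auxiliary loss during training

# (b) Multi-stage: a first model infers the goal, and a second model
#     conditions on that predicted goal when choosing the action.
def multi_stage(goal_model, action_model, observation, instruction):
    goal = goal_model(observation, instruction)
    return action_model(observation, goal)
```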
@yanming_wan
Yanming Wan
1 year
We train FISER models from scratch using the following architecture. The first 2N layers perform social reasoning and the last N perform embodied reasoning. The embeddings at layer 2N are used to recognize the robot's task, and the last layer is used to predict actions. (6/10)
1
0
0
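A rough PyTorch sketch of the layer split described in the tweet above; the layer type, dimensions, and readout position are illustrative assumptions, not the paper's implementation.

```python
import torch.nn as nn

class FISERSketch(nn.Module):
    def __init__(self, d_model=256, n_heads=4, N=2, n_tasks=32, n_actions=64):
        super().__init__()
        make = lambda: nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.social = nn.ModuleList(make() for _ in range(2 * N))   # first 2N layers: social reasoning
        self.embodied = nn.ModuleList(make() for _ in range(N))     # last N layers: embodied reasoning
        self.task_head = nn.Linear(d_model, n_tasks)      # read out at layer 2N: robot's task
        self.action_head = nn.Linear(d_model, n_actions)  # read out at the last layer: next action

    def forward(self, x):                          # x: (batch, seq_len, d_model)
        for block in self.social:
            x = block(x)
        task_logits = self.task_head(x[:, 0])      # recognize the robot's task
        for block in self.embodied:
            x = block(x)
        action_logits = self.action_head(x[:, 0])  # predict the next action
        return task_logits, action_logits
```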
@yanming_wan
Yanming Wan
1 year
In FISER, we explicitly model a human's intention by modeling the human's overall plan as a set of predicates. We further assume that the human selects a subgoal that needs help and specifies the robot's task, which is the underlying intention of the instruction. (5/10)
1
0
1
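A small data sketch of the hierarchy described in the tweet above; the class and field names are mine and purely illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Predicate:
    name: str              # e.g. "on"
    args: tuple            # e.g. ("mug", "kitchen_table")

@dataclass
class HumanIntent:
    plan: frozenset        # the human's overall plan, as a set of Predicate goals
    subgoal: Predicate     # the subgoal the human wants help with
    robot_task: str        # the robot's task implied by the ambiguous instruction
```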
@yanming_wan
Yanming Wan
1 year
Even with careful CoT prompts and few-shot examples, GPT-4 Turbo falls far behind small-scale models trained from scratch. We compare their failure modes and find that prompting methods alone cannot provide the model with the ability to perform social and embodied reasoning. (4/10)
1
0
1
@yanming_wan
Yanming Wan
1 year
We evaluate FISER over the challenging HandMeThat benchmark. We compare the trained models with competitive baselines, including the SOTA prior work on HMT, and CoT prompting on GPT-4 Turbo. Overall, our trained FISER models improve the performance across all levels. (3/10)
1
0
0