Teng Xiao

@TengX6

119 Followers · 679 Following · 7 Media · 43 Statuses

Postdoc at the Allen Institute for AI @allen_ai and @uwnlp. Machine Learning and Reinforcement Learning

Seattle, USA
Joined September 2019
@huaiszhu
Huaisheng Zhu
1 month
🚀 New Paper Alert! 🧠 Simple Denoising Diffusion Language Models (SDDLMs) We simplify the complex ELBO objectives in Uniform-State Diffusion Models with a simple denoising loss, making training more scalable — while matching or surpassing baseline generation quality. (1/2)
1 reply · 3 reposts · 1 like
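A minimal sketch of the idea as announced above, under my own assumptions: a uniform-state diffusion model corrupts tokens by resampling them uniformly at random, and the "simple denoising loss" can be illustrated as plain cross-entropy on reconstructing the clean sequence, in place of a timestep-weighted ELBO. All names, shapes, and the corruption schedule below are illustrative guesses, not the paper's code.

import torch
import torch.nn.functional as F

def simple_denoising_loss(model, x0, vocab_size):
    # Hypothetical sketch, not SDDLM's actual objective.
    # x0: (B, L) clean token ids; model(xt) -> (B, L, vocab_size) logits.
    B, L = x0.shape
    t = torch.rand(B, 1)                          # per-sequence noise level
    corrupt = torch.rand(B, L) < t                # which positions to corrupt
    noise = torch.randint(0, vocab_size, (B, L))  # uniform-state noise
    xt = torch.where(corrupt, noise, x0)
    logits = model(xt)
    # Simple denoising objective: reconstruct every clean token with plain
    # cross-entropy, instead of a weighted ELBO over diffusion timesteps.
    return F.cross_entropy(logits.reshape(-1, vocab_size), x0.reshape(-1))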
@shiqi_chen17
Shiqi Chen
1 month
Want to get an LLM agent to succeed in an OOD environment? We tackle the hardest case with SPA (Self-Play Agent). No extra data, tools, or stronger models. Pure self-play. We first internalize a world model via self-play, then learn how to win by RL. Like a child playing …
4 replies · 27 reposts · 225 likes
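The two-phase recipe described in this tweet (internalize a world model from self-play, then learn to win with RL) could be structured roughly as below. This is a hypothetical skeleton with made-up method names (act, fit_world_model, imagine_rollout, policy_update), not SPA's actual interface.

def train_spa_style(agent, env, n_selfplay_episodes=1000, n_rl_updates=1000):
    # Phase 1: self-play exploration, collecting transitions so the agent
    # can internalize the environment's dynamics (a world model).
    transitions = []
    for _ in range(n_selfplay_episodes):
        obs, done = env.reset(), False
        while not done:
            action = agent.act(obs, explore=True)
            next_obs, reward, done = env.step(action)
            transitions.append((obs, action, reward, next_obs))
            obs = next_obs
    agent.fit_world_model(transitions)  # supervised next-state/reward prediction

    # Phase 2: reinforcement learning to win, using rollouts simulated by
    # the learned world model instead of extra data or stronger models.
    for _ in range(n_rl_updates):
        rollout = agent.imagine_rollout()
        agent.policy_update(rollout)
    return agent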
@TengX6
Teng Xiao
8 months
I will attend #ICLR2025 to present our papers on LLM alignment, RL, and RLHF. Looking forward to meeting you at our posters! On a Connection Between Imitation Learning and RLHF 📍 Hall 3 + Hall 2B #223 🕙 Saturday, Apr 26 at 10:00 (+08) SimPER: A Minimalist Approach to Preference …
0 replies · 5 reposts · 20 likes
@TengX6
Teng Xiao
9 months
Huge thanks to my incredible collaborators: @1t4chiii @learn2dropout, Mingxiao Li, Shangsong Liang, @zhaochun_ren, and @vhonavar! This is a huge milestone, and we’re thrilled to see SimPER (https://t.co/ClL23HoefG) contributing to cutting-edge advancements in LLM reasoning.
openreview.net: Existing preference optimization objectives for language model alignment require additional hyperparameters that must be extensively tuned to achieve optimal performance, increasing both the...
0 replies · 0 reposts · 1 like
@TengX6
Teng Xiao
9 months
🚀 Incredible news! Our SimPER, accepted at ICLR 2025, has been used as the training algorithm in a recent paper, EXAONE Deep: Reasoning Enhanced Language Models (https://t.co/mTst9UN4h5), from LG AI, and their open model achieved performance competitive with or better than QwQ.
@LG_AI_Research
LG AI Research
9 months
🚀 Breaking News! We’re thrilled to introduce #EXAONEDeep, a next-generation AI model designed to enhance reasoning capabilities, evolving into ‘Agentic AI’ for real-world industry solutions! 🧠 Specialized in math, science, and coding tasks, EXAONE Deep pushes the boundaries of …
1 reply · 1 repost · 3 likes
@TengX6
Teng Xiao
9 months
🚀 Is RLHF really RL… or barely RL? 🤔 Excited to share our #ICLR2025 paper: "On a Connection Between Imitation Learning and RLHF" 🎯 🔍 Our work suggests that RLHF is more closely related to imitation learning than RL, offering new insights into the alignment process. In this …
arxiv.org: This work studies the alignment of large language models with preference data from an imitation learning perspective. We establish a close theoretical connection between reinforcement learning...
1 reply · 1 repost · 6 likes
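The claim in this tweet has a well-known derivation behind it, sketched here in standard notation (textbook RLHF algebra consistent with the announcement, not a transcription of the paper): the KL-regularized RLHF objective is exactly a reverse-KL distribution-matching problem, i.e., imitation of a reward-tilted target.

% KL-regularized RLHF objective:
\max_{\pi}\; \mathbb{E}_{y \sim \pi(\cdot \mid x)}\!\left[ r(x,y) \right]
  - \beta\,\mathrm{KL}\!\left( \pi(\cdot \mid x) \,\|\, \pi_{\mathrm{ref}}(\cdot \mid x) \right)

% Its closed-form optimum is the reward-tilted distribution:
\pi^{*}(y \mid x) \;=\; \frac{1}{Z(x)}\,\pi_{\mathrm{ref}}(y \mid x)\,
  \exp\!\left( r(x,y)/\beta \right)

% Substituting back, the objective equals, up to the constant \beta \log Z(x):
-\,\beta\,\mathrm{KL}\!\left( \pi(\cdot \mid x) \,\|\, \pi^{*}(\cdot \mid x) \right)

So maximizing the RLHF objective is minimizing a reverse KL to \pi^{*}: a distribution-matching (imitation) objective rather than generic RL.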
@TengX6
Teng Xiao
10 months
❓ Can we align large language models without hyperparameters? The answer is SimPER! 🚀 Excited to share our ICLR paper, SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters! 🎯 SimPER eliminates the need for complex tuning and reference models, …
1 reply · 4 reposts · 8 likes
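Going by the title and the "no hyperparameters, no reference model" description, a SimPER-style loss can be sketched as contrasting inverse perplexities of the chosen and rejected responses. The formulation below is my reading, for illustration only; the exact loss is in the paper.

import torch

def simper_style_loss(logp_chosen, logp_rejected, len_chosen, len_rejected):
    # logp_*: summed token log-probs of each response under the policy.
    # exp(mean token log-prob) is the inverse perplexity, a value in (0, 1].
    inv_ppl_chosen = torch.exp(logp_chosen / len_chosen)
    inv_ppl_rejected = torch.exp(logp_rejected / len_rejected)
    # Prefer the chosen response; no beta, no reference model, nothing to tune.
    return (inv_ppl_rejected - inv_ppl_chosen).mean()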
@TengX6
Teng Xiao
1 year
In our EMNLP 2024 paper, "How to Leverage Demonstration Data in Alignment for Large Language Models? A Self-Imitation Learning Perspective" (https://t.co/quniEQRdiN), we also propose GSIL, an imitation learning (IL) approach that eliminates the need for complex adversarial training.
arxiv.org: This paper introduces a novel generalized self-imitation learning (GSIL) framework, which effectively and efficiently aligns large language models with offline demonstration data. We...
@m_wulfmeier
Markus Wulfmeier @ NeurIPS 2025
1 year
Imitation via Reinforcement Learning (IvRL, IRL, RFT, ...) is not just eating the whole cake but baking a massive, new cake! 🎂 🍒 https://t.co/FZnRgOAbIy Late to the party on o1, RFT, etc., but here are some thoughts: - kudos to the team at OAI on further RL-based products and …
0 replies · 0 reposts · 1 like
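As a rough illustration of self-imitation without adversarial training (one plausible instance, not GSIL's actual objective family, which the paper derives more generally): contrast demonstrations against the model's own samples using the policy's log-probabilities directly, so no discriminator needs to be trained.

import torch.nn.functional as F

def self_imitation_contrast(logp_demo, logp_self):
    # logp_demo: summed log-probs of demonstration responses under the policy.
    # logp_self: summed log-probs of the model's own sampled responses.
    # Raising demonstrations above self-samples imitates the demonstration
    # data while the policy itself plays the role of the discriminator.
    return -F.logsigmoid(logp_demo - logp_self).mean()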
@TengX6
Teng Xiao
1 year
🚀 Excited to share that our new paper, Cal-DPO, focused on LLM alignment, has been accepted to #NeurIPS2024! 🎉 We empirically and theoretically demonstrate that substantial improvements over DPO can be achieved by calibrating implicit rewards to align with absolute reward …
5 replies · 1 repost · 7 likes
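The calibration idea in this tweet can be sketched as follows, with an illustrative penalty that anchors DPO's implicit rewards r = beta * log(pi / pi_ref) to absolute target values instead of constraining only their difference. The target value and the exact penalty below are assumptions for illustration; the paper's loss may differ.

import torch.nn.functional as F

def calibrated_dpo_style_loss(pi_logp_w, pi_logp_l, ref_logp_w, ref_logp_l,
                              beta=0.1, target=1.0):
    # Implicit rewards, as in DPO: r = beta * log(pi / pi_ref).
    r_w = beta * (pi_logp_w - ref_logp_w)   # chosen response
    r_l = beta * (pi_logp_l - ref_logp_l)   # rejected response
    dpo_term = -F.logsigmoid(r_w - r_l)     # standard DPO preference loss
    # Illustrative calibration: pin implicit rewards to absolute values
    # (+target / -target) so the gap alone is no longer the only constraint.
    calibration = (r_w - target) ** 2 + (r_l + target) ** 2
    return (dpo_term + calibration).mean()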
@TengX6
Teng Xiao
1 year
Extensive experiments show that GSIL consistently and significantly outperforms baselines on many challenging benchmarks, including coding, mathematical reasoning, and instruction following. Code will be publicly available at
github.com: How to Leverage Demonstration Data in Alignment for Large Language Models? A Self-Imitation Learning Perspective (EMNLP 2024) · tengxiao1/GSIL
0 replies · 0 reposts · 1 like