Teng Xiao (@TengX6)
Postdoc at the Allen Institute for AI @allen_ai and @uwnlp. Machine Learning and Reinforcement Learning.
Seattle, USA · Joined September 2019
119 Followers · 679 Following · 7 Media · 43 Statuses
🚀 New Paper Alert! 🧠 Simple Denoising Diffusion Language Models (SDDLMs) We replace the complex ELBO objectives in Uniform-State Diffusion Models with a simple denoising loss, making training more scalable while matching or surpassing baseline generation quality. (1/2)
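The simple denoising loss described above can be sketched roughly as follows: corrupt tokens by uniformly resampling a random fraction of them, then train with plain cross-entropy to recover the clean sequence. This is a minimal illustration of the idea, not the paper's actual code; the noise schedule, model interface, and shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def uniform_noise(tokens, t, vocab_size):
    # Replace each token with a uniformly random token with probability t
    # (the "uniform-state" corruption; schedule here is an assumption).
    mask = torch.rand_like(tokens, dtype=torch.float) < t
    random_tokens = torch.randint(0, vocab_size, tokens.shape)
    return torch.where(mask, random_tokens, tokens)

def denoising_loss(model, tokens, vocab_size):
    # Sample a noise level, corrupt the sequence, and predict the clean
    # tokens everywhere: a plain cross-entropy objective standing in for
    # the full diffusion ELBO.
    t = torch.rand(())  # noise level in (0, 1)
    noisy = uniform_noise(tokens, t, vocab_size)
    logits = model(noisy)  # (batch, seq, vocab)
    return F.cross_entropy(logits.view(-1, vocab_size), tokens.view(-1))
```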
Want to get an LLM agent to succeed in an OOD environment? We tackle the hardest case with SPA (Self-Play Agent). No extra data, tools, or stronger models. Pure self-play. We first internalize a world model via Self-Play, then we learn how to win by RL. Like a child playing
Huge thanks to my incredible collaborators: @1t4chiii @learn2dropout, Mingxiao Li, Shangsong Liang, @zhaochun_ren, and @vhonavar! This is a huge milestone, and we’re thrilled to see SimPER ( https://t.co/ClL23HoefG) contributing to cutting-edge advancements in LLM reasoning.
openreview.net
Existing preference optimization objectives for language model alignment require additional hyperparameters that must be extensively tuned to achieve optimal performance, increasing both the...
🚀 Incredible news! Our SimPER, accepted at ICLR 2025, has been used as the training algorithm in a recent paper, EXAONE Deep: Reasoning Enhanced Language Models ( https://t.co/mTst9UN4h5) from LG AI, and their open model achieves performance competitive with or better than QwQ
🚀 Breaking News! We’re thrilled to introduce #EXAONEDeep, a next-generation AI model designed to enhance reasoning capabilities—Evolving into ‘Agentic AI‘ for real-world industry solutions! 🧠 Specialized in math, science, and coding tasks, EXAONE Deep pushes the boundaries of
🚀 Is RLHF really RL… or barely RL? 🤔 Excited to share our #ICLR2025 paper: "On a Connection Between Imitation Learning and RLHF" 🎯 🔍Our work suggests that RLHF is more closely related to imitation learning than RL, offering new insights into the alignment process. In this
arxiv.org
This work studies the alignment of large language models with preference data from an imitation learning perspective. We establish a close theoretical connection between reinforcement learning...
❓ Can we align large language models without hyperparameters? The answer is SimPER! 🚀 Excited to share our ICLR paper: SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters! 🎯 SimPER eliminates the need for complex tuning and reference models,
In our EMNLP 2024 paper, "How to Leverage Demonstration Data in Alignment for Large Language Models? A Self-Imitation Learning Perspective" ( https://t.co/quniEQRdiN), we also propose GSIL, an imitation learning (IL) approach that eliminates the need for complex adversarial training
arxiv.org
This paper introduces a novel generalized self-imitation learning ($\textbf{GSIL}$) framework, which effectively and efficiently aligns large language models with offline demonstration data. We...
Imitation via Reinforcement Learning (IvRL, IRL, RFT, ...) is not just eating the whole cake but baking a massive, new cake! 🎂 🍒
https://t.co/FZnRgOAbIy
Late to the party on o1, RFT, etc, but here are some thoughts: - kudos to the team at OAI on further RL-based products and
🚀 Excited to share that our new paper, Cal-DPO, focused on LLM alignment, has been accepted to #NeurIPS2024! 🎉 We empirically and theoretically demonstrate that substantial improvements over DPO can be achieved by calibrating implicit rewards to align with absolute reward
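A hedged sketch of the calibration idea: on top of DPO's implicit reward β·log(π/π_ref), add a term pulling the chosen and rejected rewards toward fixed absolute targets, so the implicit rewards live on a calibrated scale rather than only a relative one. The ±target values and β below are illustrative placeholders, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def cal_dpo_loss(logp_c, logp_r, ref_logp_c, ref_logp_r, beta=0.1, target=1.0):
    # Implicit rewards as in DPO: beta * log(pi / pi_ref) per response.
    r_c = beta * (logp_c - ref_logp_c)
    r_r = beta * (logp_r - ref_logp_r)
    # Standard DPO preference term ...
    pref = -F.logsigmoid(r_c - r_r)
    # ... plus a calibration term anchoring the implicit rewards to an
    # absolute scale (the +/- target values are illustrative, not the
    # paper's actual calibration targets).
    calib = (r_c - target) ** 2 + (r_r + target) ** 2
    return pref + calib
```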
Extensive experiments show that GSIL consistently and significantly outperforms baselines on many challenging benchmarks, including coding, mathematical reasoning, and instruction following. Code will be publicly available at
github.com
How to Leverage Demonstration Data in Alignment for Large Language Model? A Self-Imitation Learning Perspective (EMNLP 2024) - tengxiao1/GSIL