
Dongwei Jiang
@Dongwei__Jiang
510 Followers · 2K Following · 26 Media · 142 Statuses
Working on LLMs, focusing on reasoning and self-improvement. Spent six years in my past life doing industry research on speech processing.
Baltimore, MD · Joined June 2022
🧵 Recent studies show LLMs can self-improve their responses when given external feedback. But how effectively can they incorporate it? We tested this systematically and found they can't fully integrate feedback, even when the feedback is high-quality and backed by ground truth.
3 replies · 26 reposts · 155 likes
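As a rough illustration of the setup this thread describes, here is a minimal sketch of a feedback-driven refinement loop. `query_llm` and `feedback_fn` are hypothetical stand-ins, not the paper's actual code:

```python
def query_llm(prompt: str) -> str:
    """Placeholder for a call to any chat-completion API."""
    raise NotImplementedError

def refine_with_feedback(question: str, feedback_fn, max_rounds: int = 3) -> str:
    """Iteratively revise an answer using external feedback.

    feedback_fn(question, answer) -> (is_correct, feedback_text),
    e.g. backed by ground-truth checks as in the study above.
    """
    answer = query_llm(f"Question: {question}\nAnswer:")
    for _ in range(max_rounds):
        is_correct, feedback = feedback_fn(question, answer)
        if is_correct:
            break
        # Ask the model to revise. The thread's finding is that revisions
        # often fail to fully integrate even high-quality feedback.
        answer = query_llm(
            f"Question: {question}\n"
            f"Previous answer: {answer}\n"
            f"Feedback: {feedback}\n"
            "Revise the answer to fully address the feedback:"
        )
    return answer
```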
RT @leanprover: Incredibly grateful to @TheOfficialACM SIGPLAN for awarding #LeanLang the Programming Languages Software Award 2025 at #PLD…
0 replies · 38 reposts · 0 likes
RT @DanielKhashabi: 🚨🚨 New paper out with @Dongwei__Jiang and team: Even with near-perfect, ground-truth feedback, LLMs often fail to full…
0 replies · 7 reposts · 0 likes
Huge thanks to my amazing collaborators: Alvin Zhang @ZAlvin39105, Andrew Wang @andrewwnlp, Nicholas Andrews, and Daniel Khashabi @DanielKhashabi at @jhuclsp! Check out our code, data, and paper:
arxiv.org
Recent studies have shown LLMs possess some ability to improve their responses when given external feedback. However, it remains unclear how effectively and thoroughly these models can incorporate...
0 replies · 1 repost · 4 likes
RT @linxins2: 🚨 We discovered a surprising side effect of Reinforcement Finetuning (RFT): it makes LLMs more confidently wrong on unanswera…
0 replies · 38 reposts · 0 likes
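A minimal sketch of how that side effect could be measured, assuming a pool of unanswerable questions and treating an explicit abstention as the correct behavior. `answer_fn` is a hypothetical stand-in for a model call:

```python
def confident_wrongness_rate(unanswerable_questions, answer_fn) -> float:
    """Fraction of unanswerable questions the model answers anyway
    instead of abstaining (e.g., saying "I don't know")."""
    abstained = sum(
        "i don't know" in answer_fn(q).lower() for q in unanswerable_questions
    )
    return 1 - abstained / len(unanswerable_questions)
```

Comparing this rate before and after RFT would surface the effect the tweet describes.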
RT @Dongwei__Jiang: @_jasonwei We've been thinking about this gap too! Our paper found that when verifiable envir…
arxiv.org
Can LLMs consistently improve their previous outputs for better results? For this to be true, LLMs would need to be better at discriminating among previously-generated alternatives, than...
0 replies · 4 reposts · 0 likes
Now accepted at #ACL2025! Thrilled to see our paper also referenced in @lilianweng's latest blog post on reasoning in LLMs! Check it out:
lilianweng.github.io
Special thanks to John Schulman for a lot of super valuable feedback and direct edits on this post. Test time compute (Graves et al. 2016, Ling et al. 2017, Cobbe et al. 2021) and Chain-of-thought...
Process supervision for reasoning is 🔥! While previous approaches often relied on human annotation and struggled to generalize across different reasoning tasks, we're now asking: Can we improve this? Introducing 𝐑𝐀𝐓𝐈𝐎𝐍𝐀𝐋𝐘𝐒𝐓: a new model pre-trained on implicit rationales…
0 replies · 11 reposts · 58 likes
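A minimal sketch of how a process-supervision model could steer multi-step reasoning at inference time, in the spirit of RATIONALYST but not its actual procedure; `propose_steps` and `score_step` are hypothetical stand-ins:

```python
from typing import Callable

def guided_decode(
    question: str,
    propose_steps: Callable[[str], list[str]],  # proposes candidate next steps
    score_step: Callable[[str, str], float],    # process-supervision score
    max_steps: int = 10,
) -> str:
    """Greedily extend a reasoning chain, keeping the candidate step the
    process-supervision model scores highest at each point."""
    chain = question
    for _ in range(max_steps):
        candidates = propose_steps(chain)
        if not candidates:
            break
        best = max(candidates, key=lambda step: score_step(chain, step))
        chain += "\n" + best
        if "Answer:" in best:  # stop once a final answer is produced
            break
    return chain
```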
RT @tli104: Excited to be presenting our paper on training language models under heavily imbalanced data tomorrow at #NAACL2025! If you wan…
arxiv.org
Data abundance across different domains exhibits a long-tailed distribution: few domains have abundant data, while most face data scarcity. Our work focuses on a multilingual setting, where...
0 replies · 7 reposts · 0 likes
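One standard remedy for such long-tailed data is temperature-scaled sampling, sketched below; this is the generic technique, not necessarily the method in the paper above:

```python
def sampling_weights(sizes: dict[str, int], tau: float = 0.3) -> dict[str, float]:
    """Upsample scarce domains by raising each domain's raw data share
    to the power tau < 1, then renormalizing."""
    total = sum(sizes.values())
    scaled = {d: (n / total) ** tau for d, n in sizes.items()}
    z = sum(scaled.values())
    return {d: w / z for d, w in scaled.items()}

# Example: the abundant language dominates raw shares (~99%), but after
# temperature scaling the scarce languages get a much larger sampling share.
print(sampling_weights({"en": 1_000_000, "sw": 10_000, "yo": 1_000}))
```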
RT @jackjingyuzhang: Current copyright mitigation methods for LLMs typically focus on average-case risks, but overlook worst-case scenarios…
0 replies · 9 reposts · 0 likes
RT @iScienceLuvr: Reasoning to Learn from Latent Thoughts. "Motivated by how humans apply deliberate thinking to learn from limited data, w…
0 replies · 114 reposts · 0 likes
RT @natolambert: Verification, The Key to AI. Read the archives of Rich Sutton, Turing Award winner :D, has all the major ideas https://t.c…
0 replies · 45 reposts · 0 likes
I'll be at #AAAI25 presenting my poster on Self-[In]Correct during Session 3 on March 1st at 12:30. Would love to connect if you're attending!
arxiv.org
Can LLMs consistently improve their previous outputs for better results? For this to be true, LLMs would need to be better at discriminating among previously-generated alternatives, than...
0 replies · 2 reposts · 25 likes
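For context on the paper's framing, a toy sketch of the generation-vs-discrimination comparison behind Self-[In]Correct; `sample_answer` and `pick_best` are hypothetical stand-ins for model calls:

```python
def generation_accuracy(questions, answers, sample_answer) -> float:
    """Accuracy when the model answers each question once."""
    correct = sum(sample_answer(q) == a for q, a in zip(questions, answers))
    return correct / len(questions)

def discrimination_accuracy(questions, answers, sample_answer, pick_best, k=5) -> float:
    """Accuracy when the model must pick the best of its own k samples.
    Consistent self-improvement requires this to beat generation accuracy;
    the paper's abstract above frames exactly this comparison."""
    correct = 0
    for q, a in zip(questions, answers):
        candidates = [sample_answer(q) for _ in range(k)]
        correct += pick_best(q, candidates) == a
    return correct / len(questions)
```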