 
            
Sean O'Brien
@seano_research
Followers: 106 · Following: 169 · Media: 5 · Statuses: 21
UCSD PhD student studying LLMs. Ex-Meta AI, Berkeley AI Research.
Joined August 2023
            
            
Are AI models for music truly listening, or just good at guessing? This critical question is at the heart of the latest Best Paper Award winner at #ISMIR2025! Huge congratulations to Yongyi Zang, Sean O'Brien, Taylor Berg-Kirkpatrick, Julian McAuley, and Zachary Novack for their …
          
                
Replies: 0 · Reposts: 9 · Likes: 22
              
             New paper showing that Contrastive Decoding (CD) works really well for reasoning tasks, e.g. +6 on GSM8K and +4 on HellaSwag compared to greedy. CD searches for strings that are more likely under a good model than a weak model, emphasizing the improvement from the better model. 
            
                
Replies: 3 · Reposts: 18 · Likes: 114
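For reference, the contrastive decoding objective from Li et al. 2022 that the tweet describes is usually written along the following lines. The notation here is a sketch and may differ in detail from the papers' exact variants:

    V_{\mathrm{head}}(x_{<t}) = \left\{\, x : p_{\mathrm{EXP}}(x \mid x_{<t}) \ge \alpha \max_{w} p_{\mathrm{EXP}}(w \mid x_{<t}) \,\right\}

    \mathrm{CD\text{-}score}(x \mid x_{<t}) =
    \begin{cases}
      \log p_{\mathrm{EXP}}(x \mid x_{<t}) - \log p_{\mathrm{AMA}}(x \mid x_{<t}), & x \in V_{\mathrm{head}}(x_{<t}) \\
      -\infty, & \text{otherwise}
    \end{cases}

Here p_EXP is the stronger "expert" model and p_AMA the weaker "amateur"; the α-mask keeps only tokens the expert already finds plausible, and the log-probability difference then down-weights tokens that the weak model would also happily produce.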
              
             Special thanks to Mike Lewis (@ml_perception) and @MetaAI for a great research residency! (8/8) 
          
                
Replies: 0 · Reposts: 0 · Likes: 7
              
             There’s plenty more research to be done: in many ways our formulation is naive, and on some tasks (especially truthfulness) contrastive decoding can harm performance. I’ll be looking into overcoming these shortfalls; excited to see where the research leads! (7/8) 
          
                
Replies: 1 · Reposts: 0 · Likes: 5
              
             What’s the takeaway? We can improve performance across many different tasks just by “negative ensembling” a small model and a large model. We support a new contrastive paradigm, in which by default we use more than one model to encourage/discourage various behaviors. (6/8) 
          
                
Replies: 1 · Reposts: 0 · Likes: 7
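A minimal sketch of what that "negative ensembling" could look like at the next-token-logits level. The function name and the example weights are illustrative, not the paper's code:

    import torch

    def signed_ensemble_logits(weighted_logits):
        """weighted_logits: list of (logits, weight) pairs, one per model.
        Positive weights encourage a model's preferences; negative weights
        discourage them."""
        combined = torch.zeros_like(weighted_logits[0][0])
        for logits, weight in weighted_logits:
            combined = combined + weight * logits
        return combined

    # Contrastive decoding is the two-model special case: a positive weight on
    # the large model and a negative weight on the small one, e.g.
    #   combined = signed_ensemble_logits([(large_logits, 1.5), (small_logits, -0.5)])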
              
             Here’s a visual showing how the basic premise works, not including the masking: (4/8) 
          
                
Replies: 2 · Reposts: 0 · Likes: 7
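A tiny made-up numerical illustration of that unmasked premise: score each candidate token by the difference between the expert's and the amateur's log-probabilities. The probabilities below are invented for the example:

    import math

    # token: (p_expert, p_amateur), made-up numbers
    candidates = {
        "the": (0.45, 0.40),   # both models like it: a safe, generic continuation
        "72":  (0.40, 0.05),   # the expert likes it far more than the amateur does
    }

    for tok, (p_exp, p_ama) in candidates.items():
        score = math.log(p_exp) - math.log(p_ama)
        print(f"{tok!r:6} expert={p_exp:.2f} amateur={p_ama:.2f} CD score={score:+.2f}")

    # Greedy decoding picks "the" (highest expert probability, 0.45), while the
    # contrastive score picks "72" (log(0.40/0.05) ≈ +2.08 vs. log(0.45/0.40) ≈ +0.12):
    # the token the expert prefers much more strongly than the weak model does.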
              
             The benefits aren’t negligible: self-consistency, another general reasoning method, takes 200-500% more compute to get the same gain. Plus, our method stacks on top of self-consistency to get even more of a boost. (3/8) 
          
                
Replies: 1 · Reposts: 1 · Likes: 5
              
The method comes from Li et al. 2022 (https://t.co/QaxCZtSrI9), although we make some modifications for interpretability. We don’t engineer anything special for reasoning here, but still get gains almost across the board for math problems. (2/8)
          
                
Replies: 1 · Reposts: 0 · Likes: 5
              
Excited to announce my new paper! Check it out: https://t.co/8WedbLmaZO TL;DR: we improve LM reasoning with only 3-5 lines of code and 3% extra compute. The method requires no training, scales well, and earlier work shows that humans prefer its longer generations. (1/8)
Contrastive Decoding Improves Reasoning in Large Language Models. Paper page: https://t.co/DVhdFSyHHv demonstrate that Contrastive Decoding -- a simple, computationally light, and training-free text generation method proposed by Li et al. 2022 -- achieves large out-of-the-box …
            
                
Replies: 8 · Reposts: 31 · Likes: 179
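To make the "only a few extra lines" claim concrete, here is one way the core change could look as a logit post-processing step on top of ordinary greedy decoding. The α/β values, names, and exact weighting are illustrative guesses rather than the paper's released code:

    import math
    import torch

    def contrastive_logits(expert_logits, amateur_logits, alpha=0.1, beta=0.5):
        # Plausibility mask: keep only tokens the expert itself finds likely.
        expert_logprobs = torch.log_softmax(expert_logits, dim=-1)
        cutoff = expert_logprobs.max(dim=-1, keepdim=True).values + math.log(alpha)
        # Contrastive score: up-weight the expert, down-weight the amateur.
        scores = (1 + beta) * expert_logits - beta * amateur_logits
        # Implausible tokens can never be generated.
        return scores.masked_fill(expert_logprobs < cutoff, float("-inf"))

    # In a greedy loop, the only change is scoring the next token with
    #   next_id = contrastive_logits(expert_logits, amateur_logits).argmax(dim=-1)
    # instead of expert_logits.argmax(dim=-1), where the two logit tensors come
    # from a large "expert" model and a small "amateur" model on the same prefix.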
              
             
             
            