 
            
Haokun Liu
@HaokunLiu5280
Followers: 36 · Following: 17 · Media: 5 · Statuses: 46
Ph.D. student in Computer Science at the University of Chicago, working at the Chicago Human + AI Lab (CHAI) and advised by Professor Chenhao Tan
Chicago · Joined April 2024
            
           ❓ Does an LLM know thyself? 🪞 Humans pass the mirror test at ~18 months 👶 But what about LLMs? Can they recognize their own writing — or even admit authorship at all? In our new paper, we put 10 state-of-the-art models to the test. Read on 👇 1/n 🧵 
          
                
Replies: 2 · Reposts: 17 · Likes: 44
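Roughly, a self-recognition probe like the one the thread above describes could look like the sketch below: show the model its own output next to a human-written text and ask it to pick which one it wrote. This is an illustrative sketch, not the paper's protocol; `query_model` is a hypothetical stand-in for an actual LLM API call.

```python
# Illustrative self-recognition probe (not the paper's exact setup).
# `query_model` is a hypothetical placeholder for an LLM API call.
import random

def query_model(prompt: str) -> str:
    """Placeholder: send `prompt` to the model under test and return its reply."""
    raise NotImplementedError

def self_recognition_trial(model_text: str, human_text: str) -> bool:
    """Return True if the model correctly identifies its own text in a shuffled pair."""
    texts = [model_text, human_text]
    random.shuffle(texts)                      # randomize presentation order
    labels = ["A", "B"]
    own_label = labels[texts.index(model_text)]
    labeled = "\n\n".join(f"Text {l}:\n{t}" for l, t in zip(labels, texts))
    prompt = (
        "One of the two texts below was written by you and one by a human.\n\n"
        f"{labeled}\n\n"
        "Reply with the single letter (A or B) of the text you wrote."
    )
    answer = query_model(prompt).strip().upper()[:1]
    return answer == own_label
```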
              
             Replacing scientists with AI isn’t just unlikely, it’s a bad design goal. The better path is collaborative science. Let AI explore the ideas, draft hypotheses, surface evidence, and propose checks. Let humans decide what matters, set standards, and judge what counts as discovery. 
           AI can accelerate scientific discovery, but only if we get the scientist–AI interaction right. The dream of “autonomous AI scientists” is tempting: machines that generate hypotheses, run experiments, and write papers. But science isn’t just an automation problem — it’s also a 
            
                
Replies: 0 · Reposts: 0 · Likes: 4
             HR Simulator™: a game where you gaslight, deflect, and “let’s circle back” your way to victory. Every email a boss fight, every “per my last message” a critical hit… or maybe you just overplayed your hand 🫠 Can you earn Enlightened Bureaucrat status? (link below) 
          
                
Replies: 2 · Reposts: 11 · Likes: 35
             🚀 We’re thrilled to announce the upcoming AI & Scientific Discovery online seminar! We have an amazing lineup of speakers. This series will dive into how AI is accelerating research, enabling breakthroughs, and shaping the future of research across disciplines. 
          
                
Replies: 2 · Reposts: 5 · Likes: 8
            
            @ChenhaoTan @Hoper_Tom @xxxxiaol @SRSchmidgall @ChengleiSi Our goal is to foster cross-disciplinary discussions on methods, evaluation, and applications that connect AI research with scientific practice.
          
          
                
Replies: 0 · Reposts: 0 · Likes: 2
             Co-organizers: Maria K. Chan, @ChenhaoTan, @Hoper_Tom, @xxxxiaol, @SRSchmidgall, @ChengleiSi Last year's edition: 
          
            
            ai-and-scientific-discovery.github.io
Workshop on AI and Scientific Discovery
            
                
Replies: 1 · Reposts: 1 · Likes: 3
             We are proposing the second workshop on AI & Scientific Discovery at EACL/ACL. The workshop will explore how AI can advance scientific discovery. Please use this Google form to indicate your interest:  https://t.co/Buz64UIFgE  More in the 🧵! Please share! #MLSky 🧠 
          
            
            docs.google.com
            
                
Replies: 1 · Reposts: 6 · Likes: 12
             +1 for "context engineering" over "prompt engineering". People associate prompts with short task descriptions you'd give an LLM in your day-to-day use. When in every industrial-strength LLM app, context engineering is the delicate art and science of filling the context window 
           I really like the term “context engineering” over prompt engineering. It describes the core skill better: the art of providing all the context for the task to be plausibly solvable by the LLM. 
          
                
Replies: 530 · Reposts: 2K · Likes: 14K
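For concreteness, here is a rough sketch of what "filling the context window" can mean in an LLM app, as described in the tweets above: the prompt actually sent to the model is assembled from system instructions, retrieved documents, tool outputs, and recent conversation history, then trimmed to a token budget. The section layout, the 4-characters-per-token heuristic, and the front-trimming rule are all illustrative assumptions, not a fixed recipe.

```python
# Sketch of context engineering: the prompt sent to the LLM is assembled from
# several sources and trimmed to a budget. The section order, the ~4 chars/token
# heuristic, and the front-trimming rule are illustrative assumptions.
def build_context(system: str, retrieved_docs: list[str], tool_outputs: list[str],
                  history: list[str], user_query: str, max_tokens: int = 8000) -> str:
    sections = [
        ("SYSTEM", system),
        ("RETRIEVED", "\n---\n".join(retrieved_docs)),
        ("TOOLS", "\n".join(tool_outputs)),
        ("HISTORY", "\n".join(history[-10:])),   # keep only the most recent turns
        ("USER", user_query),
    ]
    context = "\n\n".join(f"[{name}]\n{body}" for name, body in sections if body)
    max_chars = max_tokens * 4                   # rough tokens-to-chars heuristic
    # If over budget, drop the oldest material from the front, keeping the user query.
    return context[-max_chars:] if len(context) > max_chars else context
```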
              
             🚨 New paper alert 🚨 Ever asked an LLM-as-Marilyn Monroe who the US president was in 2000? 🤔 Should the LLM answer at all? We call these clashes Concept Incongruence. Read on! ⬇️ 1/n 🧵 
          
                
Replies: 1 · Reposts: 10 · Likes: 19
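As an illustration of the clash the tweet above points at, a minimal probe might role-play a persona and ask a question dated after the persona's lifetime, then check whether the model abstains or answers anyway. The prompt wording, the abstention markers, and the `query_model` placeholder are assumptions, not the paper's method.

```python
# Illustrative concept-incongruence probe: the persona's lifetime ends before
# the queried date, so whether to answer at all is debatable. Not the paper's
# protocol; `query_model` is the same hypothetical LLM-call placeholder as above.
ABSTAIN_MARKERS = ("i don't know", "i cannot", "i wouldn't know",
                   "i was not alive", "that is after my time")

def incongruence_probe(query_model, persona: str, question: str) -> str:
    prompt = (
        f"Stay fully in character as {persona} and answer the question below.\n"
        f"Question: {question}"
    )
    reply = query_model(prompt).lower()
    return "abstained" if any(m in reply for m in ABSTAIN_MARKERS) else "answered"

# Example (Marilyn Monroe died in 1962, so a question about 2000 is incongruent):
# incongruence_probe(query_model, "Marilyn Monroe", "Who was the US president in 2000?")
```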
              
             Excited that our paper got accepted by ACL 2025 main conference! See you in Vienna! 🥳🥳🥳 
           1/ 🚀 New Paper Alert! Excited to share: Literature Meets Data: A Synergistic Approach to Hypothesis Generation 📚📊! We propose a novel framework combining literature insights & observational data with LLMs for hypothesis generation. Here’s how and why it matters. 
          
                
Replies: 0 · Reposts: 1 · Likes: 9
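A high-level, hypothetical outline of a literature-plus-data loop of the kind the quoted thread describes: seed candidate hypotheses from paper summaries, then ask an LLM to refine them against labeled examples. The prompts, the round count, and the update rule are placeholders for illustration; the paper defines the actual framework.

```python
# Hypothetical outline of combining literature and data for hypothesis
# generation; prompts, round count, and update rule are illustrative only.
# `query_model` is the same placeholder LLM call used in the sketches above.
def generate_hypotheses(query_model, paper_summaries: list[str],
                        examples: list[tuple[str, str]], rounds: int = 3) -> list[str]:
    # 1) Seed hypotheses from literature insights.
    lit_prompt = ("Based on these paper summaries, propose three hypotheses about "
                  "what predicts the label, one per line:\n" + "\n".join(paper_summaries))
    hypotheses = [h for h in query_model(lit_prompt).splitlines() if h.strip()]

    # 2) Iteratively refine the hypotheses against observational data.
    for _ in range(rounds):
        sample = "\n".join(f"text: {x} -> label: {y}" for x, y in examples[:20])
        refine_prompt = ("Current hypotheses:\n" + "\n".join(hypotheses) +
                         "\n\nLabeled examples:\n" + sample +
                         "\n\nRevise the hypotheses so they better explain the "
                         "examples, one per line.")
        hypotheses = [h for h in query_model(refine_prompt).splitlines() if h.strip()]
    return hypotheses
```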
              
             13/ Lastly, great thanks to my wonderful collaborators @sicong_huang, Jingyu Hu, @qiaoyu_rosa, and my advisor @ChenhaoTan! 
          
                
Replies: 0 · Reposts: 0 · Likes: 1
12/ 🌟 For more details and to access our datasets and code, please visit our paper at https://t.co/0pKdYEMPEi. We also have an official website and leaderboards available at:
          
            
            chicagohai.github.io
            
                
Replies: 1 · Reposts: 0 · Likes: 1
             11/ Why HypoBench matters: Establishes a structured way to advance AI's role in scientific discovery and everyday reasoning, highlighting both current capabilities and significant challenges. 
          
                
Replies: 1 · Reposts: 0 · Likes: 1
             10/ Model priors matter: We see that the models have different priors, which lead to varying behaviors in different tasks—generating good hypotheses is harder when prior knowledge is not helpful. 
          
                
Replies: 1 · Reposts: 0 · Likes: 1
9/ And it gets worse in counterintuitive settings: model performance drops significantly when the underlying hypotheses go against common intuition.
          
                
Replies: 1 · Reposts: 0 · Likes: 1
8/ 💡 Synthetic dataset results show: LLMs handle simple interactions well but struggle with increased noise, distractors, or subtleties in text—highlighting significant room for improvement.
          
                
Replies: 1 · Reposts: 0 · Likes: 1
             7/ Qualitative Insights: Methods balancing novelty and plausibility are rare; iterative refinement boosts novelty but risks plausibility. Literature-driven hypotheses excelled in plausibility but lacked novelty. 
          
                
Replies: 1 · Reposts: 0 · Likes: 1
             6/ 🌍 Real-world implications: Methods integrating literature insights with data outperform simple zero/few-shot inference. Qwen excelled in generating generalizable hypotheses. 
          
                
Replies: 1 · Reposts: 0 · Likes: 1
             5/ 🚨 But… Even top models and methods struggle significantly as task complexity rises. At base difficulty, the best model captured 93.8% of hypotheses; this dropped sharply to 38.8% with increased complexity. 
          
                
Replies: 1 · Reposts: 0 · Likes: 1
             4/ Yes, LLMs can generate effective hypotheses: we tested 4 state-of-the-art models—GPT, Qwen, Llama and DeepSeek-R1—with 6 existing hypothesis generation methods. We found that using Qwen and integrating literature with data (LITERATURE + DATA) yields the best results. 
          
                
Replies: 1 · Reposts: 0 · Likes: 1