Haokun Liu Profile
Haokun Liu

@HaokunLiu5280

Followers: 36 · Following: 17 · Media: 5 · Statuses: 46

Ph.D. student in Computer Science at the University of Chicago, working at the Chicago Human + AI Lab (CHAI) and advised by Professor Chenhao Tan

Chicago
Joined April 2024
@Elenal3ai
Xiaoyan Bai
4 days
❓ Does an LLM know thyself? 🪞 Humans pass the mirror test at ~18 months 👶 But what about LLMs? Can they recognize their own writing — or even admit authorship at all? In our new paper, we put 10 state-of-the-art models to the test. Read on 👇 1/n 🧵
2
17
44
@HaokunLiu5280
Haokun Liu
8 days
Replacing scientists with AI isn’t just unlikely, it’s a bad design goal. The better path is collaborative science. Let AI explore the ideas, draft hypotheses, surface evidence, and propose checks. Let humans decide what matters, set standards, and judge what counts as discovery.
@ChenhaoTan
Chenhao Tan
8 days
AI can accelerate scientific discovery, but only if we get the scientist–AI interaction right. The dream of “autonomous AI scientists” is tempting: machines that generate hypotheses, run experiments, and write papers. But science isn’t just an automation problem — it’s also a
0
0
4
@divingwithorcas
Dang Nguyen
1 month
HR Simulator™: a game where you gaslight, deflect, and “let’s circle back” your way to victory. Every email a boss fight, every “per my last message” a critical hit… or maybe you just overplayed your hand 🫠 Can you earn Enlightened Bureaucrat status? (link below)
2
11
35
@xxxxiaol
Xiao Liu
1 month
🚀 We’re thrilled to announce the upcoming AI & Scientific Discovery online seminar! We have an amazing lineup of speakers. This series will dive into how AI is accelerating research, enabling breakthroughs, and shaping the future of research across disciplines.
2
5
8
@HaokunLiu5280
Haokun Liu
2 months
@ChenhaoTan @Hoper_Tom @xxxxiaol @SRSchmidgall @ChengleiSi Our goal is to foster cross-disciplinary discussions on methods, evaluation, and applications that connect AI research with scientific practice.
0
0
2
@HaokunLiu5280
Haokun Liu
2 months
We are proposing the second workshop on AI & Scientific Discovery at EACL/ACL. The workshop will explore how AI can advance scientific discovery. Please use this Google form to indicate your interest: https://t.co/Buz64UIFgE More in the 🧵! Please share! #MLSky 🧠
docs.google.com
We are proposing the second workshop on AI & Scientific Discovery at EACL/ACL (Annual meetings of The Association for Computational Linguistics, the European Language Resource Association and...
1
6
12
@karpathy
Andrej Karpathy
4 months
+1 for "context engineering" over "prompt engineering". People associate prompts with short task descriptions you'd give an LLM in your day-to-day use. When in every industrial-strength LLM app, context engineering is the delicate art and science of filling the context window
@tobi
tobi lutke
4 months
I really like the term “context engineering” over prompt engineering. It describes the core skill better: the art of providing all the context for the task to be plausibly solvable by the LLM.
530
2K
14K
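To make "context engineering" concrete, here is a minimal, illustrative sketch of assembling a context window from several pieces (instructions, retrieved documents, user input) before calling a model. The ContextPiece and build_context names and the character-based budget are assumptions made for this sketch, not anything from the tweets above.

```python
# Minimal sketch of context engineering: instead of a short ad-hoc prompt,
# the application assembles everything the model needs into one context
# window, dropping lower-priority pieces when the budget runs out.
# All names here are illustrative, not tied to any specific library.

from dataclasses import dataclass

@dataclass
class ContextPiece:
    label: str      # e.g. "system instructions", "retrieved doc", "user question"
    text: str
    priority: int   # lower number = more important, kept first when trimming

def build_context(pieces: list[ContextPiece], budget_chars: int = 8000) -> str:
    """Concatenate context pieces in priority order, trimming to a budget.

    Real systems budget in tokens and truncate more carefully; characters
    are used here only to keep the sketch dependency-free.
    """
    ordered = sorted(pieces, key=lambda p: p.priority)
    blocks, used = [], 0
    for piece in ordered:
        block = f"## {piece.label}\n{piece.text}\n"
        if used + len(block) > budget_chars:
            continue  # drop lower-priority pieces that no longer fit
        blocks.append(block)
        used += len(block)
    return "\n".join(blocks)

if __name__ == "__main__":
    context = build_context([
        ContextPiece("system instructions", "Answer using only the provided sources.", 0),
        ContextPiece("user question", "What does HypoBench measure?", 0),
        ContextPiece("retrieved doc", "HypoBench is a benchmark for hypothesis generation...", 1),
    ])
    print(context)  # this assembled string is what would be sent to the LLM
```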
@Elenal3ai
Xiaoyan Bai
5 months
🚨 New paper alert 🚨 Ever asked an LLM-as-Marilyn Monroe who the US president was in 2000? 🤔 Should the LLM answer at all? We call these clashes Concept Incongruence. Read on! ⬇️ 1/n 🧵
1
10
19
@qiaoyu_rosa
Rosa Zhou
6 months
Excited that our paper got accepted by ACL 2025 main conference! See you in Vienna! 🥳🥳🥳
@HaokunLiu5280
Haokun Liu
1 year
1/ 🚀 New Paper Alert! Excited to share: Literature Meets Data: A Synergistic Approach to Hypothesis Generation 📚📊! We propose a novel framework combining literature insights & observational data with LLMs for hypothesis generation. Here’s how it works and why it matters.
0
1
9
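As a rough illustration of the "literature + data" idea in the tweet above, the sketch below prompts a model with both prior findings and labeled examples and asks it to propose hypotheses consistent with both. The function names, prompt wording, and fake_llm stand-in are assumptions for illustration only, not the paper's implementation.

```python
# Illustrative sketch: combine literature notes with labeled observations
# in one prompt and ask an LLM to propose hypotheses consistent with both.

from typing import Callable

def build_hypothesis_prompt(literature_notes: list[str],
                            examples: list[tuple[str, str]]) -> str:
    lit = "\n".join(f"- {note}" for note in literature_notes)
    data = "\n".join(f"- text: {x!r} -> label: {y}" for x, y in examples)
    return (
        "Relevant findings from prior literature:\n" + lit + "\n\n"
        "Labeled observations:\n" + data + "\n\n"
        "Propose hypotheses (one per line) that explain the labels and are "
        "consistent with the literature."
    )

def generate_hypotheses(llm: Callable[[str], str],
                        literature_notes: list[str],
                        examples: list[tuple[str, str]]) -> list[str]:
    # llm is any callable that maps a prompt string to a completion string.
    prompt = build_hypothesis_prompt(literature_notes, examples)
    return [line.strip("- ").strip()
            for line in llm(prompt).splitlines() if line.strip()]

if __name__ == "__main__":
    fake_llm = lambda prompt: "- Headlines with numbers attract more clicks"
    print(generate_hypotheses(
        fake_llm,
        ["Curiosity-gap phrasing correlates with engagement."],
        [("You won't believe these 7 tricks", "clickbait"),
         ("City council passes budget", "not clickbait")],
    ))
```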
@HaokunLiu5280
Haokun Liu
6 months
13/ Lastly, many thanks to my wonderful collaborators @sicong_huang, Jingyu Hu, @qiaoyu_rosa, and my advisor @ChenhaoTan!
0
0
1
@HaokunLiu5280
Haokun Liu
6 months
12/ 🌟 For more details and to access our datasets and code, please visit our paper at https://t.co/0pKdYEMPEi. We also have an official website and leaderboards available at:
chicagohai.github.io
1
0
1
@HaokunLiu5280
Haokun Liu
6 months
11/ Why HypoBench matters: Establishes a structured way to advance AI's role in scientific discovery and everyday reasoning, highlighting both current capabilities and significant challenges.
1
0
1
@HaokunLiu5280
Haokun Liu
6 months
10/ Model priors matter: We see that the models have different priors, which lead to varying behaviors in different tasks—generating good hypotheses is harder when prior knowledge is not helpful.
1
0
1
@HaokunLiu5280
Haokun Liu
6 months
9/ And it gets worse in counterintuitive settings - the models perform significantly worse when the underlying hypotheses are counterintuitive.
1
0
1
@HaokunLiu5280
Haokun Liu
6 months
8/ 💡 Synthetic dataset results show: LLMs handle simple interactions well but struggle with increased noise, distractors, or subtleties in text—highlighting significant room for improvement.
1
0
1
@HaokunLiu5280
Haokun Liu
6 months
7/ Qualitative Insights: Methods balancing novelty and plausibility are rare; iterative refinement boosts novelty but risks plausibility. Literature-driven hypotheses excelled in plausibility but lacked novelty.
1
0
1
@HaokunLiu5280
Haokun Liu
6 months
6/ 🌍 Real-world implications: Methods integrating literature insights with data outperform simple zero/few-shot inference. Qwen excelled in generating generalizable hypotheses.
1
0
1
@HaokunLiu5280
Haokun Liu
6 months
5/ 🚨 But… Even top models and methods struggle significantly as task complexity rises. At base difficulty, the best model captured 93.8% of hypotheses; this dropped sharply to 38.8% with increased complexity.
1
0
1
@HaokunLiu5280
Haokun Liu
6 months
4/ Yes, LLMs can generate effective hypotheses: we tested 4 state-of-the-art models—GPT, Qwen, Llama and DeepSeek-R1—with 6 existing hypothesis generation methods. We found that using Qwen and integrating literature with data (LITERATURE + DATA) yields the best results.
1
0
1