 
            
Haokun Liu
@HaokunLiu5280
Followers: 36 · Following: 17 · Media: 5 · Statuses: 46
Ph.D. student in Computer Science at the University of Chicago, working at the Chicago Human + AI Lab (CHAI) and advised by Professor Chenhao Tan
Chicago · Joined April 2024
            
           ❓ Does an LLM know thyself? 🪞 Humans pass the mirror test at ~18 months 👶 But what about LLMs? Can they recognize their own writing — or even admit authorship at all? In our new paper, we put 10 state-of-the-art models to the test. Read on 👇 1/n 🧵 
          
                
Replies: 2 · Reposts: 17 · Likes: 44
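Roughly, a self-recognition probe like the one the thread above describes could look like the sketch below: show the model its own output next to a human-written text and ask it to pick which one it wrote. This is an illustrative sketch, not the paper's protocol; `query_model` is a hypothetical stand-in for an actual LLM API call.

```python
# Illustrative self-recognition probe (not the paper's exact setup).
# `query_model` is a hypothetical placeholder for an LLM API call.
import random

def query_model(prompt: str) -> str:
    """Placeholder: send `prompt` to the model under test and return its reply."""
    raise NotImplementedError

def self_recognition_trial(model_text: str, human_text: str) -> bool:
    """Return True if the model correctly identifies its own text in a shuffled pair."""
    texts = [model_text, human_text]
    random.shuffle(texts)                      # randomize presentation order
    labels = ["A", "B"]
    own_label = labels[texts.index(model_text)]
    labeled = "\n\n".join(f"Text {l}:\n{t}" for l, t in zip(labels, texts))
    prompt = (
        "One of the two texts below was written by you and one by a human.\n\n"
        f"{labeled}\n\n"
        "Reply with the single letter (A or B) of the text you wrote."
    )
    answer = query_model(prompt).strip().upper()[:1]
    return answer == own_label
```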
              
             Replacing scientists with AI isn’t just unlikely, it’s a bad design goal. The better path is collaborative science. Let AI explore the ideas, draft hypotheses, surface evidence, and propose checks. Let humans decide what matters, set standards, and judge what counts as discovery. 
           AI can accelerate scientific discovery, but only if we get the scientist–AI interaction right. The dream of “autonomous AI scientists” is tempting: machines that generate hypotheses, run experiments, and write papers. But science isn’t just an automation problem — it’s also a 
            
                
Replies: 0 · Reposts: 0 · Likes: 4
             HR Simulator™: a game where you gaslight, deflect, and “let’s circle back” your way to victory. Every email a boss fight, every “per my last message” a critical hit… or maybe you just overplayed your hand 🫠 Can you earn Enlightened Bureaucrat status? (link below) 
          
                
Replies: 2 · Reposts: 11 · Likes: 35
             🚀 We’re thrilled to announce the upcoming AI & Scientific Discovery online seminar! We have an amazing lineup of speakers. This series will dive into how AI is accelerating research, enabling breakthroughs, and shaping the future of research across disciplines. 
          
                
Replies: 2 · Reposts: 5 · Likes: 8
            
            @ChenhaoTan @Hoper_Tom @xxxxiaol @SRSchmidgall @ChengleiSi Our goal is to foster cross-disciplinary discussions on methods, evaluation, and applications that connect AI research with scientific practice.
          
          
                
Replies: 0 · Reposts: 0 · Likes: 2
             Co-organizers: Maria K. Chan, @ChenhaoTan, @Hoper_Tom, @xxxxiaol, @SRSchmidgall, @ChengleiSi Last year's edition: 
          
            
            ai-and-scientific-discovery.github.io
Workshop on AI and Scientific Discovery
            
                
Replies: 1 · Reposts: 1 · Likes: 3
             We are proposing the second workshop on AI & Scientific Discovery at EACL/ACL. The workshop will explore how AI can advance scientific discovery. Please use this Google form to indicate your interest:  https://t.co/Buz64UIFgE  More in the 🧵! Please share! #MLSky 🧠 
          
            
            docs.google.com
            
                
Replies: 1 · Reposts: 6 · Likes: 12
             +1 for "context engineering" over "prompt engineering". People associate prompts with short task descriptions you'd give an LLM in your day-to-day use. When in every industrial-strength LLM app, context engineering is the delicate art and science of filling the context window 
           I really like the term “context engineering” over prompt engineering. It describes the core skill better: the art of providing all the context for the task to be plausibly solvable by the LLM. 
          
                
Replies: 530 · Reposts: 2K · Likes: 14K
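For concreteness, here is a rough sketch of what "filling the context window" can mean in an LLM app, as described in the tweets above: the prompt actually sent to the model is assembled from system instructions, retrieved documents, tool outputs, and recent conversation history, then trimmed to a token budget. The section layout, the 4-characters-per-token heuristic, and the front-trimming rule are all illustrative assumptions, not a fixed recipe.

```python
# Sketch of context engineering: the prompt sent to the LLM is assembled from
# several sources and trimmed to a budget. The section order, the ~4 chars/token
# heuristic, and the front-trimming rule are illustrative assumptions.
def build_context(system: str, retrieved_docs: list[str], tool_outputs: list[str],
                  history: list[str], user_query: str, max_tokens: int = 8000) -> str:
    sections = [
        ("SYSTEM", system),
        ("RETRIEVED", "\n---\n".join(retrieved_docs)),
        ("TOOLS", "\n".join(tool_outputs)),
        ("HISTORY", "\n".join(history[-10:])),   # keep only the most recent turns
        ("USER", user_query),
    ]
    context = "\n\n".join(f"[{name}]\n{body}" for name, body in sections if body)
    max_chars = max_tokens * 4                   # rough tokens-to-chars heuristic
    # If over budget, drop the oldest material from the front, keeping the user query.
    return context[-max_chars:] if len(context) > max_chars else context
```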
              
             🚨 New paper alert 🚨 Ever asked an LLM-as-Marilyn Monroe who the US president was in 2000? 🤔 Should the LLM answer at all? We call these clashes Concept Incongruence. Read on! ⬇️ 1/n 🧵 
          
                
Replies: 1 · Reposts: 10 · Likes: 19
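As an illustration of the clash the tweet above points at, a minimal probe might role-play a persona and ask a question dated after the persona's lifetime, then check whether the model abstains or answers anyway. The prompt wording, the abstention markers, and the `query_model` placeholder are assumptions, not the paper's method.

```python
# Illustrative concept-incongruence probe: the persona's lifetime ends before
# the queried date, so whether to answer at all is debatable. Not the paper's
# protocol; `query_model` is the same hypothetical LLM-call placeholder as above.
ABSTAIN_MARKERS = ("i don't know", "i cannot", "i wouldn't know",
                   "i was not alive", "that is after my time")

def incongruence_probe(query_model, persona: str, question: str) -> str:
    prompt = (
        f"Stay fully in character as {persona} and answer the question below.\n"
        f"Question: {question}"
    )
    reply = query_model(prompt).lower()
    return "abstained" if any(m in reply for m in ABSTAIN_MARKERS) else "answered"

# Example (Marilyn Monroe died in 1962, so a question about 2000 is incongruent):
# incongruence_probe(query_model, "Marilyn Monroe", "Who was the US president in 2000?")
```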
              
             Excited that our paper got accepted by ACL 2025 main conference! See you in Vienna! 🥳🥳🥳 
           1/ 🚀 New Paper Alert! Excited to share: Literature Meets Data: A Synergistic Approach to Hypothesis Generation 📚📊! We propose a novel framework combining literature insights & observational data with LLMs for hypothesis generation. Here’s how and why it matters. 
          
                
Replies: 0 · Reposts: 1 · Likes: 9
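A high-level, hypothetical outline of a literature-plus-data loop of the kind the quoted thread describes: seed candidate hypotheses from paper summaries, then ask an LLM to refine them against labeled examples. The prompts, the round count, and the update rule are placeholders for illustration; the paper defines the actual framework.

```python
# Hypothetical outline of combining literature and data for hypothesis
# generation; prompts, round count, and update rule are illustrative only.
# `query_model` is the same placeholder LLM call used in the sketches above.
def generate_hypotheses(query_model, paper_summaries: list[str],
                        examples: list[tuple[str, str]], rounds: int = 3) -> list[str]:
    # 1) Seed hypotheses from literature insights.
    lit_prompt = ("Based on these paper summaries, propose three hypotheses about "
                  "what predicts the label, one per line:\n" + "\n".join(paper_summaries))
    hypotheses = [h for h in query_model(lit_prompt).splitlines() if h.strip()]

    # 2) Iteratively refine the hypotheses against observational data.
    for _ in range(rounds):
        sample = "\n".join(f"text: {x} -> label: {y}" for x, y in examples[:20])
        refine_prompt = ("Current hypotheses:\n" + "\n".join(hypotheses) +
                         "\n\nLabeled examples:\n" + sample +
                         "\n\nRevise the hypotheses so they better explain the "
                         "examples, one per line.")
        hypotheses = [h for h in query_model(refine_prompt).splitlines() if h.strip()]
    return hypotheses
```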
              
             13/ Lastly, great thanks to my wonderful collaborators @sicong_huang, Jingyu Hu, @qiaoyu_rosa, and my advisor @ChenhaoTan! 
          
                
Replies: 0 · Reposts: 0 · Likes: 1
12/ 🌟 For more details and to access our datasets and code, please visit our paper at https://t.co/0pKdYEMPEi. We also have an official website and leaderboards available at:
          
            
            chicagohai.github.io
            
                
Replies: 1 · Reposts: 0 · Likes: 1
             11/ Why HypoBench matters: Establishes a structured way to advance AI's role in scientific discovery and everyday reasoning, highlighting both current capabilities and significant challenges. 
          
                
Replies: 1 · Reposts: 0 · Likes: 1
             10/ Model priors matter: We see that the models have different priors, which lead to varying behaviors in different tasks—generating good hypotheses is harder when prior knowledge is not helpful. 
          
                
Replies: 1 · Reposts: 0 · Likes: 1
9/ And it gets worse in counterintuitive settings: model performance drops significantly when the underlying hypotheses go against common intuition.
          
                
Replies: 1 · Reposts: 0 · Likes: 1
8/ 💡 Synthetic dataset results show: LLMs handle simple interactions well but struggle with increased noise, distractors, or subtleties in text—highlighting significant room for improvement.
          
                
Replies: 1 · Reposts: 0 · Likes: 1
             7/ Qualitative Insights: Methods balancing novelty and plausibility are rare; iterative refinement boosts novelty but risks plausibility. Literature-driven hypotheses excelled in plausibility but lacked novelty. 
          
                
Replies: 1 · Reposts: 0 · Likes: 1
             6/ 🌍 Real-world implications: Methods integrating literature insights with data outperform simple zero/few-shot inference. Qwen excelled in generating generalizable hypotheses. 
          
                
Replies: 1 · Reposts: 0 · Likes: 1
             5/ 🚨 But… Even top models and methods struggle significantly as task complexity rises. At base difficulty, the best model captured 93.8% of hypotheses; this dropped sharply to 38.8% with increased complexity. 
          
                
Replies: 1 · Reposts: 0 · Likes: 1
             4/ Yes, LLMs can generate effective hypotheses: we tested 4 state-of-the-art models—GPT, Qwen, Llama and DeepSeek-R1—with 6 existing hypothesis generation methods. We found that using Qwen and integrating literature with data (LITERATURE + DATA) yields the best results. 
          
                
Replies: 1 · Reposts: 0 · Likes: 1