Lizhou “Leo” Fan @LeegeoF X Profile

Lizhou “Leo” Fan

@LeegeoF

Followers

124

Following

131

Media

1

Statuses

31

Vice-Chancellor Assistant Professor @CUHKofficial | Previously Postdoc @harvardmed, PhD @umsi, BS @uclastat | Medical AI, AI Agents, Trustworthy AI, Psychiatry

Joined April 2019

Hongru Wang

@HongruWang007

6 months

What is the agent? What is the optimal behavior to achieve the predefined goal? And how to learn that behavior policy? We formally introduce a systematic Theory of Agent (ToA), analogous to the cognitive framework of Theory of Mind (ToM). Where ToM refers to the ability to

1

33

129

Jiahao Qiu

@JiahaoQiu99

7 months

The GAIA game is over, and Alita is the final answer. Alita takes the top spot in GAIA, outperforming OpenAI Deep Research and Manus. Many general-purpose agents rely heavily on large-scale, manually predefined tools and workflows. However, we believe that for general AI

17

31

97

Sumanth

@Sumanth_077

9 months

LLM Engineer Toolkit: A curated list of 120+ LLM libraries for training, fine-tuning, building, evaluating, deploying, RAG, and AI agents! 100% Open Source

44

501

3K

Wenyue Hua

@HuaWenyue31539

1 year

🌟🎲🎲 How to create a rational LLM-based agent? Using a game-theoretic workflow! Game-theoretic LLM: Agent Workflow for Negotiation Games 😊 paper link: https://t.co/hJzChwHpjg github link: https://t.co/Xs8lUqMM2O 😼 This paper aims at observing and enhancing the performance of

5

51

203

Shan Chen🛬 NeurIPS 2025

@shan23chen

1 year

🚀 Exciting News for AI4Health! 🌐 We’re thrilled to release WorldMedQA-V, a multilingual, multimodal medical examination dataset designed to benchmark vision-language models in healthcare! 🩺💻 👉 Check it out: https://t.co/roHxMOa5dR 🧵👇 #AI #HealthcareAI

1

7

20

Tom Barry

@BomTarry

1 year

Really nice reference for work in the LLM/AI and health space by @YuHuizi and @LeegeoF et al. Relevant to what we've been planning @DrJoDaniels https://t.co/M4E9oy2R8i

link.springer.com

Journal of Healthcare Informatics Research - Large language models (LLMs) have rapidly become important tools in Biomedical and Health Informatics (BHI), potentially enabling new ways to analyze...

0

2

1

Aran Komatsuzaki

@arankomatsuzaki

1 year

Google presents On scalable oversight with weak LLMs judging strong LLMs https://t.co/8kKA3MpLom

7

66

405

Valerio Capraro

@ValerioCapraro

1 year

Why did language evolve among humans? To facilitate thinking? Or to facilitate communication? This new perspective article just published in @Nature suggests that language evolved as a tool for communication. The main evidence against the language-for-thought hypothesis comes

55

200

939

Philipp Schmid

@_philschmid

1 year

Is that what we call Bingo? 🎯 "Samba = Mamba + MLP + Sliding Window Attention + MLP stacking at the layer level." => infinite context length with linear complexity Samba-3.8B-instruct outperforms Phi-3-mini across all benchmarks using the same dataset (trained on 3.2

3

23

122

School of Information

@umsi

2 years

Congratulations, Dr. Fan! 🎉 UMSI’s Lizhou “Leo” Fan successfully defended his dissertation, “Generative AI-augmented and User-centric Research Data Discovery and Reuse.” He’ll soon join @BrighamWomens as a postdoctoral research fellow 👏 @LeegeoF #PhDone

1

2

School of Information

@umsi

2 years

New research from UMSI: A dataset for measuring the impact of research data and their curation Libby Hemphill, Andrea Thomer, Sara Lafia, Lizhou Fan, David Bleckley, Elizabeth Moss @libbyh @lafia_s @LeegeoF @ScientificData https://t.co/G6GSwKKIjM

1

0

Wenyue Hua

@HuaWenyue31539

2 years

😳 How much is VLM's reasoning ability lagging behind LLM's reasoning ability? 🔖 We construct NPHardEval4V 🔥, the visual counterpart of NPHardEval ▶️▶️▶️ After removing the effect of recognition failure and instruction-following failure, VLMs are much worse on reasoning

1

4

Lizhou “Leo” Fan

@LeegeoF

2 years

Thank you @huggingface and @clefourrier for featuring our NPHardEval benchmark and leaderboard. Started in late Jan, we update the test data monthly and will release an updated evaluation (possibly with more models) next month. Have your favorite LLM in mind? Let us know!

Clémentine Fourrier 🍊 is off till Dec 2026 hiking

@clefourrier

2 years

New leaderboard: NPHardEval! It uses logical questions of diff. complexities as a proxy for reasoning abilities 💪 Since the questions can be generated automatically, it's going to be dynamic, updated monthly! 🚀 Congrats to @HuaWenyue31539 @LeegeoF ! https://t.co/wGElnhILcn

0

1

4

Wenyue Hua

@HuaWenyue31539

2 years

NPHardEval benchmark updated version! 1. Phi-2 has a good performance (Huggingface leaderboard here: https://t.co/0J6KpVnGjE) 2. Robustness experiment: Our benchmark is actually robust under finetuning hack! See details in the paper https://t.co/5hpRoGHLd3

huggingface.co

0

4

7

Wenyue Hua

@HuaWenyue31539

2 years

Under the NPHardEval benchmark, GPT-4 is still far beyond all other models. Open-source models follow closely behind GPT-3.5 and Claude: Qwen, Yi, Mistral and Phi-2 (even though it is only 2.7B! Textbooks are indeed all we need haha) show good performance.

0

4

11

Yongfeng Zhang

@yongfengzhang9

2 years

What is LLM's ability in solving P, NP, NPC, and NP-Hard problems? Check out @LeegeoF @HuaWenyue31539 etc.'s paper on evaluating LLM's reasoning ability via complexity classes.

Wenyue Hua

@HuaWenyue31539

2 years

Can LLMs solve NP-hard problems? We proposed a new benchmark to rigorously evaluate LLMs' reasoning ability: NPHardEval: 1. Built on the computational complexity hierarchy 2. Automatic datapoint generation 3. Automatic result checking 4. Monthly refresh -- no overfitting!

0

1

7

Wenyue Hua

@HuaWenyue31539

2 years

Can LLMs solve NP-hard problems? We proposed a new benchmark to rigorously evaluate LLMs' reasoning ability: NPHardEval: 1. Built on the computational complexity hierarchy 2. Automatic datapoint generation 3. Automatic result checking 4. Monthly refresh -- no overfitting!

3

5

Sara Lafia

@lafia_s

3 years

My team (@libbyh @an_dre_a_ @LeegeoF) published our analysis of data reuse communities in the @ICPSR Bibliography: "Subdivisions and Crossroads: Identifying Hidden Community Structures in a Data Archive’s Citation Network" ( https://t.co/lVxKUNLpLD) @QSS_ISSI #datacitation

direct.mit.edu

Abstract. Data archives are an important source of high-quality data in many fields, making them ideal sites to study data reuse. By studying data reuse through citation networks, we are able to...

1

5

11

Lizhou “Leo” Fan

@LeegeoF

3 years

@ToddPresner @UCLA_DH And please feel free to give us any suggestions, as both articles are previews for now. I think both works can converge if we think from the perspective of the human knowledge network. Maybe that's gonna be one of my next research topics.

0

Lizhou “Leo” Fan

@LeegeoF

3 years

I was lucky to collaborate with Prof. Anne Gilliland, especially during the pandemic. Her advice helped me through many difficulties and her suggestions are always helpful for my archival science research. Although she's not on Twitter much, I want to shout out to her!

0