Lizhou “Leo” Fan
@LeegeoF
Followers
124
Following
131
Media
1
Statuses
31
Vice-Chancellor Assistant Professor @CUHKofficial | Previouly Postdoc @harvardmed, PhD @umsi, BS @uclastat | Medical AI, AI Agents, Trustworthy AI, Psychiatry
Joined April 2019
What’s is the agent? What is the optimal behavior to achieve the predefined goal? And how to learn that behavior policy? We formally introduce a systematic Theory of Agent (ToA), analogous to the cognitive framework of Theory of Mind (ToM). Where ToM refers to the ability to
1
33
129
The GAIA game is over, and Alita is the final answer. Alita takes the top spot in GAIA, outperforming OpenAI Deep Research and Manus. Many general-purpose agents rely heavily on large-scale, manually predefined tools and workflows. However, we believe that for general AI
17
31
97
LLM Engineer Toolkit: A curated list of 120+ LLM libraries for training, fine-tuning, building, evaluating, deploying, RAG, and AI agents! 100% Open Source
44
501
3K
🌟🎲🎲How to create a rational LLM-based agent? using game-theoretic workflow! Game-theoretic LLM: Agent Workflow for Negotiation Games 😊 paper link: https://t.co/hJzChwHpjg github link: https://t.co/Xs8lUqMM2O 😼 This paper aims at observing and enhancing the performance of
5
51
203
🚀 Exciting News for AI4Health! 🌐 We’re thrilled to release WorldMedQA-V, a multilingual, multimodal medical examination dataset designed to benchmark vision-language models in healthcare! 🩺💻 👉 Check it out: https://t.co/roHxMOa5dR 🧵👇 #AI #HealthcareAI
1
7
20
Really nice reference for work in the LLM/AI and health space by @YuHuizi and @LeegeoF et al. Relevant to what we've been planning @DrJoDaniels
https://t.co/M4E9oy2R8i
link.springer.com
Journal of Healthcare Informatics Research - Large language models (LLMs) have rapidly become important tools in Biomedical and Health Informatics (BHI), potentially enabling new ways to analyze...
0
2
1
Google presents On scalable oversight with weak LLMs judging strong LLMs https://t.co/8kKA3MpLom
7
66
405
Why did language evolve among humans? To facilitate thinking? Or to facilitate communication? This new perspective article just published in @Nature suggests that language evolved as a tool for communication. The main evidence against the language-for-thought hypothesis comes
55
200
939
Is that what we call Bingo? 🎯 "Samba = Mamba + MLP + Sliding Window Attention + MLP stacking at the layer level." => infinite context length with linear complexity Samba-3.8B-instruct outperforms Phi-3-mini across all benchmarks using the same dataset (trained on 3.2
3
23
122
Congratulations, Dr. Fan! 🎉 UMSI’s Lizhou “Leo” Fan successfully defended his dissertation, “Generative AI-augmented and User-centric Research Data Discovery and Reuse.” He’ll soon join @BrighamWomens as a postdoctoral research fellow 👏 @LeegeoF #PhDone
1
1
2
New research from UMSI: A dataset for measuring the impact of research data and their curation Libby Hemphill, Andrea Thomer, Sara Lafia, Lizhou Fan, David Bleckley, Elizabeth Moss @libbyh @lafia_s @LeegeoF @ScientificData
https://t.co/G6GSwKKIjM
1
1
0
😳 How much is VLM's reasoning ability lagging behind LLM's reasoning ability? 🔖 We construct NPHardEval4V 🔥, the visual counterpart of NPHardEval ▶️▶️▶️ After removing the effect of recognition failure and instruction-following failure, VLMs are much worse on reasoning
1
4
4
Thank you @huggingface and @clefourrier for featuring our NPHardEval benchmark and leaderboard. Started later Jan, we update the test data monthly and will release updated evaluation (possibly with more models) the next Month. Have your favorite LLM in mind? Let us know!
New leaderboard: NPHardEval! It uses logical questions of diff. complexities as a proxy for reasoning abilities 💪 Since the questions can be generated automatically, it's going to be dynamic, updated monthly! 🚀 Congrats to @HuaWenyue31539 @LeegeoF ! https://t.co/wGElnhILcn
0
1
4
NPHardEval benchmark updated version! 1. Phi-2 has a good performance (Huggingface leaderboard here: https://t.co/0J6KpVnGjE) 2. Robustness experiment: Our benchmark is actually robust under finetuning hack! See details in the paper https://t.co/5hpRoGHLd3
huggingface.co
0
4
7
Under NPHardEval benchmark, GPT-4 is still far beyong all other models. Open source models are going close after GPT-3.5 and Claude: Qwen, Yi, Mistral and Phi-2 (even it is only 2.7b! Textbook is indeed all we need haha) show good performance.
0
4
11
What is LLM's ability in solving P, NP, NPC, and NP-Hard problems? Check out @LeegeoF @HuaWenyue31539 etc.'s paper on evaluating LLM's reasoning ability via complexity classes.
Can LLM solve NP-hard problem? We proposed a new benchmark to rigorously evaluate LLMs' reasoning ability: NPHardEval: 1. Built based computational complexity hierarchy 2. Automatic datapoint generation 3. Automatic result checking 4. Monthly refresh -- no overfitting!
0
1
7
Can LLM solve NP-hard problem? We proposed a new benchmark to rigorously evaluate LLMs' reasoning ability: NPHardEval: 1. Built based computational complexity hierarchy 2. Automatic datapoint generation 3. Automatic result checking 4. Monthly refresh -- no overfitting!
3
5
5
My team (@libbyh @an_dre_a_ @LeegeoF) published our analysis of data reuse communities in the @ICPSR Bibliography: "Subdivisions and Crossroads: Identifying Hidden Community Structures in a Data Archive’s Citation Network" ( https://t.co/lVxKUNLpLD)
@QSS_ISSI #datacitation
direct.mit.edu
Abstract. Data archives are an important source of high-quality data in many fields, making them ideal sites to study data reuse. By studying data reuse through citation networks, we are able to...
1
5
11
@ToddPresner @UCLA_DH And please feel free to give us any suggestions, as both articles are previews for now. I these both works can converge if we think from the perspective of the human knowledge network. Maybe that's gonna be one next research topic for me.
0
0
0
I was lucky to collaborate with Prof. Anne Gilliland, especially during the pandemic. Her advice helped me through many difficulties and her suggestions are always helpful for my archival science research. Although she's not on Twitter much, I want to shout out to her!
0
0
0