Tuo Zhao

@tourzhao

Followers: 2K
Following: 578
Media: 32
Statuses: 375

Associate Professor at Georgia Tech, Ph.D. in Computer Science. Research Interests: Machine Learning

Atlanta, Georgia
Joined August 2019
@tourzhao
Tuo Zhao
1 month
🚀 New release for the Phi family! **SlimMOE** trims bulky Phi-3.5-MoE experts into agile models (4-6× smaller) with MINIMAL accuracy loss. If you ❤️ Phi-3 mini/small, you’ll love these lighter siblings. 👇
arxiv.org
The Mixture of Experts (MoE) architecture has emerged as a powerful paradigm for scaling large language models (LLMs) while maintaining inference efficiency. However, their enormous memory...
1
7
24
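The sketch below is not the SlimMOE recipe from the linked paper; it is only a generic, hypothetical illustration of what "trimming experts" in an MoE layer can mean. The `MoELayer` class, the `router_usage` statistics, and `keep_ratio` are all made-up names for illustration.

```python
# Hypothetical illustration only -- NOT the SlimMOE algorithm from the paper.
# Generic idea: shrink an MoE layer by keeping the experts the router uses most.
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    """Toy MoE layer: a linear router plus a list of expert MLPs."""
    def __init__(self, d_model: int, n_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

def trim_experts(layer: MoELayer, router_usage: torch.Tensor, keep_ratio: float) -> MoELayer:
    """Keep the most-used experts (by routing frequency) and slice the router to match."""
    n_keep = max(1, int(len(layer.experts) * keep_ratio))
    keep_idx = torch.topk(router_usage, n_keep).indices.sort().values
    d_model = layer.router.in_features
    slim = MoELayer(d_model, n_keep)
    slim.experts = nn.ModuleList(layer.experts[i] for i in keep_idx.tolist())
    # Copy the router rows for the surviving experts.
    slim.router.weight.data = layer.router.weight.data[keep_idx].clone()
    slim.router.bias.data = layer.router.bias.data[keep_idx].clone()
    return slim

layer = MoELayer(d_model=64, n_experts=16)
usage = torch.rand(16)                      # stand-in for measured routing frequencies
slim = trim_experts(layer, usage, keep_ratio=0.25)
print(len(slim.experts))                    # 4 experts remain
```

A real pipeline like SlimMOE would presumably also recover the lost accuracy (e.g., through distillation or further training), which this sketch does not attempt.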
@tourzhao
Tuo Zhao
3 days
Funny how some folks love to claim LLMs can’t do basic math like comparing 9.2 and 9.11, while happily using those same models as agents to solve real-world problems. Irony much? 😉
0
0
6
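For context, the 9.2 vs. 9.11 trap comes down to numeric versus version-style comparison; a quick illustration:

```python
# Numerically 9.2 > 9.11, but comparing the parts version-style flips the answer.
print(9.2 > 9.11)                                    # True
print([int(p) for p in "9.2".split(".")]
      > [int(p) for p in "9.11".split(".")])         # False, since 2 < 11
```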
@tourzhao
Tuo Zhao
21 days
Should Google also start their celebration?
@lyang36
Lin Yang
21 days
🚨 Olympiad math + AI: We ran Google’s Gemini 2.5 Pro on the fresh IMO 2025 problems. With careful prompting and pipeline design, it solved 5 out of 6 — remarkable for tasks demanding deep insight and creativity. The model could win gold! 🥇 #AI #Math #LLMs #IMO2025
0
0
1
@tourzhao
Tuo Zhao
25 days
[image attached]
0
2
18
@tourzhao
Tuo Zhao
1 month
@zichong_li @chenliang1_ @Zixuan_Zzz @HongIlgee @WeizhuChen @mlatgt @GeorgiaTechISyE @GTCSE @Microsoft 🔧 With memory-efficient optimizers like 8-bit Adam or Muon, you can fine-tune phi-moe-mini-instruct on a single A100 and phi-moe-tiny-instruct on a single A6000. Perfect testbeds for MoE research when resources are tight! #PhiSeries #SlimMOE
0
0
1
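A minimal sketch of the single-GPU setup described above, assuming the checkpoint is published under a Hugging Face ID like `microsoft/phi-moe-mini-instruct` (an illustrative guess, not a confirmed repo name) and using bitsandbytes' 8-bit Adam:

```python
# Sketch only: single-GPU fine-tuning with a memory-efficient 8-bit optimizer.
# The model ID below is an assumption for illustration, not a confirmed repo name.
import torch
import bitsandbytes as bnb
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-moe-mini-instruct"   # hypothetical Hugging Face repo name
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).to("cuda")

# 8-bit Adam stores optimizer state in int8, cutting its memory footprint roughly 4x.
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=2e-5)

batch = tok("Example fine-tuning text.", return_tensors="pt").to("cuda")
out = model(**batch, labels=batch["input_ids"])   # causal-LM loss on the same tokens
out.loss.backward()
optimizer.step()
optimizer.zero_grad()
```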
@tourzhao
Tuo Zhao
1 month
1
0
0
@tourzhao
Tuo Zhao
1 month
🥇 Meet the smallest Phi MoE yet: **phi-moe-tiny-instruct** (3.8B total / 1.1B active). Same SlimMOE magic, instruction-tuned for real tasks, now light enough for laptops & mobile GPUs. Grab the weights 👉 #TinyPhi #MoE #OpenSource
huggingface.co
1
0
2
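A hedged sketch of loading such a checkpoint locally with transformers; the repo ID `microsoft/phi-moe-tiny-instruct` is assumed for illustration and may not match the actual release:

```python
# Sketch only: local inference with a small instruction-tuned MoE checkpoint.
# The repo ID is an assumption for illustration, not a confirmed name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-moe-tiny-instruct"   # hypothetical repo name
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "Explain mixture-of-experts routing in two sentences."
inputs = tok(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```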
@tourzhao
Tuo Zhao
1 month
🚀 New to the Phi family: **phi-moe-mini-instruct** (7.6B total / 2.4B active)! SlimMOE trims Phi-3.5-MoE 6× while preserving almost all accuracy—ideal for edge inference. Try it here 👉 #SlimMOE #Phi #LLM
huggingface.co
1
0
1
@tourzhao
Tuo Zhao
2 months
Is Claude 4 also hiding its reasoning process?
0
0
0
@tourzhao
Tuo Zhao
3 months
Tried using Deep Research to draft my paper's related work section, but the results are a bit disappointing: over 50% of the cited papers are fake due to excessive hallucination.
0
0
6
@tourzhao
Tuo Zhao
4 months
Seriously tho, what's up with ICML's 5000-character limit on rebuttals? Like why?? 🤔
1
0
9
@tourzhao
Tuo Zhao
5 months
Sharing a concise review reminder template that's worked well:

Dear {{fullname}},

Your ICML 2025 reviews are due today. Please submit them promptly via this link: {{submit_review_link}}

Your expertise is essential to our review process.

Regards,
Area Chair, ICML 2025
0
0
16
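If useful, a small sketch of filling the template programmatically; the reviewer entries and the brace-style substitution below are illustrative assumptions:

```python
# Sketch: filling the reminder template for each reviewer (entries are made up).
template = (
    "Dear {fullname},\n\n"
    "Your ICML 2025 reviews are due today. Please submit them promptly via this link: "
    "{submit_review_link}\n\n"
    "Your expertise is essential to our review process.\n\n"
    "Regards,\nArea Chair, ICML 2025"
)

reviewers = [  # hypothetical entries
    {"fullname": "Dr. Example Reviewer", "submit_review_link": "https://openreview.net/..."},
]

for r in reviewers:
    print(template.format(**r))
```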
@tourzhao
Tuo Zhao
5 months
Joint work with @li_zichong, Xinyu Feng, Yuheng Cai, @Zixuan_Zzz, @chenliang1_, @WeizhuChen, Tianyi Liu, and Haoyu Wang.
0
0
0
@tourzhao
Tuo Zhao
5 months
GSA works by sampling diverse responses and synthesizing them into an improved answer. Unlike self-consistency, GSA does not require verifiable tokens for majority voting, making it applicable to open-ended tasks. Our experiments show notable gains across various tasks. (2/2)
1
0
0
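The two stages described above translate naturally into a short sketch: sample several diverse responses, then prompt the model once more to synthesize them. The prompts and the `generate` helper below are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch of the two-stage idea: sample diverse responses, then ask the model
# to aggregate them into one improved answer.
from typing import Callable, List

def generative_self_aggregation(
    question: str,
    generate: Callable[[str, float], str],  # e.g., a wrapper around an LLM API
    n_samples: int = 5,
) -> str:
    # Stage 1: sample diverse candidate answers at a higher temperature.
    candidates: List[str] = [generate(question, 1.0) for _ in range(n_samples)]

    # Stage 2: ask the model to synthesize an improved answer from the candidates.
    numbered = "\n\n".join(f"Response {i + 1}:\n{c}" for i, c in enumerate(candidates))
    aggregation_prompt = (
        f"Question: {question}\n\n"
        f"Here are several candidate responses:\n\n{numbered}\n\n"
        "Using the useful parts of these responses, write a single improved answer."
    )
    return generate(aggregation_prompt, 0.0)  # low temperature for the final synthesis
```

Note that no majority vote over exact answer tokens is needed, which is why this applies to open-ended tasks where self-consistency does not.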
@tourzhao
Tuo Zhao
5 months
Excited to share our new arXiv paper: "LLMs Can Generate a Better Answer by Aggregating Their Own Responses"! We introduce Generative Self-Aggregation (GSA), a new prompting approach to enhance LLM performance. (1/2)
[image attached]
2
2
10
@tourzhao
Tuo Zhao
5 months
0
0
0
@tourzhao
Tuo Zhao
5 months
Specifically, we prove 1) the existence of progressive sharpening and self-stabilization under large learning rates, 2) a sharpness upper bound for the entire GD trajectory, and 3) that the non-monotonic loss is essentially monotonic when projected onto the relevant dimension. (3/3) #EoS
1
0
0
@tourzhao
Tuo Zhao
5 months
🔍 We introduce a nontrivial two-layer linear network with 2D input, where one dimension is relevant to the response and the other is irrelevant. This input structure reveals new insights about the EoS phenomenon. (2/3) #MachineLearningTheory
1
0
0
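A hedged sketch of the kind of setup being described (the paper's exact parameterization may differ): a two-layer linear model on 2D input where only the first coordinate carries signal, trained by gradient descent with step size $\eta$; "edge of stability" refers to the sharpness hovering near $2/\eta$.

```latex
% Illustrative sketch only; the paper's exact parameterization may differ.
% Two-layer linear network on 2D input; only x_rel is relevant to the response y.
f(x) = v\, w^{\top} x, \qquad x = (x_{\mathrm{rel}}, x_{\mathrm{irr}}) \in \mathbb{R}^{2}, \qquad y = \beta\, x_{\mathrm{rel}},
\qquad L(w, v) = \tfrac{1}{2}\,\mathbb{E}\!\left[(f(x) - y)^{2}\right].
% Gradient descent with step size \eta; at the edge of stability the sharpness
% \lambda_{\max}(\nabla^{2} L) first rises (progressive sharpening) and then hovers near 2/\eta.
(w_{t+1}, v_{t+1}) = (w_t, v_t) - \eta\, \nabla L(w_t, v_t), \qquad
\lambda_{\max}\!\left(\nabla^{2} L(w_t, v_t)\right) \approx \frac{2}{\eta}.
```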
@tourzhao
Tuo Zhao
5 months
📢 Check out our arXiv preprint, "A Minimalist Example of Edge-of-Stability and Progressive Sharpening". We prove progressive sharpening and self-stabilization of gradient descent under large learning rates for training linear networks. #DeepLearning (1/3)
[image attached]
1
4
17
@tourzhao
Tuo Zhao
5 months
Grateful for my awesome collaborators: @Liming_Liu6, @ZhenghaoXu0, @Zixuan_Zzz, @li_zichong, @GT_HaoKang, @chenliang1_, @WeizhuChen (3/3)
0
0
1