
Tuo Zhao
@tourzhao
Followers
2K
Following
578
Media
32
Statuses
375
Associate Professor at Georgia Tech, Ph.D. in Computer Science. Research Interests: Machine Learning
Atlanta, Georgia
Joined August 2019
🚀 New release for the Phi family! **SlimMOE** trims bulky Phi-3.5-MoE experts into agile models (4-6× smaller) with minimal accuracy loss. If you ❤️ Phi-3 mini/small, you'll love these lighter siblings. 👇
arxiv.org
The Mixture of Experts (MoE) architecture has emerged as a powerful paradigm for scaling large language models (LLMs) while maintaining inference efficiency. However, their enormous memory...
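For readers less familiar with why MoE keeps inference cheap while scaling parameters, here is a minimal, generic sketch of top-2 expert routing in PyTorch. It is an illustration of the general architecture only, not SlimMOE's or Phi-3.5-MoE's actual implementation, and all layer sizes are made up.

```python
# Minimal sketch of top-2 expert routing (generic MoE layer, not SlimMOE itself):
# each token activates only k of E expert MLPs, so compute stays far below dense scaling.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=128, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                        # x: (tokens, d_model)
        gate = F.softmax(self.router(x), dim=-1)
        weight, idx = gate.topk(self.k, dim=-1)  # keep the k highest-scoring experts per token
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e         # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weight[mask, slot, None] * expert(x[mask])
        return out

print(TinyMoE()(torch.randn(5, 64)).shape)       # torch.Size([5, 64])
```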
@zichong_li @chenliang1_ @Zixuan_Zzz @HongIlgee @WeizhuChen @mlatgt @GeorgiaTechISyE @GTCSE @Microsoft 🔧 With memory-efficient optimizers like 8-bit Adam or Muon, you can fine-tune phi-moe-mini-instruct on a single A100 and phi-moe-tiny-instruct on a single A6000. Perfect testbeds for MoE research when resources are tight! #PhiSeries #SlimMOE
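As a rough illustration of the single-GPU setup above, here is a minimal fine-tuning sketch using the 8-bit Adam option the tweet mentions (via 🤗 Transformers with bitsandbytes installed). The Hugging Face repo id and the dataset are placeholders I made up, not confirmed paths; swap in the actual checkpoint and your own data.

```python
# Minimal sketch of memory-lean fine-tuning with 8-bit Adam (bitsandbytes backend).
# Assumptions: "microsoft/phi-moe-mini-instruct" and the dataset below are placeholders.
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "microsoft/phi-moe-mini-instruct"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

ds = load_dataset("yahma/alpaca-cleaned", split="train[:1000]")  # example data only
ds = ds.map(
    lambda b: tokenizer(
        [i + "\n" + o for i, o in zip(b["instruction"], b["output"])],
        truncation=True, max_length=1024),
    batched=True, remove_columns=ds.column_names)

args = TrainingArguments(
    output_dir="slimmoe-ft",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    bf16=True,
    optim="adamw_bnb_8bit",   # 8-bit optimizer states cut memory vs. full-precision Adam
    num_train_epochs=1,
    logging_steps=10,
)

Trainer(model=model, args=args, train_dataset=ds,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False)).train()
```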
Joint work with @zichong_li @chenliang1_ @Zixuan_Zzz @HongIlgee Young Jin Kim and @WeizhuChen (@mlatgt @GeorgiaTechISyE @GTCSE @Microsoft).
🥇 Meet the smallest Phi MoE yet: **phi-moe-tiny-instruct** (3.8B total / 1.1B active). Same SlimMOE magic, instruction-tuned for real tasks, now light enough for laptops & mobile GPUs. Grab the weights 👉 #TinyPhi #MoE #OpenSource
huggingface.co
🚀 New to the Phi family: **phi-moe-mini-instruct** (7.6B total / 2.4B active)! SlimMOE trims Phi-3.5-MoE 6× while preserving almost all accuracy, ideal for edge inference. Try it here 👉 #SlimMOE #Phi #LLM
huggingface.co
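For anyone who wants to try the checkpoint quickly, a minimal inference sketch with the 🤗 Transformers text-generation pipeline is below. The repo id is a placeholder guess; replace it with the actual path behind the link above.

```python
# Minimal sketch for trying a released SlimMOE checkpoint.
# Assumption: the repo id below is hypothetical; use the real Hugging Face path.
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="microsoft/phi-moe-mini-instruct",  # hypothetical repo id
    torch_dtype="auto",
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain mixture-of-experts in two sentences."}]
print(pipe(messages, max_new_tokens=128)[0]["generated_text"])
```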
Joint work with @li_zichong, Xinyu Feng, Yuheng Cai, @Zixuan_Zzz, @chenliang1_, @WeizhuChen, Tianyi Liu, and Haoyu Wang.
Specifically, we prove 1) the existence of progressive sharpening and self-stabilization under large learning rates, 2) a sharpness upper bound along the entire GD trajectory, and 3) that the non-monotonic loss is essentially monotonic when projected onto the relevant dimension. (3/3) #EoS
🔍 We introduce a nontrivial two-layer linear network with 2D input, where one dimension is relevant to the response and the other is irrelevant. This input structure reveals new insights about the EoS phenomenon. (2/3) #MachineLearningTheory
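To make the setup concrete, here is a small numerical sketch in the same spirit: a two-layer linear network on 2D inputs where only the first coordinate drives the response, trained by full-batch GD while tracking sharpness (the top Hessian eigenvalue) against 2/η. This is my own toy illustration, not the paper's exact construction; the width, initialization scale, and learning rate are arbitrary and may need tuning to land in the EoS regime.

```python
# Toy sketch (not the paper's exact example): two-layer linear net, 2D input,
# only dimension 1 is relevant; track sharpness along the GD trajectory.
import torch

torch.manual_seed(0)
n, m, lr, steps = 64, 4, 0.08, 300   # samples, width, step size, GD steps

X = torch.randn(n, 2)
y = X[:, 0].clone()                  # response depends only on the relevant dimension

theta = 0.5 * torch.randn(2 * m + m) # flattened parameters: W1 (m x 2) and w2 (m)

def loss_fn(p):
    W1 = p[: 2 * m].reshape(m, 2)
    w2 = p[2 * m:]
    pred = (X @ W1.T) @ w2           # two-layer linear prediction
    return 0.5 * ((pred - y) ** 2).mean()

for t in range(steps):
    g = torch.autograd.functional.jacobian(loss_fn, theta)  # gradient of scalar loss
    theta = theta - lr * g
    if t % 50 == 0:
        H = torch.autograd.functional.hessian(loss_fn, theta)
        sharpness = torch.linalg.eigvalsh(H).max().item()    # top Hessian eigenvalue
        print(f"step {t:3d}  loss {loss_fn(theta).item():.4f}  "
              f"sharpness {sharpness:.2f}  2/lr {2 / lr:.2f}")
```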
📢 Check out our arXiv preprint: "A Minimalist Example of Edge-of-Stability and Progressive Sharpening". We prove progressive sharpening and self-stabilization of gradient descent under large learning rates for training linear networks. #DeepLearning (1/3)
Grateful for my awesome collaborators: @Liming_Liu6, @ZhenghaoXu0, @Zixuan_Zzz, @li_zichong, @GT_HaoKang, @chenliang1_, @WeizhuChen (3/3).